
Day 053 — pprof Quickstart

Month 2 · Week 4

🎯 Learning Objective

Capture CPU and heap profiles with runtime/pprof (file-based) and net/http/pprof (live endpoints), then read them with go tool pprof to find the hot spot before optimizing.

📚 Topics

  • runtime/pprof: StartCPUProfile/StopCPUProfile, WriteHeapProfile
  • net/http/pprof: the blank-import that registers /debug/pprof/*
  • Reading profiles: go tool pprof -top, -http, flat vs cum

📖 Reading / Sources

📝 Notes

  • A profile is a statistical sample of where the program spends a resource. The CPU profiler interrupts the program ~100×/sec and records the stack; the heap profile is also sampled (by default about one sample per 512 KiB allocated) and reports both live (in-use) and cumulative allocations as of the last completed GC. → links to [[performance]] and [[benchmarks]].
  • File mode (runtime/pprof): f, err := os.Create("cpu.prof"); pprof.StartCPUProfile(f); /* work */; pprof.StopCPUProfile(). Check the errors from Create and StartCPUProfile. The profile is complete only after StopCPUProfile flushes it, and you must still Close the file. For heap, call runtime.GC() then pprof.WriteHeapProfile(f) so the snapshot reflects what's actually reachable.
  • Server mode (net/http/pprof): a blank import _ "net/http/pprof" runs an init() that registers handlers on http.DefaultServeMux under /debug/pprof/. You just serve the default mux. Never expose this publicly — put it on an internal-only port/mux.
  • Grab a live CPU profile over HTTP: go tool pprof "http://host:6060/debug/pprof/profile?seconds=5" (quote the URL so the shell does not treat ? as a glob; it samples for N seconds, 30 by default). Heap is /debug/pprof/heap; goroutines /debug/pprof/goroutine.
  • Reading: go tool pprof -top cpu.prof ranks functions. flat = time in that function itself; cum = time in it plus everything it called. A high-cum/low-flat frame is a dispatcher; high-flat is the actual hot loop. -http=: opens the interactive web UI in the browser, which includes a flame graph view.
  • Profiling needs the work to survive the optimizer: return/print results so the compiler can't dead-code-eliminate the loop (same sink trick as [[benchmarks]]).
  • For tests, go test -cpuprofile cpu.prof -memprofile mem.prof -bench . writes profiles straight from a benchmark — usually the fastest way to profile a hot function in isolation. The newer runtime trace (runtime/trace + go tool trace) covers scheduling/latency, which pprof's sampling can miss.

💻 Code Examples

// File-mode CPU profile: start before the work, stop (and close) after.
// Needs imports: fmt, log, os, runtime/pprof. busyWork stands in for the real job.
f, err := os.Create("cpu.prof")
if err != nil {
    log.Fatal(err)
}
if err := pprof.StartCPUProfile(f); err != nil {
    log.Fatal(err)
}
result := busyWork()   // do the real work here
pprof.StopCPUProfile() // profile is only complete after Stop
if err := f.Close(); err != nil {
    log.Fatal(err)
}
fmt.Println(result) // use the result so the compiler cannot discard the work
// Analyze:  go tool pprof -top cpu.prof   (or  -http=:  for the web UI and flame graph)

Full example: examples/month-02/pprof/ · File mode: go run ./examples/month-02/pprof · Live endpoints: go run ./examples/month-02/pprof -http then open http://localhost:6060/debug/pprof/.

🏋️ Exercises / Practice

Exercise | Status | Link
Write cpu.prof + mem.prof from a CPU-bound job, then go tool pprof -top | | examples/month-02/pprof
Serve /debug/pprof/ and capture a 5s live profile over HTTP | | examples/month-02/pprof

🐛 Mistakes Made

  • Opened cpu.prof while the program was still running and saw an empty/garbage profile — it's only valid after StopCPUProfile().
  • Imported net/http/pprof normally and got "imported and not used" — it's a blank (_) import; you use its side-effecting init, not its symbols.

❓ Open Questions

  • When does pprof mislead vs runtime/trace? (Sampling can miss short, latency-critical events and blocking; trace shows scheduler/blocking timelines — reach for trace when tail latency is the problem.)

🧠 Active Recall (answer without looking)

  1. Q: What's the difference between flat and cum in pprof -top?
     A: flat is time spent inside that function's own code; cum (cumulative) is flat plus all time spent in functions it called. High-cum/low-flat = a caller/dispatcher; high-flat = the real hot loop.
  2. Q: Why is net/http/pprof imported with a blank _?
     A: You don't reference its identifiers; you want its init(), which registers the /debug/pprof/* handlers on http.DefaultServeMux. A blank import runs init without binding a name.

🪶 Feynman Reflection

A profiler is a pollster for your program: instead of asking every line "how long did you take?", it periodically snapshots the call stack and tallies which functions were on the stack most often. That sample is statistically enough to point at the few percent of code burning the cycles or holding the memory, so you optimize the proven hot spot instead of guessing.
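
The tallying idea can be modelled in a few lines. This is a simplified sketch with made-up stacks, not how pprof is implemented: each sample is a call stack, flat counts the samples where a function was the leaf, and cum counts the samples where it appeared anywhere on the stack. The function names and sample data are hypothetical.

```go
package main

import "fmt"

// tally counts, for each function, how many samples had it as the leaf
// (flat) and how many had it anywhere on the stack (cum).
// Each stack is listed outermost caller first, leaf last.
func tally(stacks [][]string) (flat, cum map[string]int) {
	flat, cum = map[string]int{}, map[string]int{}
	for _, stack := range stacks {
		if len(stack) == 0 {
			continue
		}
		flat[stack[len(stack)-1]]++
		seen := map[string]bool{} // count a recursive frame once per sample
		for _, fn := range stack {
			if !seen[fn] {
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	// Hypothetical samples, not real profiler output.
	stacks := [][]string{
		{"main", "handle", "hotLoop"},
		{"main", "handle", "hotLoop"},
		{"main", "handle", "hotLoop"},
		{"main", "handle", "parse"},
		{"main", "handle"},
	}
	flat, cum := tally(stacks)
	for _, fn := range []string{"main", "handle", "hotLoop", "parse"} {
		fmt.Printf("%-8s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

In this toy data, handle is on every stack but is the leaf only once (the dispatcher pattern), while hotLoop is the leaf in three of five samples (the hot spot).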

🕳️ Knowledge Gaps

  • Allocation-rate (alloc_space) vs in-use (inuse_space) heap views, and block/mutex profiles — skimmed; deepen when chasing a real leak.

✅ Summary

I can produce CPU and heap profiles two ways (file and HTTP), keep the work alive for the sampler, and read flat/cum in go tool pprof to locate the hot spot.

⏭️ Next Steps / Prep for Tomorrow

  • Day 054: move the fmt/vet/test/build gate into GitHub Actions so every push is checked in CI.

Time spent | Difficulty | Confidence
90 min | 🟦🟦🟦⬜⬜ | 🟦🟦🟦⬜⬜

Suggested commit: docs(journal): pprof quickstart (day 053)