Day 083 — Profiling Concurrent Code¶
Month 3 · Week 4 · ⬅ Day 082 · Day 084 ➡ · Journal index
🎯 Learning Objective¶
Measure, don't guess: capture CPU, heap, goroutine, and block profiles of the concurrent crawler with runtime/pprof and benchmarks, and read them with go tool pprof to find real bottlenecks and contention.
📚 Topics¶
runtime/pprof(CPU + heap) andnet/http/pprof(live)go test -bench -cpuprofile -memprofile -blockprofilego tool pprof(top,list,web);GOMAXPROCS, contention profiles
📖 Reading / Sources¶
📝 Notes¶
- Profile types answer different questions: CPU (where wall/CPU time goes), heap (where live allocations are), goroutine (what every goroutine is doing — leak hunting), block (time blocked on sync primitives/channels), mutex (lock contention) → [[profiling]].
- Two capture paths:
- Batch:
runtime/pprof—pprof.StartCPUProfile(f)…StopCPUProfile(), andpprof.WriteHeapProfile(f). Best for CLIs/one-shot programs → [[runtime-pprof]]. - Live: import
_ "net/http/pprof"and serve; hit/debug/pprof/*. Best for long-running servers → [[net-http-pprof]]. - Benchmarks are the cleanest source:
go test -bench=. -cpuprofile cpu.prof -memprofile mem.prof -blockprofile block.prof, thengo tool pprof cpu.prof. Add-benchmemto see B/op and allocs/op inline → [[benchmarks]]. - Reading a profile: in
go tool pprof,topranks by flat/cum cost;list Funcshows line-level cost;web/svgdraws the call graph (needs Graphviz). Look at cum% to find the hot subtree, flat% for the hot leaf → [[pprof-tool]]. - For concurrency specifically: the block and mutex profiles reveal contention — enable with
runtime.SetBlockProfileRate(1)/runtime.SetMutexProfileFraction(1)(sampling has overhead). A hot mutex says "shard the lock or use atomics"; lots of blocked goroutines on a channel says "wrong buffering or a slow stage" → [[contention]]. - Goroutine profile / leaks: a steadily climbing
runtime.NumGoroutine()or a goroutine profile full of the same stack = a leak (usually a missingctx.Done()select or an unclosed channel) → [[goroutine-leak]]. GOMAXPROCScaps OS threads running Go code (defaults to CPU count); more workers than that just adds scheduling, not parallelism, for CPU-bound work. Tune the pool size to the bottleneck resource (CPU for compute, much higher for I/O-bound fetches) → [[gomaxprocs]].- Don't optimise blind: profile → fix the top item → re-measure.
benchstatover-countruns tells you whether a change is real or noise → [[measure-first]].
💻 Code Examples¶
Capturing a CPU profile around the parallel workload (from the example):
f, _ := os.Create("cpu.prof")
pprof.StartCPUProfile(f) // begin sampling
parallelWork(200_000, runtime.GOMAXPROCS(0))
pprof.StopCPUProfile() // flush before the file is closed
f.Close()
// then: go tool pprof cpu.prof -> top / list parallelWork / web
Full code (CPU + heap profiles, prints the pprof commands):
examples/month-03/pprof-demo/· Run:go run ./examples/month-03/pprof-demo
🏋️ Exercises / Practice¶
| Exercise | Status | Link |
|---|---|---|
Profile the crawler example, find the hottest function with go tool pprof top |
✅ | examples/month-03/pprof-demo/ |
🐛 Mistakes Made¶
- Closed the profile file before
StopCPUProfile()→ an empty/truncated profile. Order matters: stop (flush) then close. - Expected the block profile to show contention but saw nothing — I hadn't called
runtime.SetBlockProfileRate; blocking events aren't sampled by default.
❓ Open Questions¶
- How do I attribute CPU to a specific goroutine/worker? (pprof aggregates by call stack, not goroutine identity; use labels via
pprof.Do/runtime/pprofprofiler labels.)
🧠 Active Recall (answer without looking)¶
-
Q: Which profile do you reach for to diagnose lock contention vs. goroutine leaks?
A
The **mutex** (and **block**) profile for contention — enable with `SetMutexProfileFraction`/`SetBlockProfileRate`. The **goroutine** profile (or a climbing `NumGoroutine`) for leaks — it shows every goroutine's stack. -
Q: In
go tool pprof top, what's the difference between flat and cum?
A
*Flat* is time spent in that function itself (the hot leaf); *cum* (cumulative) includes everything it called (the hot subtree). High cum + low flat points you down into callees.
🪶 Feynman Reflection¶
Profiling is taking an X-ray instead of poking the patient. The CPU profile shows where the program spends its time, the heap profile where it keeps memory, and the block/mutex profiles where goroutines are stuck waiting on each other. For concurrent code that last pair is gold: a fat mutex or a clogged channel explains why adding workers stopped helping. Fix the top line, re-measure, repeat.
🕳️ Knowledge Gaps¶
- pprof profiler labels (
pprof.Do) to slice CPU by request/host, and execution traces (runtime/trace+go tool trace) for scheduler-level analysis.
✅ Summary¶
I can capture CPU/heap (and, with sampling enabled, block/mutex/goroutine) profiles of concurrent Go via runtime/pprof or benchmarks, then read them with go tool pprof (top/list/web) to find the real bottleneck before changing anything.
⏭️ Next Steps / Prep for Tomorrow¶
- Day 084: Month 3 review, consolidate the concurrency toolkit, and tag
v0.3.0.
| Time spent | Difficulty | Confidence |
|---|---|---|
| 90 min | 🟦🟦🟦⬜⬜ | 🟦🟦🟦⬜⬜ |
Suggested commit: feat(examples): pprof profiling of concurrent code (day 083)