Skip to content

Day 083 — Profiling Concurrent Code

Month 3 · Week 4 · ⬅ Day 082 · Day 084 ➡ · Journal index

🎯 Learning Objective

Measure, don't guess: capture CPU, heap, goroutine, and block profiles of the concurrent crawler with runtime/pprof and benchmarks, and read them with go tool pprof to find real bottlenecks and contention.

📚 Topics

  • runtime/pprof (CPU + heap) and net/http/pprof (live)
  • go test -bench -cpuprofile -memprofile -blockprofile
  • go tool pprof (top, list, web); GOMAXPROCS, contention profiles

📖 Reading / Sources

📝 Notes

  • Profile types answer different questions: CPU (where wall/CPU time goes), heap (where live allocations are), goroutine (what every goroutine is doing — leak hunting), block (time blocked on sync primitives/channels), mutex (lock contention) → [[profiling]].
  • Two capture paths:
  • Batch: runtime/pprofpprof.StartCPUProfile(f)StopCPUProfile(), and pprof.WriteHeapProfile(f). Best for CLIs/one-shot programs → [[runtime-pprof]].
  • Live: import _ "net/http/pprof" and serve; hit /debug/pprof/*. Best for long-running servers → [[net-http-pprof]].
  • Benchmarks are the cleanest source: go test -bench=. -cpuprofile cpu.prof -memprofile mem.prof -blockprofile block.prof, then go tool pprof cpu.prof. Add -benchmem to see B/op and allocs/op inline → [[benchmarks]].
  • Reading a profile: in go tool pprof, top ranks by flat/cum cost; list Func shows line-level cost; web/svg draws the call graph (needs Graphviz). Look at cum% to find the hot subtree, flat% for the hot leaf → [[pprof-tool]].
  • For concurrency specifically: the block and mutex profiles reveal contention — enable with runtime.SetBlockProfileRate(1) / runtime.SetMutexProfileFraction(1) (sampling has overhead). A hot mutex says "shard the lock or use atomics"; lots of blocked goroutines on a channel says "wrong buffering or a slow stage" → [[contention]].
  • Goroutine profile / leaks: a steadily climbing runtime.NumGoroutine() or a goroutine profile full of the same stack = a leak (usually a missing ctx.Done() select or an unclosed channel) → [[goroutine-leak]].
  • GOMAXPROCS caps OS threads running Go code (defaults to CPU count); more workers than that just adds scheduling, not parallelism, for CPU-bound work. Tune the pool size to the bottleneck resource (CPU for compute, much higher for I/O-bound fetches) → [[gomaxprocs]].
  • Don't optimise blind: profile → fix the top item → re-measure. benchstat over -count runs tells you whether a change is real or noise → [[measure-first]].

💻 Code Examples

Capturing a CPU profile around the parallel workload (from the example):

f, _ := os.Create("cpu.prof")
pprof.StartCPUProfile(f)        // begin sampling
parallelWork(200_000, runtime.GOMAXPROCS(0))
pprof.StopCPUProfile()          // flush before the file is closed
f.Close()
// then: go tool pprof cpu.prof   ->  top / list parallelWork / web

Full code (CPU + heap profiles, prints the pprof commands): examples/month-03/pprof-demo/ · Run: go run ./examples/month-03/pprof-demo

🏋️ Exercises / Practice

Exercise Status Link
Profile the crawler example, find the hottest function with go tool pprof top examples/month-03/pprof-demo/

🐛 Mistakes Made

  • Closed the profile file before StopCPUProfile() → an empty/truncated profile. Order matters: stop (flush) then close.
  • Expected the block profile to show contention but saw nothing — I hadn't called runtime.SetBlockProfileRate; blocking events aren't sampled by default.

❓ Open Questions

  • How do I attribute CPU to a specific goroutine/worker? (pprof aggregates by call stack, not goroutine identity; use labels via pprof.Do/runtime/pprof profiler labels.)

🧠 Active Recall (answer without looking)

  1. Q: Which profile do you reach for to diagnose lock contention vs. goroutine leaks?

    A The **mutex** (and **block**) profile for contention — enable with `SetMutexProfileFraction`/`SetBlockProfileRate`. The **goroutine** profile (or a climbing `NumGoroutine`) for leaks — it shows every goroutine's stack.

  2. Q: In go tool pprof top, what's the difference between flat and cum?

    A *Flat* is time spent in that function itself (the hot leaf); *cum* (cumulative) includes everything it called (the hot subtree). High cum + low flat points you down into callees.

🪶 Feynman Reflection

Profiling is taking an X-ray instead of poking the patient. The CPU profile shows where the program spends its time, the heap profile where it keeps memory, and the block/mutex profiles where goroutines are stuck waiting on each other. For concurrent code that last pair is gold: a fat mutex or a clogged channel explains why adding workers stopped helping. Fix the top line, re-measure, repeat.

🕳️ Knowledge Gaps

  • pprof profiler labels (pprof.Do) to slice CPU by request/host, and execution traces (runtime/trace + go tool trace) for scheduler-level analysis.

✅ Summary

I can capture CPU/heap (and, with sampling enabled, block/mutex/goroutine) profiles of concurrent Go via runtime/pprof or benchmarks, then read them with go tool pprof (top/list/web) to find the real bottleneck before changing anything.

⏭️ Next Steps / Prep for Tomorrow

  • Day 084: Month 3 review, consolidate the concurrency toolkit, and tag v0.3.0.

Time spent Difficulty Confidence
90 min 🟦🟦🟦⬜⬜ 🟦🟦🟦⬜⬜

Suggested commit: feat(examples): pprof profiling of concurrent code (day 083)