Day 047 — Benchmarks (testing.B)

Month 2 · Week 3

🎯 Learning Objective

Write trustworthy micro-benchmarks with testing.B: drive the b.N loop, exclude setup with b.ResetTimer, report allocations, and avoid dead-code elimination.

📚 Topics

  • func BenchmarkXxx(b *testing.B) and the b.N loop
  • b.ResetTimer / b.StopTimer / b.StartTimer
  • b.ReportAllocs · sub-benchmarks with b.Run · benchstat

📖 Reading / Sources

📝 Notes

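  • A benchmark loops the operation b.N times. The framework calls the benchmark function repeatedly, starting at b.N = 1 and raising it until the timed loop lasts about 1s (the default -benchtime), then reports ns/op = total time / b.N. You never set b.N, and the function body must tolerate being run more than once. → [[bench-b-n]]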
  • Run with go test -bench=. -benchmem. -benchmem (or b.ReportAllocs() in code) adds B/op and allocs/op — often more decision-relevant than ns/op.
  • Exclude setup from timing: do expensive prep before the loop and call b.ResetTimer(). For per-iteration setup, bracket it with b.StopTimer() / b.StartTimer().
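  • Dead-code elimination: if the result is unused, the optimizer may delete the work and you benchmark nothing. Assign the result to a package-level sink variable; _ = result merely discards the value and does not stop the compiler from eliminating an inlined call. (On Go 1.24+, a for b.Loop() { ... } loop also keeps results alive.) Don't include fmt.Println inside the loop.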
  • Sub-benchmarks: b.Run("size=1k", func(b *testing.B){...}) to sweep input sizes; great for spotting O(n²) growth.
  • Results are noisy. Repeat each benchmark (-count=10) and compare old vs new with benchstat, which reports the delta together with a p-value. A single run proves little.
  • -cpu=1,4,8 runs the benchmark once per listed GOMAXPROCS value to see parallel scaling; b.RunParallel benchmarks contended code by running a func(pb *testing.PB) callback on several goroutines, each looping on pb.Next().
  • Benchmark realistic inputs; micro-optimizing an unrealistic case wastes effort.
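
To make the timer-control note concrete, here is a minimal sketch that combines one-time setup (excluded with b.ResetTimer) and per-iteration setup (bracketed with b.StopTimer / b.StartTimer). The makeInput helper, the input size, and the choice of sort.Ints as the operation under test are my own assumptions for illustration. It uses testing.Benchmark so it runs with go run; in a _test.go file only the BenchmarkSortInts function would be needed.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"testing"
)

// makeInput is a hypothetical stand-in for expensive one-time setup.
func makeInput(n int) []int {
	r := rand.New(rand.NewSource(1)) // fixed seed so every run sorts the same data
	xs := make([]int, n)
	for i := range xs {
		xs[i] = r.Intn(n)
	}
	return xs
}

func BenchmarkSortInts(b *testing.B) {
	src := makeInput(10000) // one-time setup
	work := make([]int, len(src))
	b.ReportAllocs()
	b.ResetTimer() // discard the time and allocations spent on setup above
	for i := 0; i < b.N; i++ {
		b.StopTimer() // per-iteration setup: restore the unsorted input
		copy(work, src)
		b.StartTimer()
		sort.Ints(work) // only this call is timed
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of `go test`.
	res := testing.Benchmark(BenchmarkSortInts)
	fmt.Println(res, res.MemString())
}
```

StopTimer and StartTimer have a cost of their own, so this pattern suits operations that take at least microseconds; for very small operations it is usually better to restructure the benchmark so no per-iteration setup is needed.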

💻 Code Examples

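import (
    "fmt"
    "strings"
    "testing"
)

var sink string // package-level sink: storing results here keeps the calls alive

func BenchmarkReverse(b *testing.B) {
    const s = "the quick brown fox — héllo 🚀"
    b.ReportAllocs() // also print B/op and allocs/op
    b.ResetTimer()   // nothing costly precedes the loop here; this matters once setup is expensive
    for i := 0; i < b.N; i++ {
        sink = Reverse(s) // store the result so the call can't be optimized away
    }
}

// Sweep sizes to expose super-linear growth:
func BenchmarkJoin(b *testing.B) {
    for _, n := range []int{10, 1000} {
        parts := make([]string, n)
        for i := range parts {
            parts[i] = "x" // non-empty parts so Join has bytes to copy
        }
        b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                sink = strings.Join(parts, "")
            }
        })
    }
}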

Runnable hand-rolled version of the b.N loop: examples/month-02/bench-demo/ · Run: go run ./examples/month-02/bench-demo
Real BenchmarkReverse: exercises/month-02/week-3/reverse/ · Run: go test -bench=. -benchmem ./exercises/month-02/week-3/reverse
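
The notes mention b.RunParallel without showing it. A minimal sketch, assuming a shared counter as the contended resource (my choice, not something from the exercises), compares a mutex against an atomic add. Each goroutine loops on pb.Next(), which hands out the b.N iterations between them, so the counter should equal b.N when the run finishes.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"testing"
)

func BenchmarkMutexCounter(b *testing.B) {
	var (
		mu sync.Mutex
		n  int64
	)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() { // returns false once all b.N iterations are handed out
			mu.Lock()
			n++
			mu.Unlock()
		}
	})
	if n != int64(b.N) {
		b.Fatalf("counted %d iterations, want %d", n, b.N)
	}
}

func BenchmarkAtomicCounter(b *testing.B) {
	var n int64
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			atomic.AddInt64(&n, 1)
		}
	})
	if got := atomic.LoadInt64(&n); got != int64(b.N) {
		b.Fatalf("counted %d iterations, want %d", got, b.N)
	}
}

func main() {
	// Run both variants outside `go test` and print the standard result line.
	fmt.Println("mutex: ", testing.Benchmark(BenchmarkMutexCounter))
	fmt.Println("atomic:", testing.Benchmark(BenchmarkAtomicCounter))
}
```

Under go test, adding -cpu=1,4,8 to these benchmarks shows how each variant behaves as GOMAXPROCS grows; the relative numbers depend heavily on the machine.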

🏋️ Exercises / Practice

Exercise | Status | Link
BenchmarkReverse with ReportAllocs | | exercises/month-02/week-3/reverse
Compare += vs strings.Builder vs Join | | examples/month-02/bench-demo

🐛 Mistakes Made

  • Left fmt.Sprintf (and printing) inside the benchmark loop — measured the formatter, not the target. Moved it out.
  • Didn't sink the result; suspiciously fast ns/op was the optimizer deleting the call.
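
A sketch of the second mistake, assuming a tiny inlinable function (popcount is a hypothetical stand-in for the real target): one benchmark discards the result, the other stores it in a package-level sink. Whether the discarded call is actually eliminated depends on the compiler version and its inlining decisions, so treat any difference in the printed numbers as an observation on one machine, not a guarantee.

```go
package main

import (
	"fmt"
	"math/bits"
	"testing"
)

// sink is package-level, so the compiler must assume stores to it are observable.
var sink int

// popcount is small enough to be inlined, which is what makes elimination possible.
func popcount(x uint64) int { return bits.OnesCount64(x) }

func BenchmarkDiscarded(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = popcount(uint64(i)) // result unused: the compiler is free to drop the work
	}
}

func BenchmarkSunk(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = popcount(uint64(i)) // the store keeps the computation alive
	}
}

func main() {
	fmt.Println("discarded:", testing.Benchmark(BenchmarkDiscarded))
	fmt.Println("sunk:     ", testing.Benchmark(BenchmarkSunk))
}
```

If the discarded variant reports a ns/op close to the cost of an empty loop, that is the signal to distrust it.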

❓ Open Questions

  • How many -count runs does benchstat need for a stable p-value? (Rule of thumb: ≥10; more for noisy machines.)

🧠 Active Recall (answer without looking)

  1. Q: Who sets b.N and why?
     A: The testing framework auto-tunes b.N upward until the loop runs long enough (~1s) to time reliably; it reports total/b.N as ns/op. You never set it.
  2. Q: Two ways a benchmark can lie?
     A: (1) Setup counted in the timed region: fix with b.ResetTimer. (2) Dead-code elimination removing unused results: fix by sinking the result. Also: single noisy run; use -count + benchstat.

🪶 Feynman Reflection

A benchmark is a stopwatch wrapped around a loop the framework runs enough times to get a stable average. Your jobs: don't time the setup, don't let the compiler delete the work, and report allocations — because allocation count often predicts real-world cost better than raw nanoseconds.
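
To watch the framework pick b.N, the sketch below records every b.N value it tries, using testing.Benchmark so it runs with go run. The recordTuning helper and the strings.Repeat workload are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

var sink string // keeps the workload's result alive

// recordTuning is a hypothetical helper: it runs one benchmark and returns
// every b.N the framework tried, plus the final result.
func recordTuning() (tried []int, res testing.BenchmarkResult) {
	res = testing.Benchmark(func(b *testing.B) {
		tried = append(tried, b.N) // the function is invoked once per candidate b.N
		for i := 0; i < b.N; i++ {
			sink = strings.Repeat("x", 64)
		}
	})
	return tried, res
}

func main() {
	tried, res := recordTuning()
	fmt.Println("b.N values tried:", tried)
	fmt.Println("final N:", res.N, "ns/op:", res.NsPerOp())
}
```

The first value tried is 1 and the last matches the N in the reported result; the steps in between vary with how fast the workload is on the machine at hand.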

🕳️ Knowledge Gaps

  • Reading pprof CPU/heap profiles to explain why one variant is faster — next layer down.

✅ Summary

I can write honest benchmarks: drive b.N, reset the timer, report allocs, sink results, sweep sizes, and compare runs with benchstat.

⏭️ Next Steps / Prep for Tomorrow

  • Day 048: fuzzing with testing.F and runnable Example functions.

Time spent Difficulty Confidence
90 min 🟦🟦⬜⬜⬜ 🟦🟦🟦⬜⬜

Suggested commit: test(week-3): testing.B benchmarks and benchmem (day 047)