Day 158: Load testing (k6 / vegeta)

Month 6 · Week 3

🎯 Learning Objective

Drive synthetic load at a Go service with vegeta (constant-rate) and k6 (scripted scenarios), and read the results in terms of throughput and latency percentiles rather than averages.

📚 Topics

Open-model (constant arrival rate: vegeta, k6 arrival-rate executors) vs closed-model (fixed VUs: k6's default) load
Percentiles (p50/p95/p99) vs the misleading mean; tail latency
Finding the knee: where p99 and error rate climb as RPS rises

📖 Reading / Sources

vegeta — HTTP load testing tool
k6 documentation
Gil Tene — "How NOT to Measure Latency" (coordinated omission)
Brendan Gregg — USE method

📝 Notes

Measure percentiles, not the mean. An average hides the tail; users feel p95/p99. A 5 ms mean with a 900 ms p99 is a bad service. Always report p50/p95/p99 and max → [[latency-percentiles]].
Open vs closed model. vegeta sends a constant request rate regardless of responses (open model) — best for "can we sustain N RPS?". k6's default VUs each wait for a response before sending the next (closed model) — best for "how do U concurrent users behave?". They answer different questions.
Coordinated omission: a closed-model tester that waits during a stall under-counts slow requests, flattering the tail. Open-model constant-rate tools (vegeta) or k6's arrival-rate executors avoid it. This is Gil Tene's core warning.
Find the knee. Ramp RPS until p99 and error rate inflect — that's your capacity. Report throughput at an acceptable latency/error budget, not peak RPS with 30% errors.
Warm up first. Go has no JIT to warm, but caches and pools (sync.Pool, DB/HTTP connection pools) still need to fill and the GC needs to reach steady state before you record; discard the first few seconds.
Watch the server side too. Correlate the load result with the service's own pprof/metrics: CPU profile during the run ([[day-155]]), allocs/op and GC frequency, goroutine count, and saturation of the real bottleneck (DB, locks). Load testing finds that there's a limit; profiling finds why.
k6 thresholds turn a load test into a pass/fail CI gate (e.g. http_req_duration: p(95)<300): if any threshold is breached, k6 run exits non-zero and fails the job. vegeta results pipe into vegeta report (text, JSON or histogram) and vegeta plot (HTML chart).
k6 scripts are JavaScript and vegeta is a CLI (also usable as a Go library); neither is part of a normal Go service's code, so there's no stdlib example today. The snippets below are the real invocations.
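The open model fits in a few lines of standard-library Go. This is a minimal sketch under stated assumptions, not how vegeta is implemented: the hypothetical attack helper schedules request i at start + i/rate without waiting for earlier replies, and measures each latency from its scheduled send time, so falling behind (on either side) shows up in the tail. The target is a throwaway httptest server standing in for the service; demo and the 200 req/s for 100 ms figures are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// attack is a hypothetical open-model load generator (not vegeta's code):
// request i is scheduled at start + i*interval, and the loop never waits for
// an earlier response before sending the next request.
func attack(url string, rate int, duration time.Duration) (latencies []time.Duration, errs int) {
	interval := time.Second / time.Duration(rate)
	total := int(duration / interval)
	start := time.Now()

	var mu sync.Mutex
	var wg sync.WaitGroup
	for i := 0; i < total; i++ {
		scheduled := start.Add(time.Duration(i) * interval)
		time.Sleep(time.Until(scheduled)) // returns at once if we are already behind
		wg.Add(1)
		go func(scheduled time.Time) {
			defer wg.Done()
			resp, err := http.Get(url)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				errs++ // a real tool would also record the error type
				return
			}
			resp.Body.Close()
			// Measure from the scheduled send time, so falling behind
			// schedule shows up as latency instead of silently vanishing.
			latencies = append(latencies, time.Since(scheduled))
		}(scheduled)
	}
	wg.Wait()
	return latencies, errs
}

// demo fires 200 req/s for 100 ms at a stub standing in for the service's
// /healthz and returns how many responses came back and how many requests failed.
func demo() (ok, failed int) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	lat, errs := attack(srv.URL+"/healthz", 200, 100*time.Millisecond)
	return len(lat), errs
}

func main() {
	ok, failed := demo()
	fmt.Printf("ok=%d errors=%d\n", ok, failed)
}
```

200 req/s for 100 ms schedules exactly 20 requests, so against the healthy stub it prints ok=20 errors=0. Spawning a goroutine per request is what keeps the sender on schedule; a real tool also caps in-flight requests so the generator itself doesn't fall over.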

💻 Code Examples

```shell
# vegeta: constant 200 req/s for 30s against one endpoint, then a report.
echo "GET http://localhost:8080/healthz" \
  | vegeta attack -rate=200/s -duration=30s \
  | tee results.bin \
  | vegeta report                 # latencies (min/mean/p50/p90/p95/p99/max), throughput, status codes
# Re-read the saved results as a latency histogram.
vegeta report -type='hist[0,10ms,50ms,100ms,300ms]' results.bin
```
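vegeta's hist report counts results per latency bucket. The sketch below reproduces that bucketing for the same edges, assuming half-open buckets [lo, hi) with an open-ended last bucket; that is my reading of the report, not vegeta's source. histogram and the sample latencies are hypothetical.

```go
package main

import "fmt"

// histogram counts latencies (ms) into buckets given by ascending lower bounds:
// bucket i is [bounds[i], bounds[i+1]) and the last bucket is open-ended.
// It assumes bounds[0] <= every latency (hist[0,...] starts at zero).
func histogram(bounds, latenciesMs []float64) []int {
	counts := make([]int, len(bounds))
	for _, l := range latenciesMs {
		i := len(bounds) - 1 // default: the open-ended last bucket
		for j := 1; j < len(bounds); j++ {
			if l < bounds[j] {
				i = j - 1
				break
			}
		}
		counts[i]++
	}
	return counts
}

func main() {
	bounds := []float64{0, 10, 50, 100, 300}     // same edges as hist[0,10ms,50ms,100ms,300ms]
	lat := []float64{3, 8, 12, 45, 60, 250, 900} // hypothetical latencies in ms
	for i, c := range histogram(bounds, lat) {
		upper := "+inf"
		if i+1 < len(bounds) {
			upper = fmt.Sprintf("%gms", bounds[i+1])
		}
		fmt.Printf("[%gms, %s)\t%d\n", bounds[i], upper, c)
	}
}
```

For the sample latencies it counts 2, 2, 1, 1, 1 across the five buckets; the single 900 ms request lands in the open-ended bucket, which is exactly the tail a mean would dilute.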

```javascript
// k6: ramp arrival rate (open model, which avoids coordinated omission) with a
// p95 latency threshold that fails the run (and CI) if breached.
// Run:  k6 run load.js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    ramp: {
      executor: 'ramping-arrival-rate',
      startRate: 50, timeUnit: '1s',
      preAllocatedVUs: 200,
      // If responses slow down, k6 may add VUs up to this cap; beyond it,
      // iterations are dropped (dropped_iterations) rather than delayed.
      maxVUs: 1000,
      stages: [
        { target: 200, duration: '30s' }, // 50 -> 200 req/s over 30s
        { target: 500, duration: '30s' }, // 200 -> 500 req/s over 30s
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<800'], // ms
    http_req_failed:   ['rate<0.01'],               // <1% errors
  },
};

export default function () {
  const res = http.get('http://localhost:8080/healthz');
  check(res, { 'status 200': (r) => r.status === 200 });
}
```
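The thresholds in the script, together with the knee note above, boil down to one question per ramp stage: are p95, p99 and the error rate all under budget? A sketch of that check in Go, assuming you already have one summary per stage (the stage and budget types, and every number in main, are hypothetical and not k6 output). It mirrors the idea of k6 thresholds; it is not how k6 evaluates them.

```go
package main

import "fmt"

// stage is a hypothetical per-step summary of a ramp (not k6's output format).
type stage struct {
	RPS       int     // offered arrival rate, req/s
	P95, P99  float64 // latency percentiles, ms
	ErrorRate float64 // failed/total, 0..1
}

// budget holds the same limits as the k6 thresholds: p(95)<300, p(99)<800, rate<0.01.
type budget struct {
	MaxP95, MaxP99, MaxErrorRate float64
}

// passes reports whether a stage stays strictly under every limit.
func passes(s stage, b budget) bool {
	return s.P95 < b.MaxP95 && s.P99 < b.MaxP99 && s.ErrorRate < b.MaxErrorRate
}

// knee walks stages in increasing-rate order and returns the last rate that
// passed before the first failure; ok is false if the first stage already fails.
func knee(stages []stage, b budget) (rps int, ok bool) {
	for _, s := range stages {
		if !passes(s, b) {
			break
		}
		rps, ok = s.RPS, true
	}
	return rps, ok
}

func main() {
	b := budget{MaxP95: 300, MaxP99: 800, MaxErrorRate: 0.01}
	// Hypothetical per-stage summaries, made up for illustration only.
	stages := []stage{
		{RPS: 50, P95: 20, P99: 45, ErrorRate: 0},
		{RPS: 200, P95: 60, P99: 180, ErrorRate: 0.001},
		{RPS: 350, P95: 240, P99: 950, ErrorRate: 0.004},
		{RPS: 500, P95: 900, P99: 2400, ErrorRate: 0.3},
	}
	if rps, ok := knee(stages, b); ok {
		fmt.Println("sustainable rate within budget:", rps, "RPS")
	} else {
		fmt.Println("fails the budget even at the lowest rate")
	}
}
```

With these made-up stages it prints 200 RPS: the 350 RPS stage keeps p95 under 300 ms but breaks the p99 limit, which is exactly the case a p95-only gate would miss.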

🏋️ Exercises / Practice

| Exercise | Status | Link |
|---|---|---|
| Token-bucket limiter (defends the service under load) | ✅ | exercises/month-06/week-3/tokenbucket |

No stdlib runnable example today — k6 (JS) and vegeta (external CLI) aren't part of the service's Go code. The real commands are the snippets above.

🐛 Mistakes Made

Reported the mean latency and called it fast; the p99 was 40× higher. Switched to percentiles.
Ran a closed-model test and missed a stall (coordinated omission). Re-ran with an arrival-rate executor and the tail appeared.
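What the first mistake's report should have looked like, as a stdlib sketch: drop the warm-up window, then print p50/p95/p99/max next to the mean. Assumptions: samples are (start offset, latency) pairs in ms, percentiles use the nearest-rank method (vegeta and k6 may interpolate differently), and summarize, the sample type and the synthetic run in main are all hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// sample is one request: when it started (ms since test start) and its latency (ms).
type sample struct {
	AtMs, LatencyMs float64
}

type summary struct {
	N                        int
	Mean, P50, P95, P99, Max float64
}

// percentile is nearest-rank on an ascending slice: the value at 1-based
// rank ceil(p/100 * n).
func percentile(sorted []float64, p float64) float64 {
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

// summarize discards samples that started before warmupMs, then reports the
// mean alongside p50/p95/p99/max.
func summarize(samples []sample, warmupMs float64) summary {
	var lat []float64
	for _, s := range samples {
		if s.AtMs >= warmupMs {
			lat = append(lat, s.LatencyMs)
		}
	}
	if len(lat) == 0 {
		return summary{}
	}
	sort.Float64s(lat)
	sum := 0.0
	for _, l := range lat {
		sum += l
	}
	return summary{
		N:    len(lat),
		Mean: sum / float64(len(lat)),
		P50:  percentile(lat, 50),
		P95:  percentile(lat, 95),
		P99:  percentile(lat, 99),
		Max:  lat[len(lat)-1],
	}
}

func main() {
	// Hypothetical run: one request every 10 ms for 10 s. The first second is
	// cold (40 ms); after it, requests take 4 ms except every 50th, which stalls for 400 ms.
	var samples []sample
	for i := 0; i < 1000; i++ {
		at, lat := float64(i*10), 4.0
		switch {
		case at < 1000:
			lat = 40
		case i%50 == 0:
			lat = 400
		}
		samples = append(samples, sample{AtMs: at, LatencyMs: lat})
	}
	s := summarize(samples, 1000)
	// prints: n=900 mean=11.9ms p50=4ms p95=4ms p99=400ms max=400ms
	fmt.Printf("n=%d mean=%.1fms p50=%gms p95=%gms p99=%gms max=%gms\n",
		s.N, s.Mean, s.P50, s.P95, s.P99, s.Max)
}
```

A 12 ms mean looks fast, and here even p95 looks fine; only p99 and max reveal the 2% of requests that stall.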

❓ Open Questions

How do I keep load-test infra from being the bottleneck (client CPU/conntrack/file descriptors) and skewing results?

🧠 Active Recall (answer without looking)

1. Q: Why prefer p99 over the mean latency when reporting a load test?

A: The mean is dominated by the common fast case and hides the tail. Real users hit the slow requests, and at scale "1-in-100" (p99) happens constantly: a page that makes 100 independent backend calls sees at least one p99-slow response with probability 1 − 0.99^100 ≈ 63%. Tail latency, not the average, determines perceived performance and SLOs.

2. Q: What is coordinated omission and which load model avoids it?

A: In a closed model, when the server stalls the load generator also pauses (it waits for each response), so it never sends the requests that would have been slow, and the tail is under-counted. An open / constant-arrival-rate model (vegeta, or k6's arrival-rate executors) keeps issuing requests on schedule, so stalls show up.
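A toy simulation makes the answer concrete. Everything in it is hypothetical: a service that answers in 1 ms except during a 1 s stall, and two testers that both aim for one request every 10 ms over 2 s. The closed tester waits for each reply; the open tester keeps its schedule and measures from the intended send time.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical setup, all times in ms: the tester wants one request every
// 10 ms for 2 s; the service answers in 1 ms except during a 1 s stall,
// when arrivals wait until the stall ends.
const (
	interval, runFor     = 10, 2000
	stallStart, stallEnd = 500, 1500
)

// finishAt returns when a request arriving at the given time completes.
func finishAt(arrival int) int {
	if arrival >= stallStart && arrival < stallEnd {
		return stallEnd + 1
	}
	return arrival + 1
}

// closedModel: one user that waits for each reply before sending again, so
// the stall pauses the tester too and the delayed requests are never sent.
func closedModel() []int {
	var lat []int
	for send := 0; send < runFor; {
		done := finishAt(send)
		lat = append(lat, done-send)
		next := send + interval
		if done > next {
			next = done // can't send before the previous reply arrived
		}
		send = next
	}
	return lat
}

// openModel: requests leave on a fixed schedule regardless of replies, and
// latency is measured from the intended send time.
func openModel() []int {
	var lat []int
	for send := 0; send < runFor; send += interval {
		lat = append(lat, finishAt(send)-send)
	}
	return lat
}

// percentile is nearest-rank: the value at 1-based rank ceil(pct/100 * n).
func percentile(lat []int, pct int) int {
	s := append([]int(nil), lat...)
	sort.Ints(s)
	rank := (pct*len(s) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

func main() {
	for _, run := range []struct {
		name string
		lat  []int
	}{{"closed", closedModel()}, {"open", openModel()}} {
		fmt.Printf("%-6s n=%d p99=%dms max=%dms\n",
			run.name, len(run.lat), percentile(run.lat, 99), percentile(run.lat, 100))
	}
}
```

It prints p99=1ms for the closed tester, because the stall costs it one slow sample out of 101, and p99=981ms for the open one, where 100 of 200 requests were caught by the stall. Both report the same 1001 ms max, so the max alone can't tell you how much traffic the stall hurt.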

🪶 Feynman Reflection

Load testing is a wind tunnel: you blow a controlled, measured stream of requests at the service and watch where it shakes. The trick is blowing at a constant rate (open model) so a momentary stall doesn't make you politely slow down and pretend the turbulence wasn't there. And you judge the wing by its worst flutter (p99), not its average smoothness.

🕳️ Knowledge Gaps

Distributed/cloud-scale load generation (k6 Cloud, multiple agents) and aggregating their percentiles correctly.
Modeling realistic traffic shapes (think time, mixed endpoints, payload sizes).

✅ Summary

I can generate constant-rate load with vegeta and scripted/threshold-gated scenarios with k6, read results as p50/p95/p99 + error rate, avoid coordinated omission, and correlate the knee with server-side profiles.

⏭️ Next Steps / Prep for Tomorrow

Day 159: shift left on security — scan the dependency graph and code with govulncheck and gosec.

| Time spent | Difficulty | Confidence |
|---|---|---|
| 90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜ |

Suggested commit: docs(journal): load testing with k6 & vegeta (day 158)