Day 158: Load testing (k6 / vegeta)
Month 6 · Week 3
🎯 Learning Objective
Drive synthetic load at a Go service with vegeta (constant-rate) and k6 (scripted scenarios), and read the results in terms of throughput and latency percentiles rather than averages.
📚 Topics
- Open-model (constant rate, vegeta) vs closed-model (fixed VUs, k6) load
- Percentiles (p50/p95/p99) vs the misleading mean; tail latency
- Finding the knee: where p99 and error rate climb as RPS rises
📖 Reading / Sources
- vegeta — HTTP load testing tool
- k6 documentation
- Gil Tene — "How NOT to Measure Latency" (coordinated omission)
- Brendan Gregg — USE method
📝 Notes
- Measure percentiles, not the mean. An average hides the tail; users feel p95/p99. A 5 ms mean with a 900 ms p99 is a bad service. Always report p50/p95/p99 and max → [[latency-percentiles]].
- Open vs closed model. vegeta sends a constant request rate regardless of responses (open model) — best for "can we sustain N RPS?". k6's default VUs each wait for a response before sending the next (closed model) — best for "how do U concurrent users behave?". They answer different questions.
- Coordinated omission: a closed-model tester that waits during a stall under-counts slow requests, flattering the tail. Open-model constant-rate tools (vegeta) or k6's arrival-rate executors avoid it. This is Gil Tene's core warning.
- Find the knee. Ramp RPS until p99 and error rate inflect — that's your capacity. Report throughput at an acceptable latency/error budget, not peak RPS with 30% errors.
- Warm up first. Go has no JIT to warm, but caches and connection pools still need to fill and the GC needs to reach steady state before you record; discard the first seconds.
- Watch the server side too. Correlate the load result with the service's own pprof/metrics: CPU profile during the run ([[day-155]]), `allocs/op` and GC frequency, goroutine count, and saturation of the real bottleneck (DB, locks). Load testing finds that there's a limit; profiling finds why.
- k6 thresholds turn a load test into a pass/fail CI gate (e.g. `http_req_duration: p(95)<300`). `vegeta attack` pipes into `vegeta report` (text) or `vegeta plot` (HTML).
- k6 scripts are JavaScript and vegeta is a CLI/Go library; neither is part of a normal Go service's code, so there's no stdlib example today. The snippets below are the real invocations.
💻 Code Examples
```bash
# vegeta: constant 200 req/s for 30s against one endpoint, then a report.
echo "GET http://localhost:8080/healthz" \
  | vegeta attack -rate=200/s -duration=30s \
  | tee results.bin \
  | vegeta report   # shows p50/p95/p99, throughput, status codes

# Re-read the saved results as a latency histogram with explicit buckets.
vegeta report -type='hist[0,10ms,50ms,100ms,300ms]' results.bin
```
```javascript
// k6: ramp arrival rate (open model, avoids coordinated omission) with a
// p95 latency threshold that fails the run (and CI) if breached.
// Run: k6 run load.js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  scenarios: {
    ramp: {
      executor: 'ramping-arrival-rate',
      startRate: 50,
      timeUnit: '1s',
      preAllocatedVUs: 200,
      stages: [
        { target: 200, duration: '30s' },
        { target: 500, duration: '30s' },
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<800'], // ms
    http_req_failed: ['rate<0.01'], // <1% errors
  },
};

export default function () {
  const res = http.get('http://localhost:8080/healthz');
  check(res, { 'status 200': (r) => r.status === 200 });
}
```
🏋️ Exercises / Practice
| Exercise | Status | Link |
|---|---|---|
| Token-bucket limiter (defends the service under load) | ✅ | exercises/month-06/week-3/tokenbucket |
No stdlib runnable example today — k6 (JS) and vegeta (external CLI) aren't part of the service's Go code. The real commands are the snippets above.
🐛 Mistakes Made
- Reported the mean latency and called it fast; the p99 was 40× higher. Switched to percentiles.
- Ran a closed-model test and missed a stall (coordinated omission). Re-ran with an arrival-rate executor and the tail appeared.
❓ Open Questions
- How do I keep load-test infra from being the bottleneck (client CPU/conntrack/file descriptors) and skewing results?
🧠 Active Recall (answer without looking)
1. Q: Why prefer p99 over the mean latency when reporting a load test?
   A: The mean is dominated by the common fast case and hides the tail. Real users hit the slow requests, and at scale "1-in-100" (p99) happens constantly. Tail latency, not the average, determines perceived performance and SLOs.
2. Q: What is coordinated omission and which load model avoids it?
   A: In a closed model, when the server stalls the load generator also pauses (it waits for each response), so it never sends the requests that would have been slow and under-counts the tail. An open / constant-arrival-rate model (vegeta, or k6's arrival-rate executors) keeps issuing requests on schedule, so stalls show up.
🪶 Feynman Reflection
Load testing is a wind tunnel: you blow a controlled, measured stream of requests at the service and watch where it shakes. The trick is blowing at a constant rate (open model) so a momentary stall doesn't make you politely slow down and pretend the turbulence wasn't there. And you judge the wing by its worst flutter (p99), not its average smoothness.
🕳️ Knowledge Gaps
- Distributed/cloud-scale load generation (k6 Cloud, multiple agents) and aggregating their percentiles correctly.
- Modeling realistic traffic shapes (think time, mixed endpoints, payload sizes).
✅ Summary
I can generate constant-rate load with vegeta and scripted/threshold-gated scenarios with k6, read results as p50/p95/p99 + error rate, avoid coordinated omission, and correlate the knee with server-side profiles.
⏭️ Next Steps / Prep for Tomorrow
- Day 159: shift left on security by scanning the dependency graph and code with `govulncheck` and `gosec`.
| Time spent | Difficulty | Confidence |
|---|---|---|
| 90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜ |
Suggested commit: docs(journal): load testing with k6 & vegeta (day 158)