Day 142: Prometheus Metrics (client_golang)
Month 6 · Week 1
🎯 Learning Objective
Instrument a Go service with the four Prometheus metric types and expose them for scraping, using prometheus/client_golang correctly (registration, labels, histograms, the RED method).
📚 Topics
- Counter · Gauge · Histogram · Summary; labels & cardinality
- `promauto`, the default registry, `promhttp.Handler`, exposition format
📖 Reading / Sources
- `client_golang` docs
- Prometheus: Metric types
- Instrumenting a Go application (official)
- The RED method (Tom Wilkie)
📝 Notes
- Four metric types → [[metrics]]:
    - Counter: monotonically increasing total (requests, errors). Only `Inc`/`Add`; queried with `rate()`. Never goes down except on process restart (a reset, which `rate()` handles).
    - Gauge: a value that goes up and down (in-flight requests, queue depth, temperature). `Set`/`Inc`/`Dec`.
    - Histogram: bucketed observations (latency, payload size). Pre-defined buckets; gives you `_count`, `_sum`, and `_bucket{le=...}`; quantiles computed server-side with `histogram_quantile()` → aggregatable across instances.
    - Summary: client-side quantiles; cannot be aggregated across instances. Prefer histograms unless you need an exact local quantile.
- Labels add a time series per combination. Keep label values bounded: never put user IDs, emails, or raw URLs in a label, or you get cardinality explosion → [[cardinality]]. Use the route template (`/users/{id}`), not the concrete path.
- Every metric must be registered exactly once. `promauto` registers on creation; double-registering the same name panics (with `promauto` and `MustRegister`; plain `Register` returns an error instead). Define metrics as package vars (in an `init` or a `metrics.go`).
- Expose with `promhttp.Handler()` at `/metrics`; Prometheus scrapes it on an interval. The body is the text exposition format (`# HELP`, `# TYPE`, `name{label="v"} value`).
- RED method for request-driven services: Rate, Errors, Duration. That means a counter for requests, a counter (or label) for errors, and a histogram for latency. (USE, for Utilization/Saturation/Errors, is the resource-side counterpart.)
- Histogram buckets are cumulative (`le` = "less than or equal"); choose buckets around your SLO (e.g. 5ms…2.5s), not the defaults, for latency.
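The cumulative `le` buckets and the exposition lines from the notes can be sketched with the stdlib alone. This is only an illustration of the mechanics, not how client_golang is implemented: the `histogram` type, its methods, and the `demo_seconds` metric name are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// histogram is a hypothetical stdlib-only stand-in for a Prometheus histogram.
type histogram struct {
	bounds []float64 // finite upper bounds (le), ascending
	counts []uint64  // per-bucket counts; the extra last slot is the +Inf bucket
	sum    float64
	count  uint64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// observe puts v into the first bucket whose bound satisfies v <= le.
func (h *histogram) observe(v float64) {
	i := 0
	for i < len(h.bounds) && v > h.bounds[i] {
		i++
	}
	h.counts[i]++
	h.sum += v
	h.count++
}

// cumulative returns running totals, which is what the le buckets expose.
func (h *histogram) cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

// expose renders the histogram in the text exposition format.
func (h *histogram) expose(name string) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s histogram\n", name)
	cum := h.cumulative()
	for i, le := range h.bounds {
		fmt.Fprintf(&b, "%s_bucket{le=\"%g\"} %d\n", name, le, cum[i])
	}
	fmt.Fprintf(&b, "%s_bucket{le=\"+Inf\"} %d\n", name, cum[len(cum)-1])
	fmt.Fprintf(&b, "%s_sum %g\n", name, h.sum)
	fmt.Fprintf(&b, "%s_count %d\n", name, h.count)
	return b.String()
}

func main() {
	h := newHistogram([]float64{0.25, 0.5, 1})
	for _, v := range []float64{0.125, 0.25, 0.75, 2} {
		h.observe(v)
	}
	fmt.Print(h.expose("demo_seconds"))
}
```

Note that the observation of exactly 0.25 lands in the `le="0.25"` bucket, and that the `+Inf` bucket always equals `_count`.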
💻 Code Examples
client_golang is third-party, so this is a snippet (no runnable stdlib example). The mechanics of a counter registry + exposition format are rebuilt with the stdlib in examples/month-05/metrics.
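```go
import (
	"log"
	"net/http"
	"strconv"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	httpRequests = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "Total HTTP requests.",
	}, []string{"route", "method", "code"}) // bounded label values only

	httpDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "Request latency.",
		Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5}, // SLO-shaped
	}, []string{"route"})
)

// statusRecorder captures the status code written by the wrapped handler.
// Its definition is assumed here; the original snippet only uses it.
type statusRecorder struct {
	http.ResponseWriter
	code int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.code = code
	s.ResponseWriter.WriteHeader(code)
}

// instrument records request count and latency for one route template.
func instrument(route string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		sr := &statusRecorder{ResponseWriter: w, code: 200} // stays 200 unless WriteHeader is called
		next.ServeHTTP(sr, r)
		httpDuration.WithLabelValues(route).Observe(time.Since(start).Seconds())
		httpRequests.WithLabelValues(route, r.Method, strconv.Itoa(sr.code)).Inc()
	})
}

func main() {
	http.Handle("/metrics", promhttp.Handler())  // scraped by Prometheus
	log.Fatal(http.ListenAndServe(":8080", nil)) // port is an assumption
}
```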
🏋️ Exercises / Practice
| Exercise | Status | Link |
|---|---|---|
| (concept) rebuild a counter registry + exposition format | ✅ | examples/month-05/metrics |
🐛 Mistakes Made
- Put the raw request path (with IDs) in a label → cardinality blew up. Switched to the route template.
- Reached for a Summary for latency; learned histograms aggregate across replicas and Summaries don't.
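The second mistake comes down to arithmetic: cumulative bucket counts from several replicas can simply be added, while per-replica quantiles cannot be averaged. The sketch below uses the linear interpolation inside a bucket that `histogram_quantile()` is documented to perform, in simplified form; it is not Prometheus's implementation, and the `bucket` type, the function names, and the replica data are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

type bucket struct {
	le    float64 // upper bound; +Inf for the last bucket
	count float64 // cumulative count of observations <= le
}

// Two hypothetical replicas with identical bounds but different traffic.
var (
	replicaA = []bucket{{0.25, 8}, {1, 10}, {math.Inf(1), 10}}
	replicaB = []bucket{{0.25, 2}, {1, 8}, {math.Inf(1), 10}}
)

// merge adds cumulative counts bucket by bucket.
// It assumes both inputs use the same bounds in the same order.
func merge(a, b []bucket) []bucket {
	out := make([]bucket, len(a))
	for i := range a {
		out[i] = bucket{le: a[i].le, count: a[i].count + b[i].count}
	}
	return out
}

// quantile estimates the q-quantile (0 < q < 1) by interpolating linearly
// inside the first bucket whose cumulative count reaches the target rank.
func quantile(q float64, bs []bucket) float64 {
	total := bs[len(bs)-1].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total
	i := sort.Search(len(bs)-1, func(i int) bool { return bs[i].count >= rank })
	if i == len(bs)-1 {
		return bs[len(bs)-2].le // rank is in +Inf: report the highest finite bound
	}
	lower, below := 0.0, 0.0
	if i > 0 {
		lower, below = bs[i-1].le, bs[i-1].count
	}
	return lower + (bs[i].le-lower)*(rank-below)/(bs[i].count-below)
}

func main() {
	a, b := quantile(0.75, replicaA), quantile(0.75, replicaB)
	fmt.Println("p75 per replica:", a, b)
	fmt.Println("average of the two:", (a+b)/2)
	fmt.Println("p75 of merged buckets:", quantile(0.75, merge(replicaA, replicaB)))
}
```

With this data the average of the per-replica values differs from the quantile of the merged buckets, which is why a Summary's precomputed quantiles cannot stand in for a fleet-wide figure.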
❓ Open Questions
- Native histograms (the newer sparse-bucket type) vs classic fixed buckets — when to switch?
🧠 Active Recall (answer without looking)
1. Q: Why prefer a Histogram over a Summary for request latency in a replicated service?
    A: Histogram buckets are exposed raw and combined server-side with `histogram_quantile()`, so you can aggregate across all replicas. Summary quantiles are computed client-side per instance and cannot be averaged/merged meaningfully.
2. Q: What's the danger of using a user ID as a metric label value?
    A: Cardinality explosion: each distinct label value is a separate time series, so unbounded values create millions of series and OOM the scraper/TSDB. Labels must have bounded, low-cardinality values.
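The explosion in the last answer is a multiplication: the worst-case series count for one metric is the product of the distinct values of each label. A rough back-of-the-envelope sketch follows; the function names and all the numbers (20 routes, 5 methods, 10 status codes, 100,000 users) are assumptions chosen for illustration.

```go
package main

import "fmt"

// seriesCount returns the worst-case number of series for one metric:
// the product of the number of distinct values per label.
func seriesCount(distinctValues ...int) int {
	n := 1
	for _, v := range distinctValues {
		n *= v
	}
	return n
}

// histogramSeries accounts for what a classic histogram exposes per label
// combination: one _bucket series per finite bound, plus +Inf, _sum, _count.
func histogramSeries(finiteBuckets int, distinctValues ...int) int {
	return (finiteBuckets + 3) * seriesCount(distinctValues...)
}

func main() {
	// route x method x code, all bounded.
	fmt.Println("bounded labels:", seriesCount(20, 5, 10))
	// The same metric with a user_id label added.
	fmt.Println("with user_id:", seriesCount(20, 5, 10, 100000))
	// A 9-bucket latency histogram labelled by route only.
	fmt.Println("histogram by route:", histogramSeries(9, 20))
}
```

Under these assumed numbers, one extra unbounded label turns a thousand series into a hundred million.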
🪶 Feynman Reflection
A counter only climbs (you ask Prometheus for its rate); a gauge is a dial that moves both ways; a histogram drops each measurement into a bucket so you can ask "what fraction was under 100ms?" later. Labels slice each metric into separate lines — powerful, but each new value is a new line, so keep them few and bounded.
🕳️ Knowledge Gaps
- Exemplars (linking a histogram sample to a trace ID) — ties into Day 143 tracing.
✅ Summary
I can choose the right metric type, instrument the RED signals with bounded labels, register metrics once, and expose /metrics for scraping.
⏭️ Next Steps / Prep for Tomorrow
- Day 143: propagate a trace across services with OpenTelemetry and the W3C `traceparent` header.
| Time spent | Difficulty | Confidence |
|---|---|---|
| 90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜ |
Suggested commit: docs(journal): prometheus metrics and the RED method (day 142)