Weekly Review — Month 6 · Week 1 (Days 141–147)
📅 The Week in One Line
Made a service legible: structured logging with slog, Prometheus metrics (the RED method), OpenTelemetry tracing + W3C traceparent propagation, liveness/readiness probes with graceful draining, correlation IDs through context, and symptom/SLO-based alerting.
✅ What I Completed
- Day 141 — Structured logging with `log/slog`: JSON/Text handlers, levels, typed attrs, `With`/`WithGroup`, `SetDefault`, `ReplaceAttr` redaction
- Day 142 — Prometheus metrics: Counter/Gauge/Histogram/Summary, bounded labels, the RED method, `/metrics` exposition
- Day 143 — OpenTelemetry tracing: trace/span model, sampling, W3C `traceparent` propagation, TracerProvider/exporter/propagator
- Day 144 — Health probes: liveness (restart) vs readiness (drain) vs startup, per-check timeouts, graceful shutdown order
- Day 145 — Correlation IDs: unexported context key type, extract-or-mint middleware, a context-reading `slog.Handler`
- Day 146 — Dashboards & alerting: four golden signals, PromQL (rate / error ratio / histogram quantiles), symptom- and SLO-burn-rate alerts
- Day 147 — Week review + recall
- Stdlib examples: `slog`, `tracecontext`, `healthcheck`, `correlation`
- Exercises solved: 3 (`traceparent`, `health`, `redact`) — all `go test` green
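The Day 141 items (typed attrs plus `ReplaceAttr` redaction) can be sketched as below. This is a minimal illustration, not the journal's actual `redact` exercise: the `redactKeys` deny-list, the `replace` function, and the `logLogin` helper are hypothetical names chosen for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
)

// redactKeys is a hypothetical deny-list; a real service would pick its own keys.
var redactKeys = map[string]bool{"password": true, "token": true}

// replace is the single choke point: it drops the top-level timestamp
// (time-pinning for stable output) and masks any attribute on the deny-list.
func replace(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{} // returning a zero Attr discards the field
	}
	if redactKeys[a.Key] {
		return slog.String(a.Key, "[REDACTED]")
	}
	return a
}

// logLogin writes one record into a buffer and returns it as a string.
func logLogin(user, password string, attempts int) string {
	var buf bytes.Buffer
	h := slog.NewJSONHandler(&buf, &slog.HandlerOptions{ReplaceAttr: replace})
	logger := slog.New(h)
	// Typed attrs keep JSON types: attempts is emitted as a number.
	logger.Info("login",
		slog.String("user", user),
		slog.String("password", password),
		slog.Int("attempts", attempts),
	)
	return buf.String()
}

func main() {
	fmt.Print(logLogin("ada", "hunter2", 3))
}
```

Because every attribute passes through `replace`, call sites do not need to remember to mask secrets themselves.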
💡 Lessons Learned
- slog is stdlib: prefer typed attrs over loose pairs (avoids `!BADKEY`, preserves JSON types); `ReplaceAttr` is the single choke point for redaction and time-pinning.
- Counters are queried as `rate()`, never raw; Histograms (not Summaries) for latency because buckets aggregate across replicas via `histogram_quantile()`.
- Keep metric label values bounded — never user IDs/raw paths — or cardinality explodes. Use the route template.
- In a trace, the trace-id is constant; each hop mints a new span-id whose parent is the incoming one. Sampling is decided once at the root and carried in trace-flags.
- Liveness must not check dependencies (failure ⇒ restart loop); readiness does, and failing it drains traffic with no restart.
- Graceful shutdown order: flip readiness false → drain → `srv.Shutdown`. Reverse drops requests.
- Context keys must be an unexported named type to avoid cross-package collisions; a context-reading `slog.Handler` stamps the correlation ID on every record via `Handle(ctx, …)` + `InfoContext`.
- Alert on symptoms (errors/latency) and SLO burn rate, not causes (CPU); use `for:` to kill flapping; every alert needs a runbook.
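The trace lesson above (constant trace-id, fresh span-id per hop, sampling carried in trace-flags) can be sketched with the standard library alone. This is a simplified illustration, not the OpenTelemetry propagator: the `TraceContext` type and the `Parse`, `Child`, and `Header` names are hypothetical, and only version `00` headers with exactly four fields are accepted.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// TraceContext holds the fields of a version-00 traceparent header.
type TraceContext struct {
	TraceID string // 32 lowercase hex chars, constant for the whole trace
	SpanID  string // 16 lowercase hex chars, new on every hop
	Flags   string // 2 hex chars; the lowest bit means "sampled"
}

var errBadTraceparent = errors.New("malformed traceparent")

func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

func allZero(s string) bool { return strings.Trim(s, "0") == "" }

// Parse validates "00-<trace-id>-<parent-id>-<flags>".
// All-zero trace or span IDs are rejected as invalid.
func Parse(h string) (TraceContext, error) {
	p := strings.Split(h, "-")
	if len(p) != 4 || p[0] != "00" ||
		!isLowerHex(p[1], 32) || !isLowerHex(p[2], 16) || !isLowerHex(p[3], 2) ||
		allZero(p[1]) || allZero(p[2]) {
		return TraceContext{}, errBadTraceparent
	}
	return TraceContext{TraceID: p[1], SpanID: p[2], Flags: p[3]}, nil
}

// Child keeps the trace-id and flags, mints a random span-id, and returns
// the incoming span-id as the new span's parent.
func (tc TraceContext) Child() (TraceContext, string, error) {
	var b [8]byte
	if _, err := rand.Read(b[:]); err != nil {
		return TraceContext{}, "", err
	}
	child := TraceContext{TraceID: tc.TraceID, SpanID: hex.EncodeToString(b[:]), Flags: tc.Flags}
	return child, tc.SpanID, nil
}

// Header renders the value to send on the outbound call.
func (tc TraceContext) Header() string {
	return "00-" + tc.TraceID + "-" + tc.SpanID + "-" + tc.Flags
}

func main() {
	in, err := Parse("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	if err != nil {
		panic(err)
	}
	child, parent, err := in.Child()
	if err != nil {
		panic(err)
	}
	fmt.Println("parent span:", parent)
	fmt.Println("outbound   :", child.Header())
}
```

The sketch never re-decides sampling: the flags byte is copied through unchanged, which matches "decided once at the root".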
💪 Strengths (what clicked)
- The "three pillars + one shared ID" mental model unified logs/metrics/traces fast.
- Context propagation transferred straight from Month 5 (ctx-first, cancellation) into both tracing and correlation IDs.
- The stdlib examples (counter registry from Month 5, traceparent, probes) demystified the third-party libraries.
🧩 Weaknesses (what's still fuzzy)
- Writing a fully spec-compliant decorating `slog.Handler` (correct `WithAttrs`/`WithGroup` re-wrapping).
- Choosing histogram buckets and SLO burn-rate windows/thresholds from real targets rather than guessing.
- Head vs tail sampling cost trade-offs at high request volume.
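For the first weakness, one way to approach the re-wrapping is sketched below. It is a starting point under stated assumptions, not a fully spec-compliant handler: `ctxKey`, `WithCorrelationID`, `ctxHandler`, `dropTime`, and `renderWithID` are hypothetical names, and the `correlation_id` attribute will land inside any group opened with `WithGroup`, which a stricter implementation would need to handle.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
)

// ctxKey is an unexported named type, so no other package can collide with it.
type ctxKey struct{}

func WithCorrelationID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, ctxKey{}, id)
}

// ctxHandler decorates another handler and stamps the correlation ID.
type ctxHandler struct{ inner slog.Handler }

// Compile-time check that the decorator still satisfies the interface.
var _ slog.Handler = ctxHandler{}

func (h ctxHandler) Enabled(ctx context.Context, l slog.Level) bool {
	return h.inner.Enabled(ctx, l)
}

func (h ctxHandler) Handle(ctx context.Context, r slog.Record) error {
	if id, ok := ctx.Value(ctxKey{}).(string); ok {
		r = r.Clone() // avoid sharing attribute state with the caller's record
		r.AddAttrs(slog.String("correlation_id", id))
	}
	return h.inner.Handle(ctx, r)
}

// WithAttrs and WithGroup must return the decorator again. Returning the
// inner handler directly would silently drop the stamping after logger.With.
func (h ctxHandler) WithAttrs(as []slog.Attr) slog.Handler {
	return ctxHandler{inner: h.inner.WithAttrs(as)}
}

func (h ctxHandler) WithGroup(name string) slog.Handler {
	return ctxHandler{inner: h.inner.WithGroup(name)}
}

// dropTime removes the timestamp so the output is stable.
func dropTime(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{}
	}
	return a
}

// renderWithID logs one line through a derived logger and returns it.
// An empty id means the context carries no correlation ID.
func renderWithID(id string) string {
	var buf bytes.Buffer
	base := slog.NewTextHandler(&buf, &slog.HandlerOptions{ReplaceAttr: dropTime})
	logger := slog.New(ctxHandler{inner: base}).With(slog.String("svc", "api"))
	ctx := context.Background()
	if id != "" {
		ctx = WithCorrelationID(ctx, id)
	}
	logger.InfoContext(ctx, "handled")
	return buf.String()
}

func main() {
	fmt.Print(renderWithID("abc-123"))
}
```

The point of the `With(...)` call in `renderWithID` is that the derived logger still goes through `ctxHandler.Handle`, which only holds if both `WithAttrs` and `WithGroup` re-wrap.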
🔁 Spaced-Repetition Re-quiz (topics from earlier weeks)
- Q: (Day 138) Why guard a shared metrics map with a mutex, and what scales better?
  A: Concurrent goroutines mutate it, so unsynchronized access is a data race. Per-series `atomic.Int64` scales better than one global mutex; the mutex is just simpler/obviously correct.
- Q: (Day 137) How do you keep exponential backoff from overflowing and from thundering-herd?
  A: Double-and-cap the delay in-loop (don't compute `2^n` directly), then add full jitter. Classify transient vs permanent before spending the retry budget.
- Q: (Month 3) What does a receive from a closed channel return?
  A: The element type's zero value immediately, with `ok == false` in the comma-ok form. Close is a broadcast to all receivers.
- Q: (Month 1) How do you match a sentinel error through wrapping vs. extract a typed one?
  A: `errors.Is(err, ErrTarget)` for sentinels, `errors.As(err, &target)` for typed errors; wrap with `%w` so the chain is walkable.
- Q: (Day 135) What does `var _ Port = (*Adapter)(nil)` buy you?
  A: A free compile-time assertion that `*Adapter` implements `Port`; it fails to build if the interface drifts.
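The Day 137 answer (double-and-cap in the loop, then full jitter) can be sketched as follows. The function names `backoffCeiling` and `fullJitter`, and the 100 ms / 2 s values in `main`, are assumptions made for the example.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffCeiling returns the un-jittered delay for a 0-based attempt.
// It doubles step by step and stops at max, so it never computes 2^n
// directly and cannot overflow time.Duration.
func backoffCeiling(base, max time.Duration, attempt int) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		if d > max/2 {
			return max // doubling again would exceed the cap
		}
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// fullJitter picks a uniform delay in [0, ceiling), which spreads retries
// from many clients instead of letting them fire in lockstep.
func fullJitter(ceiling time.Duration) time.Duration {
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling)))
}

func main() {
	base, max := 100*time.Millisecond, 2*time.Second
	for attempt := 0; attempt < 6; attempt++ {
		c := backoffCeiling(base, max, attempt)
		fmt.Printf("attempt %d: ceiling=%v sleep=%v\n", attempt, c, fullJitter(c))
	}
}
```

The transient-versus-permanent classification mentioned in the answer is out of scope here; it would sit in the caller, before any delay is computed.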
🎯 Action Items
- Wire `slog` (JSON) + the correlation-ID middleware into the Month 5 capstone, defaulting via `SetDefault`.
- Add `/healthz` and `/readyz` to the capstone with a real DB/Redis readiness check and graceful drain on SIGTERM.
- Instrument the capstone's RED signals (request counter + latency histogram with bounded route labels).
- Put the trace-id into every log line and propagate `traceparent` on outbound queue/RPC calls.
- Draft one symptom-based SLO burn-rate alert with a runbook link.
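As a starting point for the `/healthz` and `/readyz` item, the probes and the shutdown order (readiness false, drain, `srv.Shutdown`) might look like the sketch below. It is not the capstone's code: `alwaysOK` stands in for the real DB/Redis check, `probeStatus` is a small in-process helper for checking the handlers, and the port and all timeouts are assumed values.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"net/http/httptest"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// ready gates /readyz; it is flipped to true once startup is done.
var ready atomic.Bool

// alwaysOK is a placeholder for a real dependency check (DB or Redis ping).
func alwaysOK(context.Context) error { return nil }

// newMux wires both probes. Only readiness consults the dependency check.
func newMux(check func(context.Context) error) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK) // liveness: process is up, nothing else
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		// Per-check timeout so a slow dependency cannot hang the probe.
		ctx, cancel := context.WithTimeout(r.Context(), 500*time.Millisecond)
		defer cancel()
		if !ready.Load() || check(ctx) != nil {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probeStatus calls a handler in-process and returns the status code.
func probeStatus(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	srv := &http.Server{Addr: ":8080", Handler: newMux(alwaysOK)}
	go func() {
		if err := srv.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
			log.Fatal(err)
		}
	}()
	ready.Store(true)

	// Block until SIGTERM (or Ctrl-C when run locally).
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, os.Interrupt)
	defer stop()
	<-ctx.Done()

	ready.Store(false)          // 1. fail readiness so traffic is routed away
	time.Sleep(5 * time.Second) // 2. drain window (assumed; tune to the probe period)
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil { // 3. finish in-flight requests
		log.Printf("shutdown: %v", err)
	}
}
```

Keeping `/healthz` free of dependency checks means a database outage only fails readiness, so the process is drained rather than restarted in a loop.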
🚀 Next Week Goals
- Wire the full observability stack into the capstone end to end (logs + metrics + traces sharing one ID).
- Containerize and run with a local collector; load-test and read the dashboards.
- Continue Month 6: profiling/pprof, performance, and deployment.
📊 Metrics
| Hours | Days hit | Exercises | Commits | Avg confidence |
|---|---|---|---|---|
| 10.5 | 7/7 | 3 | 7 | 3.4/5 |
Suggested commit: docs(journal): month 6 week 1 review