Weekly Review — Month 6 · Week 1 (Days 141–147)

📅 The Week in One Line

Made a service legible: structured logging with slog, Prometheus metrics (the RED method), OpenTelemetry tracing + W3C traceparent propagation, liveness/readiness probes with graceful draining, correlation IDs through context, and symptom/SLO-based alerting.

✅ What I Completed

  • Day 141 — Structured logging with log/slog: JSON/Text handlers, levels, typed attrs, With/WithGroup, SetDefault, ReplaceAttr redaction
  • Day 142 — Prometheus metrics: Counter/Gauge/Histogram/Summary, bounded labels, the RED method, /metrics exposition
  • Day 143 — OpenTelemetry tracing: trace/span model, sampling, W3C traceparent propagation, TracerProvider/exporter/propagator
  • Day 144 — Health probes: liveness (restart) vs readiness (drain) vs startup, per-check timeouts, graceful shutdown order
  • Day 145 — Correlation IDs: unexported context key type, extract-or-mint middleware, a context-reading slog.Handler
  • Day 146 — Dashboards & alerting: four golden signals, PromQL (rate / error ratio / histogram quantiles), symptom- and SLO-burn-rate alerts
  • Day 147 — Week review + recall
  • Stdlib examples: slog, tracecontext, healthcheck, correlation
  • Exercises solved: 3 (traceparent, health, redact) — all go test green
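The Day 141 redaction idea can be sketched with the standard library alone. This is a minimal illustration, not the exercise solution: the sensitiveKeys deny-list, the logLogin helper, and the field names are assumptions invented for the example. The same ReplaceAttr hook also drops the time attribute so the output is deterministic.

```go
package main

import (
	"bytes"
	"io"
	"log/slog"
	"os"
)

// sensitiveKeys is a hypothetical deny-list; a real service would pick its own.
var sensitiveKeys = map[string]bool{"password": true, "token": true}

// redact is installed as HandlerOptions.ReplaceAttr. It sees every attribute,
// including built-ins such as time, so it can mask secrets and pin the time.
func redact(groups []string, a slog.Attr) slog.Attr {
	if len(groups) == 0 && a.Key == slog.TimeKey {
		return slog.Attr{} // a zero Attr is discarded by the handler
	}
	if sensitiveKeys[a.Key] {
		return slog.String(a.Key, "[REDACTED]")
	}
	return a
}

// newLogger builds a JSON logger that routes every attribute through redact.
func newLogger(w io.Writer) *slog.Logger {
	return slog.New(slog.NewJSONHandler(w, &slog.HandlerOptions{ReplaceAttr: redact}))
}

// logLogin logs one record with typed attrs and returns the rendered line.
func logLogin(user, password string, attempts int) string {
	var buf bytes.Buffer
	newLogger(&buf).Info("login",
		slog.String("user", user),
		slog.String("password", password),
		slog.Int("attempts", attempts), // stays a JSON number, not a string
	)
	return buf.String()
}

func main() {
	os.Stdout.WriteString(logLogin("ana", "hunter2", 2))
	// prints: {"level":"INFO","msg":"login","user":"ana","password":"[REDACTED]","attempts":2}
}
```

Because the masking lives in the handler, call sites do not need to remember which fields are sensitive.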

💡 Lessons Learned

  • slog is stdlib: prefer typed attrs over loose pairs (avoids !BADKEY, preserves JSON types); ReplaceAttr is the single choke point for redaction and time-pinning.
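  • Counters are queried as rate(), never raw. Use Histograms, not Summaries, for latency: bucket counts can be summed across replicas before histogram_quantile() is applied, whereas Summary quantiles are precomputed per instance and cannot be aggregated.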
  • Keep metric label values bounded — never user IDs/raw paths — or cardinality explodes. Use the route template.
  • In a trace, the trace-id is constant; each hop mints a new span-id whose parent is the incoming one. Sampling is decided once at the root and carried in trace-flags.
  • Liveness must not check dependencies (failure ⇒ restart loop); readiness does, and failing it drains traffic with no restart.
  • Graceful shutdown order: flip readiness to false → drain → srv.Shutdown. Calling srv.Shutdown first drops requests, because traffic is still being routed to the instance.
  • Context keys must be an unexported named type to avoid cross-package collisions; a context-reading slog.Handler stamps the correlation ID on every record via Handle(ctx, …) + InfoContext.
  • Alert on symptoms (errors/latency) and SLO burn rate, not causes (CPU); set a for: duration on each rule to suppress flapping; every alert needs a runbook.

💪 Strengths (what clicked)

  • The "three pillars + one shared ID" mental model unified logs/metrics/traces fast.
  • Context propagation transferred straight from Month 5 (ctx-first, cancellation) into both tracing and correlation IDs.
  • The stdlib examples (counter registry from Month 5, traceparent, probes) demystified the third-party libraries.

🧩 Weaknesses (what's still fuzzy)

  • Writing a fully spec-compliant decorating slog.Handler (correct WithAttrs/WithGroup re-wrapping).
  • Choosing histogram buckets and SLO burn-rate windows/thresholds from real targets rather than guessing.
  • Head vs tail sampling cost trade-offs at high request volume.

🔁 Spaced-Repetition Re-quiz (topics from earlier weeks)

  1. Q: (Day 138) Why guard a shared metrics map with a mutex, and what scales better?
    A: Concurrent goroutines mutate it, so unsynchronized access is a data race. Per-series atomic.Int64 scales better than one global mutex; the mutex is just simpler/obviously correct.
  2. Q: (Day 137) How do you keep exponential backoff from overflowing and from thundering-herd?
    A: Double-and-cap the delay in-loop (don't compute 2^n directly), then add full jitter. Classify transient vs permanent before spending the retry budget.
  3. Q: (Month 3) What does a receive from a closed channel return?
    A: The element type's zero value immediately, with ok == false in the comma-ok form. Close is a broadcast to all receivers.
  4. Q: (Month 1) How do you match a sentinel error through wrapping vs. extract a typed one?
    A: errors.Is(err, ErrTarget) for sentinels, errors.As(err, &target) for typed errors; wrap with %w so the chain is walkable.
  5. Q: (Day 135) What does var _ Port = (*Adapter)(nil) buy you?
    A: A free compile-time assertion that *Adapter implements Port; it fails to build if the interface drifts.

🎯 Action Items

  • Wire slog (JSON) + the correlation-ID middleware into the Month 5 capstone, defaulting via SetDefault.
  • Add /healthz and /readyz to the capstone with a real DB/Redis readiness check and graceful drain on SIGTERM.
  • Instrument the capstone's RED signals (request counter + latency histogram with bounded route labels).
  • Put the trace-id into every log line and propagate traceparent on outbound queue/RPC calls.
  • Draft one symptom-based SLO burn-rate alert with a runbook link.

🚀 Next Week Goals

  • Wire the full observability stack into the capstone end to end (logs + metrics + traces sharing one ID).
  • Containerize and run with a local collector; load-test and read the dashboards.
  • Continue Month 6: profiling/pprof, performance, and deployment.

📊 Metrics

Hours   Days hit   Exercises   Commits   Avg confidence
10.5    7/7        3           7         3.4/5

Suggested commit: docs(journal): month 6 week 1 review