
Day 143 — OpenTelemetry Tracing

Month 6 · Week 1

🎯 Learning Objective

Understand distributed tracing: spans, the trace/span ID model, context propagation via the W3C traceparent header, and how OpenTelemetry wires it in Go.

📚 Topics

  • Trace · span · span context; parent/child; sampling
  • W3C Trace Context propagation; OTel TracerProvider, exporters, instrumentation

📖 Reading / Sources

📝 Notes

  • A trace is one request's journey across services; it's a tree of spans. Each span = a named, timed operation with attributes, events, and a status → [[tracing]].
  • Span context (the wire-propagated part): a 16-byte trace-id (same for the whole trace), an 8-byte span-id (this operation), and trace-flags (low bit = sampled). All-zero IDs are invalid.
  • Propagation: the caller serialises its span context into the traceparent header 00-<trace-id>-<span-id>-<flags>; the callee parses it, keeps the trace-id, and creates a new span whose parent is the incoming span-id. That parent/child chain becomes the waterfall in Jaeger/Tempo → [[context-propagation]].
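  • Sampling is decided once at the root and carried in trace-flags. Downstream services that use a parent-based sampler (the OTel SDK default) honor that bit, so all services agree (head sampling). Tail sampling instead decides after the spans have been collected, typically in the collector.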
  • OTel pieces: a TracerProvider (configured with a sampler + resource), a Tracer (tracer.Start(ctx, "name") returns a new ctx + span), an exporter (OTLP → collector), and a propagator (otel.SetTextMapPropagator(propagation.TraceContext{})).
  • Context first: spans live in context.Context. Always pass ctx down and call tracer.Start(ctx, …); defer span.End(). Forgetting to thread ctx breaks the parent link and orphans spans.
  • Record failures with span.RecordError(err) + span.SetStatus(codes.Error, msg). Add dimensions with span.SetAttributes(attribute.String(...)); unlike metric labels, high-cardinality values such as user or request IDs are fine on spans.
  • Tie it together: put the trace-id into your logs (Day 145) and as a metric exemplar, so one trace-id pivots across logs, metrics, and traces.
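Head sampling with a parent-based rule can be sketched with the stdlib alone. This is a minimal sketch, not the OTel SDK's code: `parent` and `shouldSample` are hypothetical names, and the root decision compares part of the trace-id against a threshold, similar in spirit to OTel's TraceIDRatioBased sampler, so a given trace-id always gets the same answer at a given ratio.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// parent is the decoded state of an incoming traceparent (hypothetical type).
type parent struct {
	valid   bool // false when no usable traceparent arrived, i.e. we are the root
	sampled bool // low bit of trace-flags
}

// shouldSample sketches parent-based head sampling: honor the parent's
// decision if there is one; otherwise decide once, at the root.
func shouldSample(p parent, traceID [16]byte, ratio float64) bool {
	if p.valid {
		return p.sampled // downstream: never re-decide
	}
	switch {
	case ratio >= 1:
		return true
	case ratio <= 0:
		return false
	}
	// Root: take 63 bits of the trace-id's low 8 bytes and compare them with
	// ratio * 2^63, so the decision is a deterministic function of the trace-id.
	bound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < bound
}

func main() {
	var id [16]byte
	id[15] = 0x02 // low bytes decode to 2, shifted to 1: well under the 0.5 bound

	fmt.Println(shouldSample(parent{}, id, 0.5))                             // true
	fmt.Println(shouldSample(parent{valid: true, sampled: false}, id, 1.0)) // false
}
```

The parent branch is what keeps the trace consistent: a downstream service returns the flag it received instead of rolling its own dice, even when its local ratio would say otherwise.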

💻 Code Examples

The OTel SDK is third-party, so the propagation format — the load-bearing part — is rebuilt with the stdlib in the example below; the SDK wiring is shown as a snippet.

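// Real OTel: start a child span and propagate it over HTTP.
// Imports: net/http, go.opentelemetry.io/otel, go.opentelemetry.io/otel/attribute, go.opentelemetry.io/otel/codes, go.opentelemetry.io/otel/propagation.
ctx, span := tracer.Start(ctx, "GetUser")
defer span.End()
span.SetAttributes(attribute.Int("user.id", id))

req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
	span.RecordError(err)
	span.SetStatus(codes.Error, "building request failed")
	return err // assumes the enclosing function returns an error
}
// Inject the current span context into the outgoing headers as `traceparent`.
otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))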

Stdlib traceparent parse/propagate: examples/month-06/tracecontext/main.go · Run: go run ./examples/month-06/tracecontext
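A stdlib parse/propagate can look roughly like the following; it is an illustration, not the contents of examples/month-06/tracecontext/main.go. `SpanContext`, `Parse`, and `Child` are hypothetical names. The sketch accepts only version 00, enforces the fixed field lengths and lowercase hex, and rejects all-zero IDs; the sample header is the example value used in the W3C Trace Context spec.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// SpanContext is the wire-propagated part of a span (hypothetical type).
type SpanContext struct {
	TraceID [16]byte
	SpanID  [8]byte
	Flags   byte // bit 0 = sampled
}

// Parse decodes a version-00 traceparent: 00-<32 hex trace-id>-<16 hex span-id>-<2 hex flags>.
func Parse(h string) (SpanContext, error) {
	parts := strings.Split(h, "-")
	if len(parts) != 4 || parts[0] != "00" ||
		len(parts[1]) != 32 || len(parts[2]) != 16 || len(parts[3]) != 2 {
		return SpanContext{}, errors.New("malformed traceparent")
	}
	// The spec requires lowercase hex; hex.Decode on its own also accepts uppercase.
	if strings.ToLower(h) != h {
		return SpanContext{}, errors.New("traceparent must use lowercase hex")
	}
	var sc SpanContext
	var flags [1]byte
	if _, err := hex.Decode(sc.TraceID[:], []byte(parts[1])); err != nil {
		return SpanContext{}, err
	}
	if _, err := hex.Decode(sc.SpanID[:], []byte(parts[2])); err != nil {
		return SpanContext{}, err
	}
	if _, err := hex.Decode(flags[:], []byte(parts[3])); err != nil {
		return SpanContext{}, err
	}
	sc.Flags = flags[0]
	if sc.TraceID == ([16]byte{}) || sc.SpanID == ([8]byte{}) {
		return SpanContext{}, errors.New("all-zero trace-id or span-id is invalid")
	}
	return sc, nil
}

// String formats the span context as a traceparent header value.
func (sc SpanContext) String() string {
	return fmt.Sprintf("00-%s-%s-%02x",
		hex.EncodeToString(sc.TraceID[:]), hex.EncodeToString(sc.SpanID[:]), sc.Flags)
}

// Child keeps the trace-id and flags and mints a fresh, non-zero random span-id:
// the callee's step before it calls the next hop.
func (sc SpanContext) Child() SpanContext {
	child := sc
	for child.SpanID == ([8]byte{}) || child.SpanID == sc.SpanID {
		if _, err := rand.Read(child.SpanID[:]); err != nil {
			panic(err)
		}
	}
	return child
}

func main() {
	in := "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
	incoming, err := Parse(in)
	if err != nil {
		panic(err)
	}
	out := incoming.Child()

	fmt.Println(incoming.String() == in)                                        // true
	fmt.Println(out.TraceID == incoming.TraceID, out.SpanID != incoming.SpanID) // true true
	fmt.Println("outgoing traceparent:", out)                                   // new span-id each run
}
```

In a real tracer the incoming span-id would also be recorded as the new span's parent; this sketch covers only the wire side of the hop.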

🏋️ Exercises / Practice

| Exercise | Status | Link |
| --- | --- | --- |
| traceparent — parse/format the W3C header |  | exercises/month-06/week-1/traceparent |

🐛 Mistakes Made

  • Created a span from context.Background() inside a handler → it became a new root, orphaned from the request trace. Must start from r.Context().
  • Forgot defer span.End() → spans never closed, showed as still-running.

❓ Open Questions

  • Head vs tail sampling trade-offs at high volume — where does the cost actually land?

🧠 Active Recall (answer without looking)

  1. Q: Across a 3-service request, which ID is constant and which changes per hop?
     A: The trace-id is constant for the entire request; each service mints a new span-id whose parent is the incoming span-id. That parent chain renders as the trace waterfall.
  2. Q: Where does the sampling decision come from on a downstream service?
     A: From the incoming traceparent trace-flags (low bit), set once at the root. Downstream services honor it rather than re-deciding, so the whole trace is sampled consistently (head sampling).

🪶 Feynman Reflection

A trace is a stopwatch that follows one request everywhere it goes. Every service it touches starts its own little timer (a span) and tags it with the shared trace-id so they can all be stitched back into one timeline. The traceparent header is just that shared ID, plus the caller's own span-id and the sampled flag, handed from caller to callee in a fixed-format string.

🕳️ Knowledge Gaps

  • Span links and baggage (cross-cutting key/values) — not yet used.

✅ Summary

I understand the trace/span model, can parse and propagate the W3C traceparent header by hand, and know how OTel's TracerProvider/exporter/propagator fit together.

⏭️ Next Steps / Prep for Tomorrow

  • Day 144: liveness vs readiness probes and graceful traffic draining.

| Time spent | Difficulty | Confidence |
| --- | --- | --- |
| 90 min | 🟦🟦🟦⬜⬜ | 🟦🟦🟦⬜⬜ |

Suggested commit: feat(examples): W3C trace-context propagation (day 143)