
Day 153 — Signals & graceful shutdown

Month 6 · Week 2

🎯 Learning Objective

Catch OS termination signals and drain an HTTP server cleanly: stop accepting new connections, let in-flight requests finish within a deadline, then exit.

📚 Topics

  • signal.NotifyContext; SIGINT vs SIGTERM vs SIGKILL
  • http.Server.Shutdown(ctx), http.ErrServerClosed, bounded drain
  • errors.Is for sentinel errors; per-request r.Context()

📖 Reading / Sources

📝 Notes

  • signal.NotifyContext(parent, sigs...) returns a context cancelled when one of the signals arrives, plus a stop() to release the handler. Cleaner than the channel form for the common "cancel on signal" case → [[signals]].
  • SIGTERM is the polite "please stop" sent by docker stop and Kubernetes; SIGINT is Ctrl-C. SIGKILL (9) cannot be caught — it's the hammer after the grace period. So handle SIGINT/SIGTERM and finish fast → [[graceful-shutdown]].
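  • Run srv.ListenAndServe() in a goroutine so main can block on the signal context. On a clean shutdown it returns the sentinel http.ErrServerClosed; check it with errors.Is, since it is not a failure. It returns as soon as Shutdown is called, not when the drain finishes, so main must wait for Shutdown itself to return before exiting.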
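  • srv.Shutdown(ctx) closes listeners immediately (no new conns), closes idle connections, then waits for active requests to return. Bound it with context.WithTimeout; if the deadline passes, Shutdown returns context.DeadlineExceeded and stops waiting. It does not kill the stragglers: they keep running until the process exits or srv.Close() cuts their connections → [[bounded-drain]].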
  • Make the drain deadline shorter than the platform's kill grace (K8s terminationGracePeriodSeconds, default 30s) so you exit voluntarily before SIGKILL.
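  • Shutdown neither closes nor waits for hijacked connections such as WebSockets, and srv.Close() does not touch them either. SSE and other streaming responses are ordinary active requests, so Shutdown waits for them until the deadline. Long-lived connections need their own cancellation: notify them through srv.RegisterOnShutdown or a shared shutdown context; srv.Close() is the last resort for non-hijacked streams.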
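  • Each handler has r.Context(), cancelled when the client disconnects, the request is cancelled (HTTP/2), or ServeHTTP returns. Shutdown does not cancel it, so long handlers should select on it to bail early and also watch a server-level shutdown signal (for example a context supplied through Server.BaseContext).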
  • Order on shutdown: stop intake → drain HTTP → then close DB pools, flush logs/metrics, etc.

💻 Code Examples

ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer stop()

srv := &http.Server{Addr: ":8080", Handler: mux}
srvErr := make(chan error, 1) // buffered so the goroutine's send never blocks
go func() { srvErr <- srv.ListenAndServe() }() // ErrServerClosed on clean stop

select {
case err := <-srvErr:
    if !errors.Is(err, http.ErrServerClosed) {
        log.Fatalf("server failed: %v", err)
    }
case <-ctx.Done(): // signal arrived
    shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    if err := srv.Shutdown(shutdownCtx); err != nil { // drains in-flight reqs
        log.Fatalf("forced shutdown: %v", err)
    }
}

Full code: examples/month-06/graceful/main.go · Run: go run ./examples/month-06/graceful then press Ctrl-C
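The snippet above exits through log.Fatalf when the drain deadline passes. A possible variant, sketched here under assumptions, falls back to srv.Close() so that remaining non-hijacked connections are cut before the process moves on to the rest of its cleanup. The names drain and demoStuck are hypothetical, and the stuck handler and the 50 ms timeout exist only to make the deadline path observable.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// drain attempts a graceful Shutdown bounded by timeout. If the deadline
// passes, it force-closes the remaining connections with Close and still
// returns the Shutdown error so the caller can log it.
func drain(srv *http.Server, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	err := srv.Shutdown(ctx)
	if errors.Is(err, context.DeadlineExceeded) {
		_ = srv.Close() // blunt fallback: drops connections still in flight
	}
	return err
}

// demoStuck serves one request whose handler never finishes on its own,
// then reports what drain returns.
func demoStuck() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free loopback port
	if err != nil {
		return err
	}
	started := make(chan struct{})
	release := make(chan struct{})
	defer close(release) // unblock the handler once the demo is over

	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)
		<-release // simulate a handler that ignores cancellation
	})}
	go srv.Serve(ln)
	go func() {
		if resp, err := http.Get("http://" + ln.Addr().String()); err == nil {
			resp.Body.Close()
		}
	}()

	<-started // the request is now in flight
	return drain(srv, 50*time.Millisecond)
}

func main() {
	fmt.Println("drain result:", demoStuck())
}
```

Returning the error instead of calling log.Fatalf lets deferred cleanup (DB pools, log flushing) still run after a forced stop.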

🏋️ Exercises / Practice

Exercise | Status | Link
In-flight tracker that drains within a context deadline | | exercises/month-06/week-2/drain

🐛 Mistakes Made

  • Treated http.ErrServerClosed as a fatal error and log.Fatal'd on every clean shutdown. Wrapped the check in errors.Is so the sentinel is expected.
  • Called srv.Shutdown(context.Background()) with no timeout; one stuck handler hung shutdown forever until SIGKILL. Added context.WithTimeout.

❓ Open Questions

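  • How do I drain long-lived connections? Shutdown ignores hijacked WebSockets entirely and waits on SSE streams until the deadline. (Propagate a shutdown signal into each long-lived handler, e.g. via srv.RegisterOnShutdown, and close on cancel; srv.Close() is the blunt fallback for SSE but never sees hijacked connections.)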
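One way to approach this question, sketched under assumptions: give long-lived handlers a shared context and cancel it from srv.RegisterOnShutdown, which net/http calls when Shutdown starts. The handler below stands in for an SSE stream; demoStreams and streamsCtx are hypothetical names, and the 2 s deadline is arbitrary.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// demoStreams serves one stream-like request, then shuts down and returns
// the error from Shutdown.
func demoStreams() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0") // any free loopback port
	if err != nil {
		return err
	}
	// streamsCtx is watched only by long-lived handlers.
	streamsCtx, cancelStreams := context.WithCancel(context.Background())
	defer cancelStreams()
	started := make(chan struct{})

	srv := &http.Server{Handler: http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(started)
		// A real stream would write events in a loop here.
		select {
		case <-r.Context().Done(): // client went away
		case <-streamsCtx.Done(): // server is shutting down
		}
	})}
	srv.RegisterOnShutdown(cancelStreams) // runs when Shutdown is called

	go srv.Serve(ln)
	go func() {
		if resp, err := http.Get("http://" + ln.Addr().String()); err == nil {
			resp.Body.Close()
		}
	}()
	<-started // the stream is open

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	return srv.Shutdown(ctx) // nil if the stream handler returned in time
}

func main() {
	fmt.Println("shutdown error:", demoStreams())
}
```

Because the stream is a normal active request, Shutdown waits for its handler to return. Hijacked WebSockets can receive the same signal, but Shutdown will not wait for them, so they would need separate tracking (for instance a sync.WaitGroup) before the process exits.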

🧠 Active Recall (answer without looking)

  1. Q: After srv.ListenAndServe() returns following a clean Shutdown, what error do you get and how should you treat it?
     A: http.ErrServerClosed. It's the expected sentinel for a graceful stop; match it with errors.Is and do NOT treat it as a failure.
  2. Q: Why bound srv.Shutdown with a timeout context, and what happens when the deadline passes?
     A: A hung handler would otherwise block shutdown forever. With a timeout, Shutdown returns context.DeadlineExceeded once the deadline passes, abandoning stragglers so the process can exit before SIGKILL.

🪶 Feynman Reflection

Graceful shutdown is closing a shop: lock the front door so no new customers enter (close listeners), let the people already inside finish checking out (drain in-flight requests), but if someone won't leave within ten minutes, turn off the lights anyway (deadline). SIGKILL is the landlord cutting the power — you can't argue with it, so leave first.

🕳️ Knowledge Gaps

  • Coordinating shutdown across many subsystems (errgroup-style) — capstone material.

✅ Summary

I can convert SIGINT/SIGTERM into a cancelled context, drain an HTTP server with a bounded Shutdown, and distinguish the ErrServerClosed sentinel from real failures.

⏭️ Next Steps / Prep for Tomorrow

  • Day 154: week review + active recall across the whole production/observability week.

Time spent | Difficulty | Confidence
95 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦🟦⬜

Suggested commit: docs(journal): signals & graceful shutdown (day 153)