Day 153: Signals & graceful shutdown
Month 6 · Week 2
🎯 Learning Objective
Catch OS termination signals and drain an HTTP server cleanly: stop accepting new connections, let in-flight requests finish within a deadline, then exit.
📚 Topics
- `signal.NotifyContext`; SIGINT vs SIGTERM vs SIGKILL
- `http.Server.Shutdown(ctx)`, `http.ErrServerClosed`, bounded drain
- `errors.Is` for sentinel errors; per-request `r.Context()`
📖 Reading / Sources
📝 Notes
- `signal.NotifyContext(parent, sigs...)` returns a context that is cancelled when one of the signals arrives, plus a `stop()` function that releases the handler. Cleaner than the channel form for the common "cancel on signal" case → [[signals]].
- SIGTERM is the polite "please stop" sent by `docker stop` and Kubernetes; SIGINT is Ctrl-C. SIGKILL (9) cannot be caught: it's the hammer after the grace period. So handle SIGINT/SIGTERM and finish fast → [[graceful-shutdown]].
- Run `srv.ListenAndServe()` in a goroutine so `main` can block on the signal context. On a clean shutdown it returns the sentinel `http.ErrServerClosed`; check for it with `errors.Is`, because it is not a failure. It returns as soon as `Shutdown` is called, so `main` must wait for `Shutdown` itself to return before exiting.
- `srv.Shutdown(ctx)` closes listeners immediately (no new conns), closes idle connections, then waits for active requests to return. Bound it with `context.WithTimeout`; if the deadline passes, `Shutdown` returns `context.DeadlineExceeded` and stops waiting. It does not close the stragglers itself; they are cut off when the process exits or `srv.Close()` is called → [[bounded-drain]].
- Make the drain deadline shorter than the platform's kill grace (K8s `terminationGracePeriodSeconds`, default 30s) so you exit voluntarily before SIGKILL.
- `Shutdown` neither closes nor waits for hijacked connections such as WebSockets. Long-lived streams need their own cancellation (propagate the shutdown context, or `srv.Close()` as a last resort).
- Each handler has `r.Context()`, cancelled when the client disconnects or the handler returns. `Shutdown` does not cancel it, so long handlers should `select` on it to bail early when the client goes away, and take the shutdown context separately if they must also stop on shutdown.
- Order on shutdown: stop intake → drain HTTP → then close DB pools, flush logs/metrics, etc.
💻 Code Examples

    package main

    import (
        "context"
        "errors"
        "log"
        "net/http"
        "os/signal"
        "syscall"
        "time"
    )

    func main() {
        ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
        defer stop()
        mux := http.NewServeMux() // register handlers on mux here
        srv := &http.Server{Addr: ":8080", Handler: mux}
        srvErr := make(chan error, 1) // buffered so the goroutine never blocks on send
        go func() { srvErr <- srv.ListenAndServe() }() // ErrServerClosed on clean stop
        select {
        case err := <-srvErr: // server stopped on its own, e.g. port already in use
            if !errors.Is(err, http.ErrServerClosed) {
                log.Fatalf("server failed: %v", err)
            }
        case <-ctx.Done(): // signal arrived
            shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
            defer cancel()
            if err := srv.Shutdown(shutdownCtx); err != nil { // drains in-flight reqs
                log.Fatalf("forced shutdown: %v", err)
            }
        }
    }
Full code: `examples/month-06/graceful/main.go` · Run: `go run ./examples/month-06/graceful`, then press Ctrl-C
🏋️ Exercises / Practice
| Exercise | Status | Link |
|---|---|---|
| In-flight tracker that drains within a context deadline | ✅ | exercises/month-06/week-2/drain |
🐛 Mistakes Made
- Treated `http.ErrServerClosed` as a fatal error and `log.Fatal`'d on every clean shutdown. Wrapped the check in `errors.Is` so the sentinel is expected.
- Called `srv.Shutdown(context.Background())` with no timeout; one stuck handler hung shutdown forever until SIGKILL. Added `context.WithTimeout`.
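❓ Open Questions
- How do I drain long-lived connections? `Shutdown` ignores hijacked WebSocket connections entirely, and an SSE stream counts as an active request, so it holds the drain open until the deadline. (Propagate the shutdown context into each long-lived handler and close on cancel; `srv.Close()` is the blunt fallback.)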
🧠 Active Recall (answer without looking)
1. Q: After `srv.ListenAndServe()` returns following a clean `Shutdown`, what error do you get and how should you treat it?
   A: `http.ErrServerClosed`. It's the expected sentinel for a graceful stop; match it with `errors.Is` and do NOT treat it as a failure.
2. Q: Why bound `srv.Shutdown` with a timeout context, and what happens when the deadline passes?
   A: A hung handler would otherwise block shutdown forever. With a timeout, `Shutdown` returns `context.DeadlineExceeded` once the deadline passes and stops waiting for stragglers (without closing them), so the process can exit before SIGKILL.
🪶 Feynman Reflection
Graceful shutdown is closing a shop: lock the front door so no new customers enter (close listeners), let the people already inside finish checking out (drain in-flight requests), but if someone won't leave within ten minutes, turn off the lights anyway (deadline). SIGKILL is the landlord cutting the power — you can't argue with it, so leave first.
🕳️ Knowledge Gaps
- Coordinating shutdown across many subsystems (errgroup-style) — capstone material.
✅ Summary
I can convert SIGINT/SIGTERM into a cancelled context, drain an HTTP server with a bounded Shutdown, and distinguish the ErrServerClosed sentinel from real failures.
⏭️ Next Steps / Prep for Tomorrow
- Day 154: week review + active recall across the whole production/observability week.
| Time spent | Difficulty | Confidence |
|---|---|---|
| 95 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦🟦⬜ |
Suggested commit: docs(journal): signals & graceful shutdown (day 153)