Day 080 — Project: Context & Graceful Shutdown

Month 3 · Week 4

🎯 Learning Objective

Make the crawler stoppable: one cancellation signal (Ctrl-C, a deadline, or a first error) propagates to every worker so they stop pulling work, abandon in-flight fetches, and let main wait for a clean exit.

📚 Topics

  • context.WithCancel / WithTimeout, ctx.Done(), ctx.Err()
  • signal.NotifyContext for SIGINT/SIGTERM
  • Drain-and-wait shutdown with sync.WaitGroup

📖 Reading / Sources

📝 Notes

  • A context is a cancellation tree. Cancelling a parent cancels every child; ctx.Done() is a channel closed on cancel, ctx.Err() says why (context.Canceled or context.DeadlineExceeded) → [[context-cancellation]].
  • Cancelling does NOT stop a goroutine. It only closes Done(). Each goroutine must select on ctx.Done() and return itself — context is cooperative, not preemptive → [[cooperative-cancellation]].
  • signal.NotifyContext(parent, os.Interrupt, syscall.SIGTERM) returns a context cancelled on the first matching signal, plus a stop() you must defer to restore default signal handling. Far cleaner than a manual signal.Notify channel + goroutine → [[signal-handling]].
  • Compose triggers freely. Wrap the signal context in context.WithTimeout(ctx, d) and you get "cancel on Ctrl-C or after d". The workers don't care which fired — they just see Done() close → [[context-composition]].
  • Always defer cancel() (and defer stop()). Even when a timeout will fire on its own, calling cancel releases the timer and any child contexts immediately; go vet's lostcancel check flags a discarded cancel function as a context leak → [[resource-cleanup]].
  • Graceful shutdown = drain + wait. On cancel, the producer stops sending and closes the jobs channel; workers finish or abandon current work and return; main blocks on wg.Wait() so it never exits before its goroutines. Returning while workers run leaks them → [[goroutine-leak]] · [[waitgroup]].
  • Any goroutine blocked on a channel send or receive must select on Done() too, or it blocks forever once the other side is gone. For a send: select { case jobs <- j: case <-ctx.Done(): return } → [[select]].
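
To check my understanding of the cancellation tree and of composing triggers, here is a minimal sketch. The helper names (reason, parentCancelReachesChild, timeoutFires) are mine, not from the crawler, and a plain WithCancel parent stands in for the signal context so the program needs no real Ctrl-C.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// reason blocks until ctx is done, then reports why using ctx.Err().
func reason(ctx context.Context) string {
	<-ctx.Done()
	switch {
	case errors.Is(ctx.Err(), context.Canceled):
		return "canceled"
	case errors.Is(ctx.Err(), context.DeadlineExceeded):
		return "deadline exceeded"
	default:
		return "unknown"
	}
}

// parentCancelReachesChild cancels the parent and reports what the child sees.
// The parent is a stand-in (assumption) for a signal.NotifyContext context.
func parentCancelReachesChild() string {
	parent, cancelParent := context.WithCancel(context.Background())
	child, cancelChild := context.WithTimeout(parent, time.Hour)
	defer cancelChild() // releases the timer even though it never fires

	cancelParent() // plays the role of Ctrl-C or a first error
	return reason(child)
}

// timeoutFires leaves the parent alone and lets the child's deadline expire.
func timeoutFires() string {
	parent, cancelParent := context.WithCancel(context.Background())
	defer cancelParent()
	child, cancelChild := context.WithTimeout(parent, 20*time.Millisecond)
	defer cancelChild()
	return reason(child)
}

func main() {
	fmt.Println(parentCancelReachesChild()) // canceled
	fmt.Println(timeoutFires())             // deadline exceeded
}
```

Either way the child only sees Done() close; ctx.Err() is the one place that tells the two triggers apart.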

💻 Code Examples

A worker that honours cancellation while idle and mid-task (from the example):

func worker(ctx context.Context, jobs <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for {
        select {
        case <-ctx.Done():
            return // cancelled while waiting for the next job
        case job, ok := <-jobs:
            if !ok {
                return // jobs closed: normal end
            }
            select {
            case <-time.After(150 * time.Millisecond): // pretend work
            case <-ctx.Done():
                return // cancelled mid-task: drop it and leave
            }
            _ = job
        }
    }
}

Full code: examples/month-03/graceful-shutdown/ · Run: go run ./examples/month-03/graceful-shutdown (press Ctrl-C to trigger an early shutdown)
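
For reference, a sketch of how the pieces could be wired together in main. This is not a copy of the example directory: run, runFor, the finished-job counter, the job counts, and the durations are all assumptions chosen to keep the program short and observable.

```go
package main

import (
	"context"
	"fmt"
	"os"
	"os/signal"
	"sync"
	"sync/atomic"
	"syscall"
	"time"
)

// worker follows the shape of the example worker; the counter is an
// addition (assumption) so the number of finished jobs can be reported.
func worker(ctx context.Context, jobs <-chan int, wg *sync.WaitGroup, finished *atomic.Int64) {
	defer wg.Done()
	for {
		select {
		case <-ctx.Done():
			return
		case _, ok := <-jobs:
			if !ok {
				return
			}
			select {
			case <-time.After(10 * time.Millisecond): // pretend work
				finished.Add(1)
			case <-ctx.Done():
				return // abandoned mid-task
			}
		}
	}
}

// run feeds n jobs to a pool and returns how many finished before ctx ended.
func run(ctx context.Context, n, workers int) int64 {
	jobs := make(chan int)
	var wg sync.WaitGroup
	var finished atomic.Int64
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go worker(ctx, jobs, &wg, &finished)
	}
produce:
	for j := 0; j < n; j++ {
		select {
		case jobs <- j:
		case <-ctx.Done():
			break produce // stop sending once cancelled
		}
	}
	close(jobs) // drain: idle workers see the close and return
	wg.Wait()   // wait: never return while workers are still running
	return finished.Load()
}

// runFor is a hypothetical helper that runs the pool under a timeout only.
func runFor(limit time.Duration, n, workers int) int64 {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return run(ctx, n, workers)
}

func main() {
	// Cancel on Ctrl-C / SIGTERM, or after 2 seconds, whichever comes first.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()

	fmt.Println("finished jobs:", run(ctx, 20, 4))
}
```

With these numbers the pool should finish all 20 jobs well inside the timeout; raising the job count or shortening the timeout is a quick way to watch the shutdown path instead.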

🏋️ Exercises / Practice

Exercise | Status | Link
Reuse the crawler/worker-pool exercises and add a ctx param that aborts the crawl | | exercises/month-03/week-4/crawl/

🐛 Mistakes Made

  • Wrote the producer as a plain for { jobs <- j } — after cancel the workers were gone, so the producer blocked forever on the send. Added a select with <-ctx.Done().
  • Forgot defer stop() from NotifyContext; go vet didn't catch that one, but the signal handler stayed installed. Added it.
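
A small sketch of the producer fix from the first mistake, reduced to the worst case: the context is already cancelled and nobody is receiving. The names produce and sentAfterCancel are hypothetical. With a bare jobs <- j this program would hang; with the select it returns right away.

```go
package main

import (
	"context"
	"fmt"
)

// produce offers jobs 0..n-1 on the channel and returns how many were
// handed off. The select lets a cancelled context win when no worker is
// left to receive, instead of blocking forever on the send.
func produce(ctx context.Context, jobs chan<- int, n int) int {
	defer close(jobs)
	sent := 0
	for j := 0; j < n; j++ {
		select {
		case jobs <- j:
			sent++
		case <-ctx.Done():
			return sent
		}
	}
	return sent
}

// sentAfterCancel recreates the bug scenario: cancelled context, unbuffered
// channel, no receivers. Only the Done() case can ever be ready.
func sentAfterCancel() int {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	jobs := make(chan int)
	return produce(ctx, jobs, 100)
}

func main() {
	fmt.Println("jobs sent after cancel:", sentAfterCancel()) // jobs sent after cancel: 0
}
```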

❓ Open Questions

  • Should an in-flight fetch be abandoned or allowed to finish on shutdown? (Depends: abandon for responsiveness, finish for data integrity. The context-aware HTTP request makes "abandon" the default.)
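
To see the "abandon" default in action, a sketch using http.NewRequestWithContext against a local httptest server. The slow handler, the delays, and the helper names (fetch, slowServer, outcome) are assumptions for the demo, not the crawler's real fetch code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch issues a GET bound to ctx; when ctx ends, the request in flight
// is aborted and Do (or the body read) returns an error.
func fetch(ctx context.Context, url string) (int, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	n, err := io.Copy(io.Discard, resp.Body)
	return int(n), err
}

// slowServer is a stand-in host that answers only after delay.
func slowServer(delay time.Duration) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(delay):
			fmt.Fprint(w, "page body")
		case <-r.Context().Done(): // client gave up; stop early
		}
	}))
}

// outcome fetches from a host that needs serverDelay, giving up after budget.
func outcome(serverDelay, budget time.Duration) string {
	srv := slowServer(serverDelay)
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()

	_, err := fetch(ctx, srv.URL)
	switch {
	case err == nil:
		return "finished"
	case errors.Is(err, context.DeadlineExceeded):
		return "abandoned"
	default:
		return "failed: " + err.Error()
	}
}

func main() {
	fmt.Println(outcome(300*time.Millisecond, 30*time.Millisecond)) // slow host, short budget
	fmt.Println(outcome(10*time.Millisecond, 2*time.Second))        // fast host, generous budget
}
```

On my reading, the first call should report abandoned and the second finished. To get "finish" semantics on shutdown instead, the fetch would need a context that is not tied to the shutdown signal, which is a design decision rather than something this sketch settles.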

🧠 Active Recall (answer without looking)

  1. Q: Does cancelling a context stop the goroutines using it?

    A: No. It only closes `ctx.Done()`. Each goroutine must select on `Done()` and return on its own; cancellation is cooperative, not preemptive.

  2. Q: What does signal.NotifyContext return and why must you call its second value?

    A: It returns a context cancelled on the first matching signal, plus a `stop()` function. Deferring `stop()` releases the signal handler (and the context's resources) so signals revert to default handling.

🪶 Feynman Reflection

A context is a kill switch wired to every worker at once. Flipping it (signal, timeout, or error) doesn't yank anyone offstage — it turns on a light (Done()) that everyone is watching, and each worker walks off when they see it. main is the stage manager who waits (wg.Wait()) until the stage is empty before turning off the lights.

🕳️ Knowledge Gaps

  • context.AfterFunc, context.WithoutCancel, and context.WithDeadlineCause (all added in Go 1.21): newer helpers I haven't used yet.
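
A first exploratory sketch of those three helpers, assuming Go 1.21 or newer. The function names and the errBudget error are made up for illustration; this is what I expect from reading the package docs, not something I have used in the crawler yet.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errBudget is a hypothetical cause attached to a deadline.
var errBudget = errors.New("crawl budget exhausted")

// afterFuncRuns registers a callback that runs (in its own goroutine)
// once ctx is done, then reports whether it ran within a second.
func afterFuncRuns() bool {
	ctx, cancel := context.WithCancel(context.Background())
	ran := make(chan struct{})
	stop := context.AfterFunc(ctx, func() { close(ran) })
	defer stop() // unregisters the callback if it has not started yet

	cancel()
	select {
	case <-ran:
		return true
	case <-time.After(time.Second):
		return false
	}
}

// detachedSurvives shows that WithoutCancel keeps the parent's values
// but is not cancelled when the parent is.
func detachedSurvives() bool {
	parent, cancel := context.WithCancel(context.Background())
	detached := context.WithoutCancel(parent)
	cancel()
	return parent.Err() != nil && detached.Err() == nil
}

// deadlineCause shows that Err() still reports the generic deadline error,
// while context.Cause returns the specific cause given at creation.
func deadlineCause() string {
	deadline := time.Now().Add(10 * time.Millisecond)
	ctx, cancel := context.WithDeadlineCause(context.Background(), deadline, errBudget)
	defer cancel()
	<-ctx.Done()
	return fmt.Sprintf("%v / %v", ctx.Err(), context.Cause(ctx))
}

func main() {
	fmt.Println(afterFuncRuns())    // true
	fmt.Println(detachedSurvives()) // true
	fmt.Println(deadlineCause())    // context deadline exceeded / crawl budget exhausted
}
```

WithoutCancel looks like the natural tool for "finish this one write even though shutdown has started", which ties back to the open question about in-flight work.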

✅ Summary

The crawler now shuts down gracefully: signal.NotifyContext + WithTimeout give one cancellation source, every worker selects on Done(), and main drains then wg.Wait()s for a leak-free exit.

⏭️ Next Steps / Prep for Tomorrow

  • Day 081: add rate limiting so we don't hammer a host — time.Ticker and a token bucket.

Time spent | Difficulty | Confidence
90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜

Suggested commit: feat(examples): crawler context & graceful shutdown (day 080)