Day 081 — Project: Rate Limiting¶

Month 3 · Week 4 · ⬅ Day 080 · Day 082 ➡ · Journal index

🎯 Learning Objective¶

Stop the crawler from hammering hosts: cap how often it fetches (throughput) independently of how many fetch at once (the worker pool), using only the standard library — time.Ticker pacing and a token-bucket channel.

📚 Topics¶

Rate limiting vs. bounded parallelism (orthogonal concerns)
time.Ticker pacing; token bucket via a buffered channel
Bursts, refill rate, and ticker.Stop()

📖 Reading / Sources¶

time.Ticker docs
golang.org/x/time/rate docs (production choice)
Go blog — Go Concurrency Patterns: Timing out, moving on

📝 Notes¶

Two orthogonal knobs. The worker pool caps concurrency (N fetches in flight); the rate limiter caps throughput (≤ R fetches per second). You often want both — e.g. 8 workers but no more than 5 req/s per host → [[rate-limiting]] · [[bounded-concurrency]].
time.Ticker pacing is the simplest stdlib throttle: t := time.NewTicker(d), then <-t.C before each action paces it at 1/d. No bursts, and you must defer t.Stop() or the ticker's timer leaks → [[time-ticker]].
Token bucket allows controlled bursts: a bucket holds up to capacity tokens, refilled at rate/sec; each action takes a token, an empty bucket waits. Implement with a buffered channel of size capacity, pre-filled, topped up by a ticker. Acquire = receive a token (blocks when empty); refill = non-blocking send (drop if full) → [[token-bucket]].
Burst vs. steady state. Start the bucket full → the first capacity actions fire immediately (burst), then the queue is paced to the refill rate. Tune capacity for burst tolerance, rate for the long-run average.
Make the limiter context-aware: Acquire(ctx) should select on the token channel and ctx.Done() so shutdown isn't blocked waiting for a token → [[context-cancellation]].
Deterministic testing: a real limiter depends on wall-clock time, which is flaky in tests. Inject time — expose Refill(n) and call it from the test instead of sleeping (see the exercise) → [[testable-time]].
Production: prefer golang.org/x/time/rate.Limiter (Wait(ctx), Allow(), Reserve()); it implements an accurate token bucket. Hand-roll only for trivial fixed pacing → [[x-time-rate]].

💻 Code Examples¶

A context-aware token bucket backed by a buffered channel + ticker (from the example):

type tokenBucket struct{ tokens chan struct{} }

func (b *tokenBucket) Acquire(ctx context.Context) error {
    select {
    case <-b.tokens: // take a token (blocks while empty)
        return nil
    case <-ctx.Done(): // don't block shutdown
        return ctx.Err()
    }
}
// a ticker goroutine tops up: select { case b.tokens <- struct{}{}: default: }  // drop if full

Full code (ticker pacing + token bucket): examples/month-03/rate-ticker/ · Run: go run ./examples/month-03/rate-ticker

🏋️ Exercises / Practice¶

Exercise	Status	Link
Deterministic token-bucket limiter (`Allow`/`Refill`, time injected)	✅	`exercises/month-03/week-4/tokenbucket/`

🐛 Mistakes Made¶

Refilled the bucket with a blocking b.tokens <- struct{}{} — when the bucket was full the refill goroutine blocked and stopped ticking. Switched to a select { … default: } non-blocking send that drops the extra token.
Forgot defer ticker.Stop() again; the timer kept the runtime's timer heap busy after the function returned.

❓ Open Questions¶

Per-host vs. global rate limiting for the crawler? (Real crawlers rate-limit per host; a map[host]*limiter guarded by a mutex. Global is the simple start.)

🧠 Active Recall (answer without looking)¶

Q: Rate limiting vs. bounded parallelism — what's the difference?

A
Bounded parallelism caps *how many* actions run at once (concurrency, e.g. an N-worker pool). Rate limiting caps *how often* an action happens over time (throughput, e.g. 5 req/s via a token bucket). They're orthogonal; you may need both.
Q: Why back a token bucket with a buffered channel, and why is the refill a non-blocking send?

A
The buffer of size `capacity` holds the available tokens, allowing a burst up to capacity; `Acquire` receives one and blocks when empty. The refill uses `select { … default }` so that when the bucket is already full it drops the token instead of blocking the ticker goroutine.

🪶 Feynman Reflection¶

The worker pool says "at most 8 people through the door at once." The rate limiter is a turnstile that only releases a token every so often — you can rush a small backlog (the burst, up to the bucket size), but over time you move exactly as fast as tokens drip in. Two independent dials: how many and how often.

🕳️ Knowledge Gaps¶

rate.Limiter.Reserve() semantics and distributed (Redis-backed) rate limiting — beyond stdlib; noted for the services month.

✅ Summary¶

Added stdlib rate limiting to the crawler: time.Ticker for fixed pacing and a context-aware token-bucket channel for bursty-but-bounded throughput, kept orthogonal to the worker pool. The exercise version injects time so it tests deterministically.

⏭️ Next Steps / Prep for Tomorrow¶

Day 082: lock the whole thing down with tests under go test -race.

Time spent	Difficulty	Confidence
90 min	🟦🟦⬜⬜⬜	🟦🟦🟦⬜⬜

Suggested commit: feat(examples): crawler rate limiting (day 081)