Day 081 — Project: Rate Limiting¶
Month 3 · Week 4 · ⬅ Day 080 · Day 082 ➡ · Journal index
🎯 Learning Objective¶
Stop the crawler from hammering hosts: cap how often it fetches (throughput) independently of how many fetch at once (the worker pool), using only the standard library — time.Ticker pacing and a token-bucket channel.
📚 Topics¶
- Rate limiting vs. bounded parallelism (orthogonal concerns)
time.Tickerpacing; token bucket via a buffered channel- Bursts, refill rate, and
ticker.Stop()
📖 Reading / Sources¶
-
time.Tickerdocs -
golang.org/x/time/ratedocs (production choice) - Go blog — Go Concurrency Patterns: Timing out, moving on
📝 Notes¶
- Two orthogonal knobs. The worker pool caps concurrency (N fetches in flight); the rate limiter caps throughput (≤ R fetches per second). You often want both — e.g. 8 workers but no more than 5 req/s per host → [[rate-limiting]] · [[bounded-concurrency]].
time.Tickerpacing is the simplest stdlib throttle:t := time.NewTicker(d), then<-t.Cbefore each action paces it at1/d. No bursts, and you mustdefer t.Stop()or the ticker's timer leaks → [[time-ticker]].- Token bucket allows controlled bursts: a bucket holds up to
capacitytokens, refilled atrate/sec; each action takes a token, an empty bucket waits. Implement with a buffered channel of sizecapacity, pre-filled, topped up by a ticker.Acquire= receive a token (blocks when empty); refill = non-blocking send (drop if full) → [[token-bucket]]. - Burst vs. steady state. Start the bucket full → the first
capacityactions fire immediately (burst), then the queue is paced to the refill rate. Tunecapacityfor burst tolerance,ratefor the long-run average. - Make the limiter context-aware:
Acquire(ctx)shouldselecton the token channel andctx.Done()so shutdown isn't blocked waiting for a token → [[context-cancellation]]. - Deterministic testing: a real limiter depends on wall-clock time, which is flaky in tests. Inject time — expose
Refill(n)and call it from the test instead of sleeping (see the exercise) → [[testable-time]]. - Production: prefer
golang.org/x/time/rate.Limiter(Wait(ctx),Allow(),Reserve()); it implements an accurate token bucket. Hand-roll only for trivial fixed pacing → [[x-time-rate]].
💻 Code Examples¶
A context-aware token bucket backed by a buffered channel + ticker (from the example):
type tokenBucket struct{ tokens chan struct{} }
func (b *tokenBucket) Acquire(ctx context.Context) error {
select {
case <-b.tokens: // take a token (blocks while empty)
return nil
case <-ctx.Done(): // don't block shutdown
return ctx.Err()
}
}
// a ticker goroutine tops up: select { case b.tokens <- struct{}{}: default: } // drop if full
Full code (ticker pacing + token bucket):
examples/month-03/rate-ticker/· Run:go run ./examples/month-03/rate-ticker
🏋️ Exercises / Practice¶
| Exercise | Status | Link |
|---|---|---|
Deterministic token-bucket limiter (Allow/Refill, time injected) |
✅ | exercises/month-03/week-4/tokenbucket/ |
🐛 Mistakes Made¶
- Refilled the bucket with a blocking
b.tokens <- struct{}{}— when the bucket was full the refill goroutine blocked and stopped ticking. Switched to aselect { … default: }non-blocking send that drops the extra token. - Forgot
defer ticker.Stop()again; the timer kept the runtime's timer heap busy after the function returned.
❓ Open Questions¶
- Per-host vs. global rate limiting for the crawler? (Real crawlers rate-limit per host; a
map[host]*limiterguarded by a mutex. Global is the simple start.)
🧠 Active Recall (answer without looking)¶
-
Q: Rate limiting vs. bounded parallelism — what's the difference?
A
Bounded parallelism caps *how many* actions run at once (concurrency, e.g. an N-worker pool). Rate limiting caps *how often* an action happens over time (throughput, e.g. 5 req/s via a token bucket). They're orthogonal; you may need both. -
Q: Why back a token bucket with a buffered channel, and why is the refill a non-blocking send?
A
The buffer of size `capacity` holds the available tokens, allowing a burst up to capacity; `Acquire` receives one and blocks when empty. The refill uses `select { … default }` so that when the bucket is already full it drops the token instead of blocking the ticker goroutine.
🪶 Feynman Reflection¶
The worker pool says "at most 8 people through the door at once." The rate limiter is a turnstile that only releases a token every so often — you can rush a small backlog (the burst, up to the bucket size), but over time you move exactly as fast as tokens drip in. Two independent dials: how many and how often.
🕳️ Knowledge Gaps¶
rate.Limiter.Reserve()semantics and distributed (Redis-backed) rate limiting — beyond stdlib; noted for the services month.
✅ Summary¶
Added stdlib rate limiting to the crawler: time.Ticker for fixed pacing and a context-aware token-bucket channel for bursty-but-bounded throughput, kept orthogonal to the worker pool. The exercise version injects time so it tests deterministically.
⏭️ Next Steps / Prep for Tomorrow¶
- Day 082: lock the whole thing down with tests under
go test -race.
| Time spent | Difficulty | Confidence |
|---|---|---|
| 90 min | 🟦🟦⬜⬜⬜ | 🟦🟦🟦⬜⬜ |
Suggested commit: feat(examples): crawler rate limiting (day 081)