Table of Contents

Go-Flavored System Design Prompts
Table of Contents
How to Answer a System Design Question
1. Distributed Rate Limiter
2. URL Shortener
3. Distributed Job Queue
4. Notification Service
5. Distributed Cache
6. Chat / Messaging Service
7. News Feed
8. Metrics Ingestion Pipeline
9. Web Crawler
10. Leaderboard / Counting Service

Go-Flavored System Design Prompts¶

Ten system-design prompts framed for Go backend interviews. Each prompt is a ## section with the problem, a framework-applied outline, Go-specific implementation notes (goroutines, channels, context, sync, net/http, sharding), and the key tradeoffs. A collapsible "Sample strong answer" is included per prompt.

How to Answer a System Design Question¶

A repeatable framework. Spend time up front on scope; don't jump to a database.

Clarify requirements — separate functional (what it does) from non-functional (scale, latency SLOs, consistency, availability). State assumptions out loud and confirm scope.
Estimates / back-of-envelope — derive QPS, read:write ratio, storage growth, and bandwidth from DAU and per-user activity. These numbers justify every later decision (do we need a cache? how many shards?).
API — define the handful of endpoints/RPCs with their inputs and outputs. This pins down the contract before internals.
Data model — entities, access patterns, and the storage choice that fits them (relational, KV, wide-column, search, blob). Pick the partition key here.
High-level architecture — draw the boxes: clients, load balancer, stateless services, queues, datastores, caches. Keep services stateless so they scale horizontally.
Deep dives — go deep on the 1-2 hard parts the interviewer cares about (hot keys, exactly-once, fan-out, consistency).
Bottlenecks & tradeoffs — name the failure modes, the scaling limits, and the consistency/availability tradeoffs (CAP), plus how you'd evolve the design.

Go-specific notes (apply throughout): keep handlers stateless behind net/http; use a bounded worker pool of goroutines (not unbounded go func()) to cap concurrency and memory; pass context.Context for deadlines/cancellation through every call; use channels for backpressure and sync primitives (sync.Mutex, sync.RWMutex, atomic, singleflight) for shared in-memory state; shard hot state by key to reduce lock contention; and always wire graceful shutdown so in-flight work drains.

1. Distributed Rate Limiter¶

Problem: Limit each client (API key / user / IP) to N requests per time window across a fleet of stateless service replicas.

Framework applied:

Requirements: per-key limits, low added latency (<1ms ideal), fail-open vs fail-closed policy, works across replicas. Limit is the non-functional core.
Estimates: if 50k RPS each needs a limiter check, the limiter store must handle 50k+ ops/s — a single Redis instance can, but plan for sharding by key.
API: Allow(key) -> (allowed bool, retryAfter); expose limit headers (X-RateLimit-Remaining).
Data model: counter or token state per key with a TTL.
Architecture: in-process limiter for the common case + shared Redis for cross-replica truth; algorithms: fixed window, sliding window log/counter, token bucket, leaky bucket.
Deep dive: atomicity — INCR+EXPIRE race; use a Lua script. Boundary bursts in fixed windows; prefer sliding window or token bucket.

Go-specific notes: golang.org/x/time/rate is a per-process token bucket — great as an L1 limiter and for limiting outbound calls, but its state is local. For distributed truth, run a Lua script on Redis so increment+expire is atomic. Wrap checks in middleware and respect context deadlines so a slow Redis doesn't stall requests (decide fail-open vs fail-closed on timeout).

Tradeoffs: local-only is fastest but inaccurate across replicas; Redis is accurate but adds a round trip and a dependency. Fixed window is cheap but allows 2x bursts at boundaries; sliding window is smoother but costs more memory/CPU.

Sample strong answer

I'd run a two-tier limiter: a local token bucket (`x/time/rate`) absorbs the bulk cheaply, and a Redis token bucket implemented in Lua provides the authoritative cross-replica limit. The Lua script atomically refills based on elapsed time and decrements, returning allow/deny plus retry-after. Keys are sharded across a Redis cluster by hashing the client key, so no single node is hot. On Redis timeout I fail open for availability but emit a metric, since a hard dependency on the limiter would otherwise reduce overall availability. I'd expose `X-RateLimit-*` headers and a 429 with `Retry-After`.

2. URL Shortener¶

Problem: Map long URLs to short codes (e.g. bit.ly/abc123), redirect on lookup. Very read-heavy.

Framework applied:

Requirements: create short code, redirect, optional custom alias, analytics; codes are permanent; very high read:write ratio.
Estimates: 100M new URLs/month ~= 40 writes/s, but reads might be 100x = 4000+ reads/s; 6-char base62 = 62^6 ≈ 56 billion codes.
API: POST /shorten {url} -> {short}, GET /{code} -> 301/302 redirect.
Data model: code -> longURL plus metadata; KV store or relational with index on code.
Architecture: generate code via base62 of an auto-increment ID or a hash; a Key Generation Service (KGS) can pre-generate unique codes to avoid collision checks; Redis cache in front of the DB for hot codes.
Deep dive: read path — cache-aside on Redis, DB as source of truth; redirect with 301 (cacheable) vs 302 (lets you keep counting clicks).

Go-specific notes: redirect handlers are trivially stateless behind net/http; cache hot codes in Redis with cache-aside and use singleflight to collapse concurrent misses for the same code so one DB read serves many goroutines. Use a context-bound DB call so slow lookups time out.

Tradeoffs: auto-increment IDs are compact and collision-free but leak volume and need a global counter (shard the ID space or use a range-allocating KGS); hashing avoids a counter but needs collision handling. 301 is cacheable (fast, but you lose click counts); 302 keeps analytics but adds load.

Sample strong answer

Writes allocate a unique 64-bit ID (from a range-allocating KGS so each app node hands out IDs locally without per-request coordination), then base62-encode it to a ~7-char code. The mapping lives in a KV store partitioned by code; a Redis cache fronts reads with cache-aside and `singleflight` to prevent stampedes on a viral link. Reads dominate, so the system is built around the cache and read replicas; I'd use 302 redirects to keep click analytics, pushing counts asynchronously to a queue so the redirect stays fast. Custom aliases are a conditional insert that fails on collision.

3. Distributed Job Queue¶

Problem: Accept background jobs, deliver them reliably to a pool of workers, retry failures, scale workers independently.

Framework applied:

Requirements: durability (no lost jobs), at-least-once delivery, retries with backoff, dead-letter for poison jobs, optional delayed/scheduled jobs, ordering (usually not strict).
Estimates: enqueue rate vs worker throughput sets the needed worker count and queue depth; size for backlog during spikes.
API: Enqueue(job), workers Dequeue/Ack/Nack.
Data model: a durable log/queue (Redis Streams, Kafka, SQS, RabbitMQ); job record with attempt count and status.
Architecture: producers -> broker -> consumer groups; a visibility timeout hides a claimed job until ack, and unacked jobs are reclaimed after timeout (XCLAIM / redelivery).
Deep dive: exactly-once is impractical; do at-least-once + idempotent handlers keyed by job ID. Failures retry with exponential backoff; after max attempts move to a dead-letter queue.

Go-specific notes: a worker is a bounded pool of goroutines reading from a channel fed by the broker — bound it so you don't exhaust memory or downstream connections. Use context with a per-job timeout so a stuck handler is cancelled and the job redelivered. A buffered channel between the fetcher goroutine and workers provides backpressure; stop fetching when the channel is full. On shutdown, stop intake, drain in-flight jobs, and Ack only after success.

jobs := make(chan Job, 100) // backpressure buffer
var wg sync.WaitGroup
for i := 0; i < numWorkers; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for j := range jobs {
            ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
            if err := handle(ctx, j); err == nil {
                broker.Ack(j.ID)
            } else {
                broker.Nack(j.ID) // redelivered or dead-lettered
            }
            cancel()
        }
    }()
}

Tradeoffs: Redis Streams is simple and fast but Kafka scales partitions/retention better; visibility timeout too short causes duplicate processing, too long delays recovery from crashes. At-least-once shifts the cost to making handlers idempotent.

Sample strong answer

I'd use Redis Streams (or Kafka at larger scale) with consumer groups for load balancing and durability. Each Go worker process runs a bounded pool of goroutines pulling batches via `XREADGROUP`, processing under a per-job `context` timeout, and `XACK`-ing only on success. Handlers are idempotent (deduped on job ID) because at-least-once means redelivery. Failures increment an attempt counter and retry with exponential backoff + jitter; after N attempts the job is `XADD`-ed to a dead-letter stream for inspection. A reaper periodically `XCLAIM`s entries that have been pending past the visibility timeout, recovering jobs from crashed workers. Graceful shutdown drains in-flight jobs before exit.

4. Notification Service¶

Problem: Deliver notifications across multiple channels (push, email, SMS, in-app) reliably, at scale, with retries and user preferences.

Framework applied:

Requirements: multi-channel, per-user preferences/opt-out, deduplication, retries, rate limits per provider, templating, priority (OTP vs marketing).
Estimates: fan-out is the scaling factor — one event can produce millions of notifications (e.g. a broadcast); size the queue and worker fleet for peak fan-out.
API: Notify(userID, template, data, channels); provider webhooks for delivery status.
Data model: user preferences, templates, notification log with status; idempotency keys.
Architecture: ingest -> queue -> fan-out to per-channel workers -> provider adapters (APNs/FCM, SES, Twilio). Decouple ingestion from delivery via the queue so spikes are absorbed.
Deep dive: idempotency keys prevent duplicate sends on retry; per-provider rate limiting and circuit breakers protect against provider throttling/outages; retries use exponential backoff, then dead-letter.

Go-specific notes: model each channel as an adapter behind a common Sender interface; a worker pool per channel respects each provider's rate limit using x/time/rate. context carries deadlines so a slow provider call is cancelled and retried rather than blocking a goroutine indefinitely. Use singleflight or an idempotency store to collapse duplicate sends. Channels (the Go kind) connect the fan-out stage to per-channel queues with backpressure.

Tradeoffs: synchronous send is simple but couples your latency/availability to the provider's; queue-based async adds delivery delay but isolates failures and enables retries. Strict ordering and exactly-once are expensive — prefer at-least-once + idempotency.

Sample strong answer

Ingestion validates and writes the request with an idempotency key, then enqueues. A fan-out worker expands audience and user preferences into per-channel messages onto per-channel queues. Channel workers (bounded goroutine pools) call provider adapters behind a `Sender` interface, rate-limited per provider with `x/time/rate` and guarded by a circuit breaker. Sends are idempotent on the key, retried with exponential backoff, and dead-lettered after max attempts; delivery webhooks update the notification log. High-priority traffic (OTP) uses a separate queue/pool so marketing blasts can't starve it.

5. Distributed Cache¶

Problem: Build a horizontally scalable in-memory cache (think a simplified Memcached/Redis layer) for a read-heavy service.

Framework applied:

Requirements: low-latency get/set, TTL/eviction, scale beyond one node's memory, tolerate node loss, optional consistency with the backing store.
Estimates: total working set / per-node memory = node count; target hit ratio drives DB offload.
API: Get(key), Set(key, val, ttl), Delete(key).
Data model: opaque KV with metadata (TTL, size) for eviction (LRU/LFU).
Architecture: client-side or proxy sharding via consistent hashing so adding/removing a node remaps only ~1/N keys; replication for availability.
Deep dive: hot keys (one key overwhelms its shard) — replicate hot keys or add a small local L1; stampede protection on miss.

Go-specific notes: a single node is a sharded map — partition keys across multiple map+sync.RWMutex shards (or sync.Map) to cut lock contention under high concurrency. Use container/list for an LRU. For multi-node, a consistent-hash ring picks the node; golang.org/x/sync/singleflight collapses concurrent misses so one backend fetch fills many waiters. context bounds backend fetch time.

type shard struct { mu sync.RWMutex; m map[string]entry }
func (c *Cache) shardFor(key string) *shard { return c.shards[fnv32(key)%uint32(len(c.shards))] }

Tradeoffs: consistent hashing minimizes resharding but uneven key distribution causes hot shards (use virtual nodes); replication improves availability but risks stale reads; write-through is consistent but slower than write-back.

Sample strong answer

Each node stores a partitioned in-memory map (N shards, each with its own `RWMutex`) plus an LRU for eviction, which removes the global-lock bottleneck. Clients route keys with a consistent-hash ring using virtual nodes for even spread, so scaling the cluster remaps only a fraction of keys. Misses use `singleflight` to ensure one backend load per key under a thundering herd, and hot keys are detected and replicated across nodes or pinned to a tiny per-client L1. I'd replicate each shard to a secondary for availability, accepting eventual consistency, and bound every backend fetch with `context`.

6. Chat / Messaging Service¶

Problem: Real-time 1:1 and group messaging with delivery, online presence, and message history.

Framework applied:

Requirements: low-latency delivery, ordering within a conversation, persistence/history, presence, delivery/read receipts, offline delivery.
Estimates: concurrent connections drive the gateway fleet (each WebSocket holds a connection); messages/s sizes the queue and storage.
API: WebSocket for send/receive; REST for history (GET /conversations/{id}/messages?before=...).
Data model: messages partitioned by conversation ID, ordered by time/sequence; wide-column (Cassandra) fits the append-heavy, range-by-conversation access pattern.
Architecture: clients hold WebSocket connections to stateless gateway nodes; a routing layer / message bus (per-user channel via Redis pub/sub or Kafka) delivers a message to whichever gateway holds the recipient's socket; persist asynchronously.
Deep dive: a user may connect to a different gateway than the sender — use a presence registry (userID -> gatewayID) so messages route to the right node; offline users get queued messages on reconnect.

Go-specific notes: Go excels here — each connection is a goroutine, cheap enough for hundreds of thousands per node. Per-connection a reader goroutine and a writer goroutine communicate over a buffered channel; a slow client whose buffer fills is dropped (backpressure) rather than blocking the hub. A central hub goroutine plus sync.Map of userID -> connection handles fan-out; context and ping/pong detect dead connections.

type Hub struct{ conns sync.Map } // userID -> *Conn with a send chan
func (c *Conn) send(m Msg) {
    select {
    case c.out <- m: // buffered
    default:         // slow consumer: drop/close
        c.close()
    }
}

Tradeoffs: WebSockets are stateful, complicating load balancing and deploys (drain connections gracefully); strong ordering needs per-conversation sequencing; storing every message is write-heavy, so pick a store tuned for sequential writes.

Sample strong answer

Clients connect via WebSocket to stateless gateways; each connection is two goroutines (reader/writer) bridged by a buffered send channel, so a slow client applies backpressure and is dropped rather than stalling others. A presence registry in Redis maps users to their current gateway, and messages route over a pub/sub bus to the recipient's gateway. Messages persist asynchronously to a wide-column store partitioned by conversation ID and ordered by a per-conversation sequence number, which also gives idempotent dedup. Offline recipients have messages queued and replayed on reconnect; receipts are just additional lightweight messages. Deploys drain connections gracefully.

7. News Feed¶

Problem: Build a home timeline that aggregates posts from accounts a user follows (Twitter/Instagram-style).

Framework applied:

Requirements: generate a ranked feed, low read latency, handle high-fanout accounts (celebrities), near-real-time freshness.
Estimates: read:write is huge (feeds read constantly); a celebrity post fans out to millions of followers — the core scaling problem.
API: GET /feed?cursor=..., POST /post.
Data model: posts store; per-user feed cache (list of post IDs); follower graph.
Architecture: fan-out on write (push) precomputes each follower's feed for fast reads, vs fan-out on read (pull) which assembles the feed at query time. Most designs use a hybrid: push for normal users, pull for celebrity accounts.
Deep dive: the celebrity problem — pushing to 50M feeds per post is wasteful; instead pull their posts at read time and merge with the pushed feed.

Go-specific notes: fan-out is an embarrassingly parallel job for a bounded worker pool writing to follower feed caches; bound concurrency to protect the cache/DB. Use channels to stream follower batches to workers and context to cancel a stuck fan-out. Feed assembly merges the precomputed list with freshly pulled celebrity posts, easily parallelized with goroutines + a sync.WaitGroup.

Tradeoffs: push gives fast reads but expensive, write-amplified, and wasteful for inactive users; pull gives cheap writes but slow, expensive reads; hybrid balances them at the cost of complexity. Ranking adds latency, so often precompute or rank a candidate set.

Sample strong answer

I'd use a hybrid fan-out. For ordinary authors, a write enqueues a fan-out job; a bounded pool of Go workers appends the post ID to each active follower's feed list in Redis (skipping long-inactive users to limit write amplification). For celebrity authors, I skip fan-out and instead pull their recent posts at read time, merging them with the precomputed feed and re-ranking the top candidates. Reads hit the cached feed list, hydrate post IDs from a cache/DB, and stream back with cursor pagination. Goroutines parallelize hydration and the celebrity merge under a `context` deadline; the follower graph and feeds are sharded by user ID.

8. Metrics Ingestion Pipeline¶

Problem: Ingest high-volume time-series metrics from many agents, store them, and serve aggregation queries (a Prometheus/Datadog-lite).

Framework applied:

Requirements: very high write throughput, downsampling/rollups, retention tiers, range + aggregation queries, tolerate bursts and late data.
Estimates: millions of data points/s; storage dominated by writes — compression and rollups are essential (raw at high res short-term, downsampled long-term).
API: POST /ingest (batched), GET /query?metric=&from=&to=&agg=.
Data model: time-series: (metric, labels, timestamp, value); a TSDB (Prometheus/InfluxDB/Cassandra) partitioned by metric/series and time windows.
Architecture: agents -> ingest gateway (batch + validate) -> durable buffer (Kafka) -> writers -> TSDB; query layer reads rollups. Buffer absorbs spikes and decouples ingest from storage.
Deep dive: batching and backpressure — ingest must shed or buffer load without falling over; rollup/compaction jobs reduce long-term storage.

Go-specific notes: the ingest path is a classic Go pipeline — receive goroutines parse batches into channels, a pool of writer goroutines flush to storage in batches. Use buffered channels for backpressure: when full, return 429 rather than OOM. atomic counters aggregate in-memory before flush; time.Ticker triggers periodic flushes; context cancels on shutdown after draining. Shard series by hash of metric+labels so writers don't contend.

batch := make([]Point, 0, 1000)
ticker := time.NewTicker(time.Second)
for {
    select {
    case p := <-in:
        if batch = append(batch, p); len(batch) >= 1000 { flush(batch); batch = batch[:0] }
    case <-ticker.C:
        if len(batch) > 0 { flush(batch); batch = batch[:0] }
    case <-ctx.Done():
        flush(batch); return
    }
}

Tradeoffs: batching boosts throughput but adds latency and risks loss of an unflushed buffer (durable WAL/Kafka mitigates); high-resolution retention is costly, so downsample; aggressive sharding helps writes but complicates cross-series queries.

Sample strong answer

Agents send compressed batches to a stateless ingest gateway that validates and writes to Kafka, partitioned by series hash so order is preserved per series and load spreads evenly. A pool of Go writer goroutines consumes partitions, accumulates points, and flushes time-and-size-based batches to the TSDB, applying backpressure (429) when buffers fill. Compaction jobs downsample raw data into 1m/1h rollups across retention tiers, so queries over long ranges read cheap aggregates while recent queries hit high-resolution data. The query layer fans out to the relevant shards and merges. Everything is `context`-cancellable and drains on shutdown.

9. Web Crawler¶

Problem: Crawl a large set of web pages, extract links, and store content, while being polite and avoiding loops.

Framework applied:

Requirements: breadth of coverage, politeness (per-domain rate limits, robots.txt), dedup (don't re-crawl), freshness/recrawl policy, fault tolerance.
Estimates: pages/day target vs per-fetch latency = needed concurrency; URL frontier and seen-set size drive storage.
API: internal — Seed(urls), frontier Next()/Done().
Data model: URL frontier (queue, often prioritized and partitioned by domain), a "seen" set (Bloom filter + store) for dedup, content/blob store.
Architecture: frontier -> fetcher pool -> parser -> extract links -> dedup -> back into frontier; partition the frontier by domain so one domain's politeness delay doesn't block others.
Deep dive: politeness — per-domain rate limiting and robots.txt; dedup at URL and content (hash) level to avoid loops and mirrors.

Go-specific notes: the canonical fan-out/fan-in pipeline. A bounded fetcher pool (goroutines) prevents opening unlimited sockets; per-domain x/time/rate limiters enforce politeness; context with timeout bounds slow fetches. Channels connect frontier -> fetchers -> parser; a sync.Map/Bloom filter tracks seen URLs. golang.org/x/sync/errgroup coordinates the pipeline and propagates cancellation.

Tradeoffs: more concurrency = faster but risks getting blocked and overwhelming the dedup store; an exact seen-set is memory-heavy (Bloom filter trades a small false-positive rate for huge memory savings, occasionally skipping a page); strict politeness lowers throughput per domain.

Sample strong answer

A partitioned frontier (by domain) feeds a bounded pool of fetcher goroutines, each respecting a per-domain token-bucket limiter and robots.txt, with a `context` timeout per request. Fetched pages flow over a channel to parser goroutines that extract and normalize links; a Bloom filter (backed by a durable seen-store) drops already-seen URLs cheaply before re-enqueuing new ones. Content is hashed to dedup mirrors. The whole pipeline is wired with `errgroup` so cancellation propagates and a crash recovers from the persisted frontier. Concurrency is tuned to maximize throughput without tripping rate limits or exhausting the dedup store.

10. Leaderboard / Counting Service¶

Problem: Maintain a real-time ranked leaderboard (top-N and a player's rank) over frequently changing scores at scale.

Framework applied:

Requirements: update score, get top-N, get a specific player's rank, low latency, handle many concurrent updates; approximate vs exact ranks.
Estimates: updates/s and players count; top-N reads are frequent and must be O(log N) or cached.
API: AddScore(player, delta), Top(n), Rank(player).
Data model: a sorted set (Redis ZADD/ZREVRANGE/ZREVRANK) keyed by leaderboard, member=player, score=points.
Architecture: writes update the sorted set; very hot boards are sharded and merged, or fronted by an aggregation buffer; periodic snapshots for history.
Deep dive: hot-key contention on one giant board — shard players across multiple sorted sets and merge top-N, or batch increments in-memory before flushing.

Go-specific notes: Redis sorted sets do the ranking; the Go service batches and coalesces frequent increments to cut write amplification — an in-memory map[player]delta guarded by a sync.Mutex (or sharded maps + atomic) flushed periodically by a time.Ticker goroutine. singleflight caches top-N for a sub-second window so a flood of reads collapses to one Redis call. context bounds Redis ops.

Tradeoffs: a single Redis sorted set is simple and exact but a hot-key bottleneck at extreme scale; sharding scales writes but makes exact global rank harder (merge/estimate); batching increments boosts throughput but makes scores eventually consistent for a short window.

Sample strong answer

I'd back the leaderboard with a Redis sorted set: `ZINCRBY` for updates, `ZREVRANGE 0 N` for top-N, `ZREVRANK` for a player's rank — all O(log N). To handle update floods, the Go service coalesces increments in a sharded in-memory buffer and flushes batched `ZINCRBY`s on a ticker, trading a brief eventual-consistency window for far fewer Redis writes. Top-N is cached for ~1s via `singleflight` so read spikes hit Redis once. For a massive global board I'd shard players across K sorted sets and merge the per-shard top-N (each shard's local top contains the global top), accepting approximate ranks deep in the list. Periodic snapshots persist history.