Table of Contents
Goroutines
Channels
select
The sync Package
context
Patterns: Worker Pools & Pipelines
Races & Deadlocks
The GMP Scheduler
Goroutine Leaks

Concurrency Questions

25+ questions on goroutines, channels, select, the sync package, context, the GMP scheduler, and the classic leak/race traps. Difficulty: 🟢 junior · 🟡 mid · 🔴 senior.

Goroutines

🟢 What is a goroutine and how is it different from an OS thread?

A goroutine is a lightweight, runtime-managed thread of execution started with the `go` keyword. It begins with a tiny (about 2KB) growable stack, whereas an OS thread typically reserves a much larger stack (often 1–8MB), so you can run hundreds of thousands of goroutines. The Go runtime multiplexes many goroutines onto a small number of OS threads via its scheduler, and switching between goroutines is much cheaper than a kernel thread context switch. You don't manage threads directly; you just create goroutines and let the scheduler place them.

🟡 How do you wait for goroutines to finish?

The idiomatic tool is `sync.WaitGroup`: call `wg.Add(n)` before launching, `wg.Done()` (usually deferred) inside each goroutine, and `wg.Wait()` to block until the counter hits zero.

var wg sync.WaitGroup
for _, job := range jobs {
    wg.Add(1)
    go func(j Job) { defer wg.Done(); process(j) }(job)
}
wg.Wait()

For collecting errors, `golang.org/x/sync/errgroup` wraps a WaitGroup and returns the first error while cancelling a shared context. Avoid ad-hoc `time.Sleep` for synchronization — it is a race waiting to happen.

🔴 What is the classic loop-variable capture bug with goroutines, and what changed in Go 1.22?

Before Go 1.22, a `for` loop reused a single loop variable across iterations, so closures launched as goroutines all captured the same variable and typically observed its final value:

for _, v := range items {
    go func() { fmt.Println(v) }() // pre-1.22: often prints last item repeatedly
}

The classic fixes are to pass the variable as an argument (`go func(v Item){...}(v)`) or shadow it (`v := v`). Go 1.22 changed the semantics so each iteration gets a fresh loop variable, eliminating the bug for new code — but you must know your target Go version, and the argument-passing idiom remains the clearest, version-independent fix.
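As a minimal sketch of the argument-passing fix, the program below launches one goroutine per item and hands each goroutine its own copies of the loop variables. The function name `lengths` and the sample input are hypothetical, chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// lengths computes len(v) for every item concurrently. The loop variables
// are passed as arguments, so each goroutine works on its own copies
// regardless of the loop-variable semantics of the Go version in use.
func lengths(items []string) []int {
	out := make([]int, len(items))
	var wg sync.WaitGroup
	for i, v := range items {
		wg.Add(1)
		go func(i int, v string) {
			defer wg.Done()
			out[i] = len(v) // each goroutine writes a distinct index, so no race
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(lengths([]string{"a", "bb", "ccc"})) // prints [1 2 3]
}
```

Writing results by index keeps the output order deterministic even though the goroutines finish in arbitrary order.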

🟡 Does `main` wait for goroutines to finish before exiting?

No. When the `main` function returns, the program exits immediately and any still-running goroutines are killed abruptly without running their deferred functions. This is why you need explicit synchronization (`WaitGroup`, channels, `errgroup`) to wait for work to complete. A common beginner bug is launching goroutines and seeing no output because `main` returned first.

Channels

🟢 What is a channel and what is the difference between buffered and unbuffered?

A channel is a typed conduit for sending and receiving values between goroutines, providing both communication and synchronization. An unbuffered channel (`make(chan T)`) is synchronous: a send blocks until another goroutine receives, so it's a rendezvous. A buffered channel (`make(chan T, n)`) lets up to `n` sends proceed without a matching receiver; sends block only when the buffer is full and receives block only when it's empty. Use unbuffered channels when you want a handoff guarantee, buffered when you want to decouple producer and consumer rates.

🟡 What are the "channel axioms" — behavior of nil and closed channels?

These rules are worth memorizing:

- Send to or receive from a **nil** channel blocks **forever**.
- Send to a **closed** channel **panics**.
- Receive from a **closed** channel never blocks: it yields any values still buffered, then returns the zero value immediately; use the comma-ok form (`v, ok := <-ch`), where `ok == false` signals closed and drained.
- Closing a nil channel or closing an already-closed channel **panics**.

The nil-blocks-forever rule is actually useful: setting a channel variable to `nil` disables its case in a `select`.
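The rules above can be observed directly. The following sketch uses hypothetical helper names (`recvFromClosed`, `sendOnClosedPanics`, `nilChannelNeverReady`); `recover` appears only to make the panic observable for demonstration, not as a recommended guard.

```go
package main

import "fmt"

// recvFromClosed shows that a closed channel still delivers buffered values
// and then returns the zero value with ok == false.
func recvFromClosed() (first int, firstOK bool, second int, secondOK bool) {
	ch := make(chan int, 1)
	ch <- 7
	close(ch)
	first, firstOK = <-ch   // buffered value is still delivered
	second, secondOK = <-ch // closed and drained: zero value, ok == false
	return
}

// sendOnClosedPanics reports whether sending on a closed channel panicked.
func sendOnClosedPanics() (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	ch := make(chan int, 1)
	close(ch)
	ch <- 1 // panics: send on closed channel
	return false
}

// nilChannelNeverReady reports whether a select skipped a nil channel's case.
func nilChannelNeverReady() bool {
	var ch chan int // nil channel
	select {
	case <-ch:
		return false // never chosen: receiving from nil blocks forever
	default:
		return true
	}
}

func main() {
	fmt.Println(recvFromClosed())       // prints 7 true 0 false
	fmt.Println(sendOnClosedPanics())   // prints true
	fmt.Println(nilChannelNeverReady()) // prints true
}
```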

🟡 Who should close a channel, and why?

The sender closes, never the receiver, because only the sender knows when no more values will be sent, and sending on a closed channel panics. With multiple senders, no single sender should close; instead coordinate with a separate "done" signal or a `sync.WaitGroup` plus a dedicated closer goroutine, or use `context` cancellation. Closing is a broadcast that the stream is finished — it is optional (you don't have to close every channel) and mainly needed to signal `range`/`for` receivers to stop.
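One way to apply the multiple-sender advice is sketched below: the senders never close the channel, and a dedicated closer goroutine closes it once a `WaitGroup` confirms every sender has exited. The function name `sumFromSenders` and its parameters are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// sumFromSenders starts several senders on one channel. No sender closes the
// channel; a separate closer goroutine does so after all senders are done.
func sumFromSenders(senders, perSender int) int {
	ch := make(chan int)
	var wg sync.WaitGroup
	for s := 0; s < senders; s++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perSender; i++ {
				ch <- 1
			}
		}()
	}
	// Dedicated closer: runs close exactly once, after the last sender exits.
	go func() {
		wg.Wait()
		close(ch)
	}()

	total := 0
	for v := range ch { // ends when the closer closes ch
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumFromSenders(4, 25)) // prints 100
}
```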

🟢 How do you range over a channel, and when does the loop end?

`for v := range ch` receives values until the channel is closed and drained, then exits the loop cleanly.

for v := range ch {
    process(v)
}
// reached only after ch is closed and empty

If the channel is never closed and no more values arrive, the loop blocks forever — a frequent leak source. So `range` over a channel implies a contract that some sender will eventually `close(ch)`.

🔴 What happens when you send on a closed channel, and how do you avoid it?

It panics with `"send on closed channel"`, and this panic cannot be safely prevented with a `select`. The fix is structural: establish clear ownership so the closing goroutine is the only sender, or use a separate cancellation channel/`context` to tell senders to stop and let the closer close only after all senders have exited (coordinate with a `WaitGroup`). Never use `recover` as a routine guard around sends; redesign so the close happens after senders are known to be done.

🔴 What is the difference between closing a channel and sending a value to signal completion?

Closing is a one-time broadcast: every current and future receiver observes it (receives return immediately with `ok == false`), which makes `close(done)` the idiomatic way to signal "stop" to many goroutines at once. Sending a value reaches exactly one receiver per send, so it can't broadcast. That is why done/cancellation channels are signalled by `close`, often using `chan struct{}` since the value carries no data. `context.Done()` is built on exactly this closed-channel broadcast.
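To illustrate the broadcast property, this sketch blocks several goroutines on one `done` channel and releases all of them with a single `close`. The name `broadcastStop` is hypothetical, and the typed `atomic.Int64` assumes Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// broadcastStop starts n goroutines that wait on done and returns how many
// of them observed the close.
func broadcastStop(n int) int {
	done := make(chan struct{}) // carries no data, only the close signal
	var stopped atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-done // unblocks for every waiter once done is closed
			stopped.Add(1)
		}()
	}
	close(done) // one close reaches all current and future receivers
	wg.Wait()
	return int(stopped.Load())
}

func main() {
	fmt.Println(broadcastStop(10)) // prints 10
}
```

A single send on `done` would have released only one of the waiters, leaving the rest blocked.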

select

🟢 What does `select` do?

`select` waits on multiple channel operations and proceeds with one that is ready; if several are ready it chooses one at random (to avoid starvation). It is the core multiplexing primitive for concurrency — letting a goroutine respond to whichever of several events happens first. A `default` case makes the `select` non-blocking (it runs if nothing else is ready), and an empty `select{}` blocks forever.

select {
case v := <-in:
    handle(v)
case <-ctx.Done():
    return ctx.Err()
}

🟡 How do you implement a non-blocking send or receive?

Add a `default` case so the `select` doesn't block when no channel is ready:

select {
case ch <- v:
    // sent
default:
    // would block — drop, buffer, or handle backpressure
}

This is how you implement "try send" semantics, e.g. dropping metrics under load rather than blocking the hot path. A non-blocking receive uses the same pattern with `case v := <-ch`.
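Both directions can be wrapped in small helpers. The names `trySend` and `tryRecv` below are hypothetical, not standard-library functions; the sketch assumes an `int` channel for brevity.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it succeeded.
func trySend(ch chan<- int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false // buffer full or no receiver waiting
	}
}

// tryRecv attempts a receive without blocking. Note that on a closed channel
// the receive case is always ready, so this returns (0, true) in that case.
func tryRecv(ch <-chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false // nothing available right now
	}
}

func main() {
	ch := make(chan int, 1)
	fmt.Println(trySend(ch, 1)) // prints true
	fmt.Println(trySend(ch, 2)) // prints false (buffer is full)
	fmt.Println(tryRecv(ch))    // prints 1 true
	fmt.Println(tryRecv(ch))    // prints 0 false
}
```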

🔴 How do you add a timeout to a channel operation, and what is the leak with `time.After`?

Use a timeout case in a `select`:

select {
case v := <-ch:
    use(v)
case <-time.After(2 * time.Second):
    return errTimeout
}

The subtle leak: before Go 1.23, the `Timer` created by `time.After` was not garbage-collected until it fired, so in a hot loop or a long-lived `select` that usually takes the other branch, you accumulate pending timers and memory. For repeated use, create a `time.NewTimer`, `Stop()` and `Reset()` it, or prefer a `context.WithTimeout` whose cancel you `defer`. (As of Go 1.23, timers that are no longer referenced become eligible for garbage collection even if they have not fired or been stopped, but `context` is still the cleaner pattern.)
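A minimal sketch of the explicit-timer alternative follows. The helper name `recvWithTimeout` and the `errTimeout` value are assumptions for the example; the timer is stopped on every return path so it does not linger when the value arrives first.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errTimeout = errors.New("timed out")

// recvWithTimeout waits for a value on ch for at most d.
func recvWithTimeout(ch <-chan int, d time.Duration) (int, error) {
	t := time.NewTimer(d)
	defer t.Stop() // release the timer promptly if the value wins the race
	select {
	case v := <-ch:
		return v, nil
	case <-t.C:
		return 0, errTimeout
	}
}

func main() {
	ready := make(chan int, 1)
	ready <- 42
	fmt.Println(recvWithTimeout(ready, time.Second)) // prints 42 <nil>

	empty := make(chan int)
	fmt.Println(recvWithTimeout(empty, 10*time.Millisecond)) // prints 0 timed out
}
```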

🔴 How does setting a channel to nil interact with select?

A `nil` channel's case in a `select` is never ready (send/receive on nil blocks forever), so assigning `nil` to a channel variable dynamically disables that branch. This is a powerful idiom for state machines: once an input channel is drained/closed, set it to `nil` so its case stops firing while other cases keep working, avoiding a busy-spin on a closed channel that always returns immediately.

for in != nil || out != nil {
    select {
    case v, ok := <-in:
        if !ok { in = nil; continue } // disable this case
        buf = append(buf, v)
    case out <- next(buf):
        // ...
    }
}
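A self-contained variant of the same idiom is sketched below: it merges two input channels and sets each one to `nil` once it is closed, so the loop ends when both are disabled. The helpers `mergeAll` and `emit` are hypothetical names for this example.

```go
package main

import "fmt"

// emit returns a closed, pre-filled channel holding vals.
func emit(vals ...int) <-chan int {
	ch := make(chan int, len(vals))
	for _, v := range vals {
		ch <- v
	}
	close(ch)
	return ch
}

// mergeAll collects every value from a and b. A closed input is set to nil
// so its case stops firing instead of spinning on zero values.
func mergeAll(a, b <-chan int) []int {
	var out []int
	for a != nil || b != nil {
		select {
		case v, ok := <-a:
			if !ok {
				a = nil // disable this case
				continue
			}
			out = append(out, v)
		case v, ok := <-b:
			if !ok {
				b = nil // disable this case
				continue
			}
			out = append(out, v)
		}
	}
	return out
}

func main() {
	got := mergeAll(emit(1, 2), emit(3))
	fmt.Println(len(got)) // prints 3 (the order of values may vary)
}
```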

The sync Package

🟢 When should you use a mutex instead of a channel?

Use a mutex (`sync.Mutex`) to protect shared state when the goroutines are simply guarding access to a piece of memory — a counter, a cache map, a config struct. Use channels when you are passing ownership of data or coordinating the flow of work between goroutines ("share memory by communicating"). The Go proverb is "don't communicate by sharing memory; share memory by communicating," but mutexes are perfectly idiomatic and often simpler/faster for plain shared-state protection. Reach for whichever models the problem most clearly.

🟡 What is the difference between `sync.Mutex` and `sync.RWMutex`?

A `Mutex` allows one holder at a time for any access. An `RWMutex` distinguishes readers from writers: many readers can hold `RLock` concurrently, but a writer's `Lock` is exclusive and blocks all readers and writers. Use `RWMutex` for read-heavy workloads where reads vastly outnumber writes. It has more overhead than a plain `Mutex`, so for low-contention or write-heavy data a plain `Mutex` is often faster; measure rather than assume.
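As an illustration of the reader/writer split, here is a small read-mostly cache guarded by an `RWMutex`. The `Cache` type and its methods are hypothetical; whether this beats a plain `Mutex` for a given workload is something to measure.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a string map protected by an RWMutex.
type Cache struct {
	mu sync.RWMutex
	m  map[string]string
}

func NewCache() *Cache {
	return &Cache{m: make(map[string]string)}
}

// Get takes the shared read lock, so many readers can proceed concurrently.
func (c *Cache) Get(k string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[k]
	return v, ok
}

// Set takes the exclusive write lock, blocking readers and other writers.
func (c *Cache) Set(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[k] = v
}

func main() {
	c := NewCache()
	c.Set("lang", "go")
	fmt.Println(c.Get("lang"))    // prints go true
	fmt.Println(c.Get("missing")) // prints an empty string and false
}
```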

🔴 Why must you not copy a sync.Mutex (or a struct containing one)?

A `Mutex` contains internal state (lock bits, waiter counts); copying it duplicates that state, so the copy and original protect different "locks" and the mutual exclusion guarantee is broken — you can get two goroutines in the "critical section" simultaneously. This is why you pass such structs by pointer and why `go vet` flags lock copies. The same applies to `sync.WaitGroup`, `sync.Once`, `sync.Cond`, and anything embedding them.
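A sketch of the safe shape: the struct that embeds the lock is only ever handled through a pointer, and every method uses a pointer receiver so the same `Mutex` guards every access. The `Counter` type and `countTo` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter holds a Mutex, so it must never be copied after first use.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc uses a pointer receiver; a value receiver would copy the Mutex and
// each call would lock its own private copy.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value also locks: reads need the same lock as writes.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countTo increments a shared counter from n goroutines.
func countTo(n int) int {
	c := &Counter{} // shared by pointer, never by value
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Inc()
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countTo(1000)) // prints 1000
}
```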

🟡 What does `sync.Once` guarantee?

`sync.Once` ensures a function runs exactly once, even under concurrent calls, and that all callers observe the completed side effects before `Do` returns (it establishes a happens-before relationship). It's the idiomatic way to do lazy, thread-safe initialization of a singleton, config, or connection.

var once sync.Once
var conn *DB
func Get() *DB {
    once.Do(func() { conn = connect() })
    return conn
}

If the function panics, `Once` still considers it "done" and won't retry, so handle errors carefully (Go 1.21 added `OnceFunc`/`OnceValue` helpers).
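The `OnceValue` helper mentioned above can be sketched as follows, assuming Go 1.21 or later. The names `loadConfig`, `initCalls`, and `callConcurrently` are hypothetical; the counter exists only to show that the initializer ran a single time.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// initCalls counts how many times the initializer actually ran.
var initCalls atomic.Int64

// loadConfig runs its function at most once and caches the result.
var loadConfig = sync.OnceValue(func() string {
	initCalls.Add(1)
	return "config-v1"
})

// callConcurrently calls loadConfig from n goroutines and returns the number
// of times the initializer executed.
func callConcurrently(n int) int64 {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = loadConfig()
		}()
	}
	wg.Wait()
	return initCalls.Load()
}

func main() {
	fmt.Println(callConcurrently(50)) // prints 1
	fmt.Println(loadConfig())         // prints config-v1
}
```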

🔴 When should you use `sync/atomic` instead of a mutex?

Use atomics for simple single-word operations (counters, flags, swapping a pointer) where a full mutex would be overkill and you want lock-free performance under contention. `atomic.AddInt64`, `atomic.CompareAndSwapInt64`, and the typed `atomic.Int64`/`atomic.Pointer[T]` wrappers (Go 1.19+) provide this. The catch: atomics only make individual operations atomic, not sequences, so anything requiring an invariant across multiple variables still needs a mutex. Atomics are easy to misuse and create subtle memory-ordering bugs; prefer a mutex unless profiling shows it matters.
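Two typical single-word uses are sketched below with the typed wrappers (assuming Go 1.19+): a shared counter and a compare-and-swap flag that lets exactly one goroutine claim a task. The function names `countAtomic` and `claimOnce` are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomic increments one counter from n goroutines without a mutex.
func countAtomic(n int) int64 {
	var c atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.Add(1) // a single atomic read-modify-write
		}()
	}
	wg.Wait()
	return c.Load()
}

// claimOnce lets n goroutines compete for a flag and returns how many won.
func claimOnce(n int) int {
	var claimed atomic.Bool
	var winners atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Only the goroutine that flips false to true succeeds.
			if claimed.CompareAndSwap(false, true) {
				winners.Add(1)
			}
		}()
	}
	wg.Wait()
	return int(winners.Load())
}

func main() {
	fmt.Println(countAtomic(1000)) // prints 1000
	fmt.Println(claimOnce(100))    // prints 1
}
```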

🟡 What is `sync.WaitGroup` and what is the common misuse?

`WaitGroup` counts outstanding work: `Add` increments, `Done` decrements, `Wait` blocks until zero. The cardinal rule is to call `Add` **before** launching the goroutine, never inside it — calling `Add` inside the goroutine races with `Wait` and may let `Wait` return before the goroutine even started. Also never copy a `WaitGroup`, pass it by pointer, and make sure every path calls `Done` (use `defer wg.Done()`), or `Wait` blocks forever.

context

🟢 What is `context.Context` used for?

`context` carries deadlines, cancellation signals, and request-scoped values across API boundaries and goroutines. Its primary job is propagating cancellation: when a request is canceled or times out, every goroutine and downstream call working on it can observe `ctx.Done()` and stop, freeing resources. By convention it is the first parameter (`ctx context.Context`) and is never stored in a struct. It is the standard mechanism that ties together timeouts, deadlines, and graceful shutdown.

🟡 What are the main context constructors and how do you use them?

`context.Background()` is the empty root (for `main`, init, tests); `context.TODO()` is a placeholder when you haven't wired context through yet. You derive children with `WithCancel`, `WithTimeout`, `WithDeadline`, and `WithValue`. The cancel-returning ones must have their `cancel` called to release resources, even if the work finished, hence the ubiquitous `defer cancel()`.

ctx, cancel := context.WithTimeout(parent, 3*time.Second)
defer cancel()
result, err := doRPC(ctx, req)

🔴 What happens if you don't call the cancel function returned by WithCancel/WithTimeout?

You leak resources: the child context (and its timer, for `WithTimeout`) stays registered with its parent until the parent is canceled or the deadline passes, so the cancellation tree grows and timers are held longer than needed. `go vet` warns about this through its `lostcancel` check. Calling `cancel()` is cheap and idempotent; `defer cancel()` immediately after creating the context is the correct habit even when the operation completes normally. Forgetting it is one of the most common context bugs and shows up as slow memory growth in long-lived servers.

🔴 How should context values be used, and what's the anti-pattern?

`context.WithValue` should carry only request-scoped data that crosses API boundaries — request IDs, auth/trace info — keyed by an unexported custom type to avoid collisions. The anti-pattern is using it as a general-purpose bag to pass optional function parameters or dependencies; that hides the data flow, defeats static typing, and makes code hard to follow. Required inputs belong in explicit function parameters; context values are for ambient, cross-cutting metadata only.

type ctxKey struct{}
ctx = context.WithValue(ctx, ctxKey{}, requestID)
id, _ := ctx.Value(ctxKey{}).(string)
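A common way to package this is a pair of typed accessors, so callers never touch the key directly. The sketch below is one possible shape; `WithRequestID`, `RequestID`, and `demo` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// requestIDKey is unexported, so no other package can collide with it.
type requestIDKey struct{}

// WithRequestID returns a child context carrying the request ID.
func WithRequestID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, requestIDKey{}, id)
}

// RequestID extracts the request ID; ok is false when none was set.
func RequestID(ctx context.Context) (id string, ok bool) {
	id, ok = ctx.Value(requestIDKey{}).(string)
	return id, ok
}

// demo stores an ID, reads it back, and also probes a context without one.
func demo() (id string, found bool, foundOnEmpty bool) {
	ctx := WithRequestID(context.Background(), "req-42")
	id, found = RequestID(ctx)
	_, foundOnEmpty = RequestID(context.Background())
	return id, found, foundOnEmpty
}

func main() {
	fmt.Println(demo()) // prints req-42 true false
}
```

The accessors keep the type assertion in one place, which restores some of the static typing that raw `ctx.Value` calls give up.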

🟡 How does a goroutine actually respond to context cancellation?

It must explicitly check, because cancellation is cooperative — nothing forcibly stops a goroutine. Long-running loops poll `ctx.Err()` or select on `ctx.Done()`, and blocking calls accept the context so they unblock on cancel:

select {
case <-ctx.Done():
    return ctx.Err()
case work := <-jobs:
    handle(work)
}

Standard-library I/O (`net`, `database/sql`, `http`) accepts contexts and aborts when canceled. Code that ignores its context can't be canceled and will leak.
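For CPU-bound loops that never block on a channel, polling `ctx.Err()` is the cooperative check. The sketch below polls every 1024 iterations, an arbitrary interval chosen for illustration; `sumUntilCancelled` and the two wrapper functions are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// sumUntilCancelled adds up nums but gives up once ctx is canceled.
func sumUntilCancelled(ctx context.Context, nums []int) (int, error) {
	total := 0
	for i, n := range nums {
		if i%1024 == 0 { // poll occasionally to keep the check cheap
			if err := ctx.Err(); err != nil {
				return total, err
			}
		}
		total += n
	}
	return total, nil
}

// liveRun calls sumUntilCancelled with a context that is never canceled.
func liveRun(nums []int) (int, error) {
	return sumUntilCancelled(context.Background(), nums)
}

// cancelledRun calls sumUntilCancelled with an already-canceled context.
func cancelledRun(nums []int) error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	_, err := sumUntilCancelled(ctx, nums)
	return err
}

func main() {
	fmt.Println(liveRun([]int{1, 2, 3}))      // prints 6 <nil>
	fmt.Println(cancelledRun([]int{1, 2, 3})) // prints context canceled
}
```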

Patterns: Worker Pools & Pipelines

🟡 How do you implement a bounded worker pool in Go?

Launch a fixed number of worker goroutines that all read from a shared jobs channel and write to a results channel; close `jobs` when all work is queued, and use a `WaitGroup` to know when to close `results`.

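jobs := make(chan Job)
results := make(chan Result)
var wg sync.WaitGroup
for i := 0; i < numWorkers; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        for j := range jobs { results <- process(j) }
    }()
}
go func() { wg.Wait(); close(results) }() // close results after workers exit
// Feed jobs from a separate goroutine: with unbuffered channels, queueing
// everything before draining results would deadlock once all workers block
// on their result sends.
go func() {
    for _, j := range allJobs { jobs <- j }
    close(jobs)
}()
for r := range results { collect(r) }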

Bounding the worker count limits concurrency (e.g. DB connections, CPU), and passing a `context` lets you cancel early.

🔴 What is a fan-out / fan-in pipeline?

Fan-out launches multiple goroutines reading from the same input channel to parallelize a stage; fan-in merges multiple output channels back into one. Each stage is a function taking input channels and returning output channels, connected by channels, with a shared `context` for cancellation. The merge stage runs one forwarding goroutine per source, tracks them with a single `WaitGroup`, and closes the merged channel once all sources are drained. The key correctness concerns are closing channels exactly once by the owning goroutine and ensuring every stage exits on cancellation so nothing leaks.
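The description above can be sketched as a three-stage pipeline: a generator, several squaring workers reading the same input (fan-out), and a merge stage (fan-in). All function names here (`generate`, `square`, `merge`, `sumSquares`) are hypothetical, and the squaring step stands in for real work.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// generate emits nums on a channel it owns and closes.
func generate(ctx context.Context, nums []int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, n := range nums {
			select {
			case out <- n:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// square is one worker stage; several can share the same input channel.
func square(ctx context.Context, in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for n := range in {
			select {
			case out <- n * n:
			case <-ctx.Done():
				return
			}
		}
	}()
	return out
}

// merge fans in: one forwarding goroutine per source, one WaitGroup overall.
func merge(ctx context.Context, srcs ...<-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, src := range srcs {
		wg.Add(1)
		go func(src <-chan int) {
			defer wg.Done()
			for v := range src {
				select {
				case out <- v:
				case <-ctx.Done():
					return
				}
			}
		}(src)
	}
	go func() { wg.Wait(); close(out) }() // close once, after all sources drain
	return out
}

// sumSquares wires the stages together with the given number of workers.
func sumSquares(nums []int, workers int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // lets every stage exit if we return early
	in := generate(ctx, nums)
	outs := make([]<-chan int, workers)
	for i := range outs {
		outs[i] = square(ctx, in) // fan-out: all workers read from in
	}
	total := 0
	for v := range merge(ctx, outs...) {
		total += v
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 3)) // prints 30
}
```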

🟡 Why bound concurrency with a worker pool or semaphore instead of spawning a goroutine per task?

Goroutines are cheap but not free: each consumes memory and, more importantly, each may grab a scarce downstream resource (a DB connection, a file handle, an upstream API rate budget). Spawning one per incoming request under load can exhaust connections, blow memory, or overwhelm a dependency, causing cascading failures. A worker pool or a `golang.org/x/sync/semaphore` (or a buffered channel used as a counting semaphore) caps in-flight work to a safe level, providing backpressure. Unbounded goroutine creation is a classic production outage cause.
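A buffered channel used as a semaphore can be sketched as follows. The function `peakInFlight` is hypothetical and instruments itself with atomics (Go 1.19+) purely to show that concurrency never exceeds the limit; the one-millisecond sleep stands in for real work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakInFlight runs tasks with at most limit running at once and returns the
// highest number of tasks observed in flight.
func peakInFlight(tasks, limit int) int {
	sem := make(chan struct{}, limit) // capacity is the concurrency cap
	var inFlight, peak atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while limit tasks are running
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			cur := inFlight.Add(1)
			for { // record a new peak with a compare-and-swap loop
				p := peak.Load()
				if cur <= p || peak.CompareAndSwap(p, cur) {
					break
				}
			}
			time.Sleep(time.Millisecond) // simulated work
			inFlight.Add(-1)
		}()
	}
	wg.Wait()
	return int(peak.Load())
}

func main() {
	fmt.Println(peakInFlight(20, 3) <= 3) // prints true
}
```

Acquiring the slot before the `go` statement also bounds the number of goroutines created, not only the number doing work.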

Races & Deadlocks

🟢 What is a data race and how do you detect it?

A data race occurs when two or more goroutines access the same memory concurrently, at least one of them writes, and there's no synchronization ordering the accesses. The result is unpredictable: torn reads, lost updates, or crashes. Go ships a race detector: run `go test -race`, `go run -race`, or build with `-race` and it instruments memory accesses to report races at runtime with stack traces. It only finds races that actually execute, so run it across your test suite and realistic workloads; for a typical program it increases memory use by roughly 5–10x and execution time by roughly 2–20x, so it's a testing tool, not something to ship in production builds.

🟡 What is a deadlock and what are common causes in Go?

A deadlock is when goroutines are all blocked waiting on each other so no progress is possible; if every goroutine is blocked, the runtime detects it and aborts with `fatal error: all goroutines are asleep - deadlock!` (a fatal error, not a panic, so it cannot be recovered). Common causes: sending on an unbuffered channel with no receiver (and vice versa), forgetting to `close` a channel that a `range` waits on, lock-ordering inversion (goroutine A holds lock1 and wants lock2 while B holds lock2 and wants lock1), and a `WaitGroup.Wait` whose `Done` is never reached. Note the runtime only catches the case where *all* goroutines are stuck; a partial deadlock leaks silently.

🔴 How do you prevent lock-ordering deadlocks?

Establish a global lock acquisition order and always acquire multiple locks in that same order across the whole program, so no two goroutines can hold-and-wait in a cycle. Keep critical sections small and avoid calling out to unknown code (callbacks, channel sends) while holding a lock, since that can block on something that needs the lock. Where possible, restructure to need only one lock at a time, or use a single coarser lock, or copy the data you need and release the lock before doing slow work. `go vet` catches copied locks and the race detector catches unsynchronized access, but neither detects lock-order inversions, so discipline in ordering is the real fix.
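A minimal sketch of a global lock order: two accounts are always locked in ascending `id` order, whichever direction the transfer goes. The `Account` type, `transfer`, and `crossTransfers` are hypothetical, and the sketch assumes the two accounts are distinct and have unique ids.

```go
package main

import (
	"fmt"
	"sync"
)

// Account is guarded by its own mutex; id defines the global lock order.
type Account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer moves amount between two distinct accounts. Locks are always
// taken lowest id first, so opposite transfers cannot wait on each other
// in a cycle.
func transfer(from, to *Account, amount int) {
	first, second := from, to
	if second.id < first.id {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()
	from.balance -= amount
	to.balance += amount
}

// crossTransfers runs transfers in both directions concurrently and returns
// the combined balance, which should be unchanged.
func crossTransfers(rounds int) int {
	a := &Account{id: 1, balance: 100}
	b := &Account{id: 2, balance: 100}
	var wg sync.WaitGroup
	for i := 0; i < rounds; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); transfer(a, b, 1) }()
		go func() { defer wg.Done(); transfer(b, a, 1) }()
	}
	wg.Wait()
	return a.balance + b.balance
}

func main() {
	fmt.Println(crossTransfers(100)) // prints 200
}
```

Locking `from` first and `to` second instead would let `transfer(a, b)` and `transfer(b, a)` each hold one lock while waiting for the other.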

🟡 Why might a mutex'd counter still be wrong if you forget the lock on the read?

Synchronization must guard every access, reads included. If writers lock but a reader reads without the lock, that read races with concurrent writes: it may observe a torn or stale value, and the memory model gives no happens-before guarantee, so the value might never become visible. Both the read and the write must use the same lock (or atomics) to be safe. This is a frequent subtle bug — people lock the "obvious" mutation but read the field bare elsewhere.

The GMP Scheduler

🟡 What are G, M, and P in the Go scheduler?

G is a goroutine (its stack and state), M is a machine — an OS thread that actually executes code, and P is a processor — a scheduling context that holds a run queue of runnable goroutines and the resources needed to run Go code. To execute a goroutine, an M must hold a P. The number of Ps is set by `GOMAXPROCS` (default: number of CPU cores), which caps how many goroutines run Go code in parallel. This G-M-P design lets the runtime schedule millions of Gs over a few Ms efficiently.

🔴 What is work stealing in the Go scheduler?

Each P has its own local run queue of runnable goroutines, which avoids contention on a single global queue. When a P empties its local queue, instead of going idle it "steals" half the goroutines from another P's queue (and also checks the global queue and the network poller). This keeps all Ps busy and balances load without central coordination. It's why Go scales well across cores with minimal scheduling overhead.

🔴 What is the difference between GOMAXPROCS and the number of OS threads?

`GOMAXPROCS` is the number of Ps — the maximum number of goroutines executing Go code simultaneously — and defaults to the CPU count. The number of OS threads (Ms) can be larger: when a goroutine makes a blocking syscall, the runtime detaches its M from the P so the P can run other goroutines on a different M, so you may have more Ms than Ps. Pure Go code parallelism is bounded by `GOMAXPROCS`, but blocking syscalls and cgo can spin up additional threads. In containers, set `GOMAXPROCS` to match the CPU quota (e.g. via `automaxprocs`) or you may oversubscribe.

🔴 How does the Go scheduler handle a goroutine that blocks on a syscall vs one that runs a long CPU loop?

On a blocking syscall, the M stays blocked in the kernel together with its goroutine while the runtime detaches the P, which is picked up by another M (new or parked) so other goroutines keep running; when the syscall returns the goroutine is rescheduled. For a long-running CPU loop, the scheduler relies on preemption: originally Go used cooperative preemption only at function-call safe points, so a tight loop with no calls could starve others. Since Go 1.14, asynchronous preemption uses signals to interrupt such goroutines at almost any instruction, not only at function calls, so a pure compute loop no longer hogs a P indefinitely.

🟡 What does `runtime.Gosched()` do, and should you use it?

`runtime.Gosched()` voluntarily yields the processor, putting the current goroutine back on the run queue so another can run, without blocking it. With modern asynchronous preemption you almost never need it — the scheduler already preempts long-running goroutines. It's occasionally useful in tight CPU loops on older runtimes or in specific benchmarking/testing scenarios, but reaching for it in normal code usually signals a design problem (e.g. busy-waiting that should be a channel/condition variable instead).

Goroutine Leaks

🟡 What is a goroutine leak and why is it dangerous?

A goroutine leak is a goroutine that blocks forever and is never reclaimed, because the runtime can't garbage-collect a goroutine that's still "alive" but stuck. Each leaked goroutine holds its stack and anything its closure references, so leaks slowly consume memory and can exhaust resources, eventually degrading or crashing a long-running server. Unlike a one-off bug, leaks accumulate over time and under load, making them insidious in production. Monitor `runtime.NumGoroutine()` and use tools like `goleak` in tests to catch them.

🔴 What is the most common goroutine-leak pattern with channels?

A goroutine blocked sending on a channel whose receiver has gone away (or blocked receiving from a channel that's never written/closed). Classic case: a function launches a goroutine that sends a result, but the caller returns early (timeout/error) and never receives, so the goroutine blocks on the send forever.

func bad(ctx context.Context) (int, error) {
    ch := make(chan int) // unbuffered!
    go func() { ch <- expensive() }() // leaks if caller stops listening
    select {
    case v := <-ch:
        return v, nil
    case <-ctx.Done():
        return 0, ctx.Err() // goroutine now blocked forever on ch<-
    }
}

Fixes: make the channel buffered (size 1) so the send always completes, or thread the context so the goroutine itself can give up. Always ask "what unblocks this goroutine on every path?"
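The buffered-channel fix can be sketched like this. The function `fetch` takes the work as a parameter so the example is self-contained; `fetch`, `runOK`, and `runCancelled` are hypothetical names, and the `release` channel only simulates slow work.

```go
package main

import (
	"context"
	"fmt"
)

// fetch runs work in a goroutine and waits for the result or cancellation.
func fetch(ctx context.Context, work func() int) (int, error) {
	ch := make(chan int, 1) // size 1: the send completes even with no receiver
	go func() { ch <- work() }()
	select {
	case v := <-ch:
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err() // the goroutine can still send and then exit
	}
}

// runOK shows the normal path with a context that is never canceled.
func runOK() (int, error) {
	return fetch(context.Background(), func() int { return 42 })
}

// runCancelled shows the early-return path with slow work.
func runCancelled() error {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	release := make(chan struct{})
	_, err := fetch(ctx, func() int { <-release; return 0 })
	close(release) // the worker now finishes, sends into the buffer, and exits
	return err
}

func main() {
	fmt.Println(runOK())        // prints 42 <nil>
	fmt.Println(runCancelled()) // prints context canceled
}
```

The buffer prevents the goroutine from blocking on the send, but the work itself still runs to completion; passing `ctx` into the work is what lets it stop early.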

🔴 How do you make a goroutine cancellable to prevent leaks?

Give it a way to be told to stop and make it select on that. Pass a `context.Context` (or a `done chan struct{}`) and have the goroutine return when it's canceled, including on every blocking operation:

go func() {
    for {
        select {
        case <-ctx.Done():
            return // cleanup and exit
        case job := <-jobs:
            process(ctx, job)
        }
    }
}()

The owner of the goroutine is responsible for eventually canceling the context (or closing `done`). The rule of thumb: every goroutine you start should have a clear, guaranteed termination condition.

🟡 How do you detect and debug goroutine leaks?

Watch `runtime.NumGoroutine()` over time — a steadily climbing count under steady load signals a leak. Capture a goroutine profile via `net/http/pprof` (`go tool pprof http://.../debug/pprof/goroutine`) or send `SIGQUIT` to dump all stacks; goroutines blocked at the same spot for a long time point to the leak site. In tests, `go.uber.org/goleak` fails a test if goroutines outlive it. The stack dump's "minutes" annotation on long-blocked goroutines is especially telling.

🔴 Why is launching a goroutine without knowing how it ends an anti-pattern?

Because in Go nothing cleans up a goroutine for you — if you can't state the precise condition under which it returns, you've likely created a leak or a race. Every `go` statement should come with an answer to "who stops this, and when?": a closed channel it ranges over, a context it watches, a bounded loop, or a `WaitGroup` the owner waits on. This discipline (sometimes phrased "never start a goroutine without knowing when it will stop") prevents the most common production concurrency bugs and makes shutdown deterministic.