06 — Redis Job Queue

A Redis-backed background job/worker system in Go: enqueue typed jobs, process them on a bounded worker pool, retry with exponential backoff, dead-letter the poison ones, and watch it all through Prometheus.

Note: This project is its own Go module (github.com/nabin747/go-from-zero/projects/06-job-queue-redis) and is intentionally excluded from the repo-root CI. Run go mod tidy before building or testing to resolve dependencies. The worker and producer binaries need a running Redis; the unit tests do not, because they run against an in-memory fake.

Overview

This is a simplified, build-it-yourself take on production task queues like asynq, Sidekiq, and Celery. Producers enqueue JSON-serialized jobs; a pool of worker goroutines pulls them off Redis, dispatches each to a registered handler by type, and applies the policy that makes a queue actually reliable: retry with exponential backoff + jitter, a max-attempts cap, and a dead-letter queue (DLQ) for jobs that exhaust their attempts. Everything is instrumented with Prometheus. See the full SPEC for the larger design (scheduling, recovery, priorities, ADRs).

The interesting part of a job queue is never "push and pop"; it is what happens at the edges: handler failures, backoff, poison messages, and shutting down without dropping work. This implementation focuses on exactly those.
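To make the backoff edge concrete, here is a minimal sketch of exponential backoff with jitter. It assumes the "full jitter" variant (a uniform delay between zero and the exponential ceiling), which is only one common choice; the project's actual formula may differ. The names `baseDelay`, `maxDelay`, `ceiling`, `Backoff`, and `newRNG` are hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Hypothetical policy values; the project's real base delay and cap may differ.
const (
	baseDelay = 1 * time.Second
	maxDelay  = 5 * time.Minute
)

// ceiling returns the un-jittered delay for a 1-based attempt:
// baseDelay * 2^(attempt-1), capped at maxDelay.
func ceiling(attempt int) time.Duration {
	d := baseDelay
	for i := 1; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay // stop doubling early so d can never overflow
		}
	}
	return d
}

// Backoff applies full jitter: a uniform random delay in [0, ceiling(attempt)].
func Backoff(attempt int, rng *rand.Rand) time.Duration {
	return time.Duration(rng.Int63n(int64(ceiling(attempt)) + 1))
}

// newRNG returns a seeded generator so runs can be made reproducible.
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

func main() {
	rng := newRNG(time.Now().UnixNano())
	for attempt := 1; attempt <= 6; attempt++ {
		fmt.Printf("attempt %d: ceiling=%v delay=%v\n",
			attempt, ceiling(attempt), Backoff(attempt, rng))
	}
}
```

Because each retry picks a different random delay under the same ceiling, jobs that failed together are unlikely to come back together, while the cap keeps late attempts from waiting indefinitely.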

Demo

# 1. start Redis
make redis                     # docker run ... redis:7-alpine

# 2. start the worker pool (separate terminal)
make worker                    # serves /metrics on :2112

# 3. enqueue some jobs
go run ./cmd/producer --type email:welcome --payload '{"user_id":42}' --count 5
go run ./cmd/producer --type demo:flaky   --count 20   # exercises retries + DLQ

# 4. watch the metrics
curl -s localhost:2112/metrics | grep jobqueue_
curl -s localhost:2112/stats                            # {"pending":..,"scheduled":..,"dead":..}

Architecture

flowchart LR
    P[producer CLI / app] -->|Enqueue JSON| PEND[("Redis LIST: jq:pending")]

    subgraph Redis
        PEND
        SCHED[("ZSET: jq:scheduled\nscore = run-at ms")]
        DEAD[("LIST: jq:dead (DLQ)")]
    end

    subgraph Pool["Worker pool (bounded concurrency)"]
        direction TB
        DQ[BLPOP pending] --> DISP[dispatch by Type\nvia Registry]
        DISP --> H[Handler]
        H --> R{result?}
    end

    PEND -->|BLPOP| DQ
    SCHED -->|promote due\nZRANGEBYSCORE now| PEND
    R -->|nil| OK[ack: jobs_processed_total++]
    R -->|err & attempt < max| RETRY[backoff+jitter\nEnqueueIn -> ZSET\njobs_retried_total++]
    RETRY --> SCHED
    R -->|err & attempt >= max\nor SkipRetry| DLQ[DeadLetter\njobs_failed_total++]
    DLQ --> DEAD

    H -.observe.-> HIST[processing_duration histogram]

A small Queue port (internal/queue) hides the broker behind an interface. The default implementation is Redis (RPUSH/BLPOP + a scheduled ZSET for backoff); an in-memory fake implements the same interface so the worker/retry logic is unit-tested without Redis. The worker pool depends on a narrow, consumer-defined subset of that interface (Dequeue / EnqueueIn / DeadLetter), which both implementations satisfy.
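The paragraph above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the method names Dequeue, EnqueueIn, and DeadLetter come from this README, but their signatures, the `Job` fields, the `ErrSkipRetry` sentinel, and the helpers `memQueue`, `processOne`, `demoHandlers`, and `drain` are all hypothetical.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Job is a simplified stand-in for the project's job type.
type Job struct {
	Type    string
	Attempt int // attempts made so far
}

// ErrSkipRetry is a hypothetical sentinel: fail straight to the DLQ.
var ErrSkipRetry = errors.New("skip retry")

// Handler processes one job; a nil error acknowledges it.
type Handler func(ctx context.Context, j Job) error

// queue is the narrow, consumer-defined view the worker depends on.
type queue interface {
	Dequeue(ctx context.Context) (Job, bool)
	EnqueueIn(ctx context.Context, j Job, delay time.Duration) error
	DeadLetter(ctx context.Context, j Job) error
}

// memQueue is an in-memory fake: slices stand in for the Redis keys.
type memQueue struct{ pending, scheduled, dead []Job }

func (q *memQueue) Dequeue(context.Context) (Job, bool) {
	if len(q.pending) == 0 {
		return Job{}, false
	}
	j := q.pending[0]
	q.pending = q.pending[1:]
	return j, true
}

func (q *memQueue) EnqueueIn(_ context.Context, j Job, _ time.Duration) error {
	q.scheduled = append(q.scheduled, j)
	return nil
}

func (q *memQueue) DeadLetter(_ context.Context, j Job) error {
	q.dead = append(q.dead, j)
	return nil
}

// processOne handles one job and reports "ok", "retry", "dead", or "empty".
func processOne(ctx context.Context, q queue, hs map[string]Handler, maxAttempts int) (string, error) {
	j, ok := q.Dequeue(ctx)
	if !ok {
		return "empty", nil
	}
	j.Attempt++
	h, known := hs[j.Type]
	if !known {
		return "dead", q.DeadLetter(ctx, j) // unknown type: nothing can handle it
	}
	err := h(ctx, j)
	switch {
	case err == nil:
		return "ok", nil
	case errors.Is(err, ErrSkipRetry) || j.Attempt >= maxAttempts:
		return "dead", q.DeadLetter(ctx, j)
	default:
		// Fixed placeholder delay; the real worker uses backoff with jitter.
		return "retry", q.EnqueueIn(ctx, j, time.Second)
	}
}

// demoHandlers returns one handler per outcome, for illustration only.
func demoHandlers() map[string]Handler {
	return map[string]Handler{
		"ok":     func(context.Context, Job) error { return nil },
		"flaky":  func(context.Context, Job) error { return errors.New("boom") },
		"poison": func(context.Context, Job) error { return ErrSkipRetry },
	}
}

// drain processes every pending job and collects the outcomes in order.
func drain(q *memQueue, maxAttempts int) []string {
	var out []string
	for {
		res, err := processOne(context.Background(), q, demoHandlers(), maxAttempts)
		if err != nil || res == "empty" {
			return out
		}
		out = append(out, res)
	}
}

func main() {
	q := &memQueue{pending: []Job{{Type: "ok"}, {Type: "flaky"}, {Type: "poison"}, {Type: "nope"}}}
	fmt.Println(drain(q, 3), len(q.scheduled), len(q.dead))
}
```

Since `processOne` only sees the three-method interface, the same decision logic would run unchanged against a Redis-backed implementation; the fake simply makes the outcome of each branch easy to assert on.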

Tech Stack

Go 1.22 · go-redis/v9 · prometheus/client_golang · log/slog · Docker · docker compose

Getting Started

Prerequisites

  • Go 1.22+
  • Docker (for Redis, and optionally the compose stack)
  • go mod tidy once, to resolve/pin dependencies (this module ships go.mod with the direct requires; tidy fills in go.sum and indirect deps)

Run

git clone https://github.com/nabin747/go-from-zero
cd go-from-zero/projects/06-job-queue-redis
go mod tidy

# Redis: either a throwaway container...
docker run --rm -d --name jobqueue-redis -p 6379:6379 redis:7-alpine
# ...or the full stack (redis + worker):
make compose-up

make worker                                  # JQ_REDIS_ADDR, JQ_CONCURRENCY configurable
go run ./cmd/producer --type demo:flaky --count 10

Configuration (flags or env):

| Flag | Env | Default | Description |
|---|---|---|---|
| --addr | JQ_REDIS_ADDR | localhost:6379 | Redis address |
| --concurrency | JQ_CONCURRENCY | 10 | worker goroutines (worker only) |
| --metrics-addr | JQ_METRICS_ADDR | :2112 | metrics/health listen address |

Test

go test -race -cover ./...

Unit tests run entirely against the in-memory Queue fake — no Redis required.

Project Layout

cmd/
  worker/      worker pool binary: registers handlers, exposes /metrics, Run()
  producer/    demo enqueuer CLI
internal/
  job/         Job type, (de)serialization, Handler, handler Registry
  queue/       Queue port + Redis impl (RPUSH/BLPOP/ZSET) + in-memory fake
  worker/      bounded worker pool: dequeue -> dispatch -> retry/backoff/DLQ
  metrics/     Prometheus counters + processing-duration histogram
Dockerfile             multi-stage build for worker + producer (distroless)
docker-compose.yml     redis + worker
Makefile               tidy / build / test / race / run / compose targets

API

HTTP surface exposed by the worker (default :2112):

| Method | Path | Description |
|---|---|---|
| GET | /metrics | Prometheus exposition |
| GET | /healthz | 200 if Redis is reachable |
| GET | /stats | JSON: pending / scheduled / dead counts |
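As a rough illustration of the /stats endpoint, the sketch below serves the three counts as JSON using only the standard library. The JSON field names follow the demo output earlier in this README; the `Stats` type, `statsHandler`, the `get` hook, and `fetch` are hypothetical names, and how the real worker gathers the counts from Redis is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Stats mirrors the three counts shown in the demo output.
type Stats struct {
	Pending   int64 `json:"pending"`
	Scheduled int64 `json:"scheduled"`
	Dead      int64 `json:"dead"`
}

// statsHandler serves whatever get returns as JSON.
// get is a hypothetical hook standing in for the queue's count lookup.
func statsHandler(get func() (Stats, error)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		s, err := get()
		if err != nil {
			http.Error(w, err.Error(), http.StatusServiceUnavailable)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(s)
	}
}

// fetch calls the handler in-process and returns the status code and body.
func fetch(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/stats", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	h := statsHandler(func() (Stats, error) {
		return Stats{Pending: 3, Scheduled: 1, Dead: 2}, nil
	})
	code, body := fetch(h)
	fmt.Print(code, " ", body)
}
```

Injecting the lookup as a function keeps the handler testable without Redis, in the same spirit as the in-memory Queue fake.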

Prometheus metrics:

| Metric | Type | Labels | Meaning |
|---|---|---|---|
| jobqueue_jobs_processed_total | counter | type | jobs completed successfully |
| jobqueue_jobs_failed_total | counter | type | jobs moved to the DLQ |
| jobqueue_jobs_retried_total | counter | type | retries scheduled |
| jobqueue_job_processing_duration_seconds | histogram | type | handler execution time |

Testing Strategy

  • Unit (table-driven, stdlib testing) against the in-memory Queue fake: success-first-try, succeed-after-N-retries, exhaust-to-DLQ, SkipRetry-to-DLQ, and unknown-type-to-DLQ; plus a graceful-shutdown test that asserts an in-flight job is allowed to finish during the drain.
  • Backoff bounds/jitter are property-checked across attempts.
  • Run with -race to validate the concurrent dequeue/process paths.
  • Real-Redis integration (BLPOP blocking, scheduler timing) is a natural next step via testcontainers-go; see the SPEC.

Lessons Learned

Prompts to write up (see the SPEC for the full list):

  1. Why is exactly-once delivery practically impossible here, and what does at-least-once force every handler author to do (idempotency)?
  2. Walk through what happens to an in-flight job when a worker is kill -9'd: which Redis key holds it, and how would a recovery janitor re-queue it?
  3. Why exponential backoff with jitter rather than a fixed delay? What problem does the jitter solve under load (thundering herd)?
  4. How does the bounded worker pool apply backpressure when handlers are slower than the enqueue rate, and how would you alert on a queue falling behind?
  5. Why must poison messages be quarantined in a DLQ rather than retried forever?

Future Improvements

  • Reliable-queue dequeue (BLMOVE pending -> processing) + a recovery janitor for crashed workers (the SPEC's headline reliability test).
  • Delayed/scheduled jobs and per-queue priorities exposed in the public API.
  • Unique/idempotent enqueue via SET NX dedup keys.
  • testcontainers-go integration tests for the blocking + scheduler paths.
  • Grafana dashboard + Redis Streams alternative (ADR).

🎒 Portfolio

Résumé bullets:

  • "Built a Redis-backed background job queue in Go (go-redis/v9) with a bounded worker pool, exponential-backoff-with-jitter retries, a max-attempts cap, and a dead-letter queue for poison messages."
  • "Designed the broker behind a small Queue port with an in-memory fake, enabling table-driven unit tests of the retry/DLQ/shutdown logic with no Redis dependency, run under -race."
  • "Instrumented processing throughput, failure/retry rates, and a processing-latency histogram with Prometheus; shipped a multi-stage Docker build and a docker-compose stack."

Interview talking points:

  • The Queue port + consumer-defined interface in the worker, and how it makes the Redis impl swappable and the logic testable.
  • Graceful shutdown using two contexts: a dequeue-loop context cancelled on SIGTERM, and a separate job context kept alive so in-flight handlers drain within a deadline.
  • At-least-once vs exactly-once, and why handlers must be idempotent.
  • Backoff + jitter to avoid thundering-herd retries; DLQ for poison messages.
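The two-context shutdown talking point can be sketched as below. This is a minimal, single-worker illustration under assumptions, not the project's worker: the SIGTERM is simulated by calling the cancel function directly, and `slowJob`, `shutdown`, `short`, and `long` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Hypothetical durations used by the demo and its checks.
const (
	short = 20 * time.Millisecond
	long  = time.Second
)

// slowJob simulates a handler that needs `work` time and honours cancellation.
func slowJob(ctx context.Context, work time.Duration) error {
	select {
	case <-time.After(work):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// shutdown starts one worker with one in-flight job, simulates SIGTERM, and
// drains for at most `drain`. It reports whether the job ran to completion.
func shutdown(work, drain time.Duration) (completed bool) {
	loopCtx, stopLoop := context.WithCancel(context.Background())  // dequeue loop
	jobCtx, cancelJobs := context.WithCancel(context.Background()) // in-flight handlers
	defer cancelJobs()

	jobs := make(chan struct{}, 1)
	started := make(chan struct{})
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-loopCtx.Done():
				return // stop taking new work
			case <-jobs:
				close(started)
				// The handler gets jobCtx, so stopLoop alone does not abort it.
				completed = slowJob(jobCtx, work) == nil
			}
		}
	}()

	jobs <- struct{}{}
	<-started
	stopLoop() // in a real worker this would be triggered by SIGTERM

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()
	select {
	case <-done: // drained within the deadline
	case <-time.After(drain):
		cancelJobs() // deadline hit: abort whatever is still running
		<-done
	}
	return completed
}

func main() {
	fmt.Println("fast job, generous deadline:", shutdown(short, long))
	fmt.Println("slow job, tight deadline:", shutdown(long, short))
}
```

With a single shared context, cancelling on SIGTERM would also cancel the handler mid-job; splitting the contexts is what lets the loop stop immediately while in-flight work gets a bounded grace period.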
