Taskly — Multi-Tenant SaaS Backend (Capstone)

A production-grade, multi-tenant project & task management backend in Go. Hexagonal architecture, strict tenant isolation, JWT + RBAC, Redis cache & rate limiting, and the full observability triad — all wired for graceful, horizontally-scalable operation.

This is the flagship capstone of the Learn Go track. See the full SPEC for the complete brief.

Important: This is its own Go module with heavy third-party dependencies. Before building, you must run go mod tidy (to download deps and generate go.sum); to run it, you need Docker for PostgreSQL + Redis. The module is deliberately excluded from the repo-root Go CI matrix.


Overview

Taskly lets many independent organizations (tenants) share one deployment while keeping each tenant's data strictly isolated. A user signs up and creates an organization (becoming its owner); organizations contain projects, projects contain tasks, and people join organizations as members with a role (owner / admin / member). Every tenant-scoped query is filtered by org_id taken from the verified JWT — never from client input — so no endpoint can ever return another tenant's rows.

The project exists to exercise, end to end, what a senior Go backend engineer is expected to design and defend: clean ports-and-adapters architecture, multi-tenancy, JWT auth + RBAC, PostgreSQL (pgx) + migrations, Redis cache-aside + token-bucket rate limiting, structured logs + Prometheus metrics + OpenTelemetry traces, graceful shutdown, Docker / compose, and an ADR log.

Architecture

The application core (domain + use-cases) depends only on ports (interfaces). Inbound adapters drive the core; outbound adapters implement the ports. The composition root (cmd/api) is the only place that knows about concrete infrastructure.
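
The dependency direction can be sketched in a few lines. The names here (Task, TaskRepo, TaskService, memRepo) are illustrative assumptions rather than the project's real types; the point is only that the use-case sees an interface and the composition step chooses the implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Task is a hypothetical domain entity (pure data, no I/O).
type Task struct {
	ID    string
	OrgID string
	Title string
}

// TaskRepo is a port: the core declares what it needs, adapters supply it.
// The tenant id is an explicit parameter on every method.
type TaskRepo interface {
	Create(ctx context.Context, orgID string, t Task) error
	List(ctx context.Context, orgID string) ([]Task, error)
}

var ErrEmptyTitle = errors.New("title must not be empty")

// TaskService is a use-case; it depends only on the port.
type TaskService struct{ repo TaskRepo }

func (s *TaskService) Create(ctx context.Context, orgID, id, title string) error {
	if title == "" {
		return ErrEmptyTitle
	}
	return s.repo.Create(ctx, orgID, Task{ID: id, OrgID: orgID, Title: title})
}

func (s *TaskService) List(ctx context.Context, orgID string) ([]Task, error) {
	return s.repo.List(ctx, orgID)
}

// memRepo is an outbound adapter backed by a map; a Postgres adapter would
// satisfy the same port without the service changing.
type memRepo struct{ byOrg map[string][]Task }

func (m *memRepo) Create(_ context.Context, orgID string, t Task) error {
	m.byOrg[orgID] = append(m.byOrg[orgID], t)
	return nil
}

func (m *memRepo) List(_ context.Context, orgID string) ([]Task, error) {
	return m.byOrg[orgID], nil
}

// countFor plays the role of a tiny composition root: it wires the adapter
// into the service, seeds two tenants, and lists tasks for one of them.
func countFor(org string) int {
	svc := &TaskService{repo: &memRepo{byOrg: map[string][]Task{}}}
	ctx := context.Background()
	_ = svc.Create(ctx, "acme", "t1", "Write the README")
	_ = svc.Create(ctx, "acme", "t2", "Ship it")
	_ = svc.Create(ctx, "globex", "t3", "Plan launch")
	tasks, _ := svc.List(ctx, org)
	return len(tasks)
}

func main() {
	fmt.Println(countFor("acme"), countFor("globex"))
}
```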

flowchart TB
    subgraph Clients
        WEB[Web / SPA]
        CLI[curl / API client]
    end

    subgraph Edge["Middleware chain (internal/adapter/http)"]
        direction LR
        RID[request-id] --> REC[recover] --> MET[metrics] --> LOG[logging] --> RL[rate limit] --> AUTH[JWT auth] --> TEN[tenant guard] --> RBAC[RBAC]
    end

    subgraph Core["Application core (the hexagon)"]
        UC["Use-cases / services<br/>auth · tenant · project · task"]
        PORTS{{"Ports (interfaces)<br/>Repos · Cache · TokenIssuer<br/>RateLimiter · TxManager · EventPublisher"}}
        DOM["Domain entities + rules<br/>(pure Go, no I/O)"]
        UC --> DOM
        UC --> PORTS
    end

    subgraph Outbound["Outbound adapters"]
        PG[Postgres repos<br/>pgxpool]
        RC[Redis cache + limiter<br/>go-redis]
        JWT[JWT issuer + bcrypt]
        EV[Event publisher]
    end

    subgraph Infra
        DB[(PostgreSQL)]
        REDIS[(Redis)]
        JAEGER[(Jaeger)]
        PROM[(Prometheus)]
    end

    WEB & CLI --> Edge --> UC
    PORTS -. implemented by .-> PG & RC & JWT & EV
    PG --> DB
    RC --> REDIS
    Edge -. metrics .-> PROM
    Edge -. spans .-> JAEGER

A second view — the lifecycle of an authenticated, tenant-scoped GET .../tasks request, including the cache-aside path:

sequenceDiagram
    autonumber
    participant C as Client
    participant M as Middleware
    participant H as chi Handler
    participant U as TaskService
    participant K as Redis cache (port)
    participant R as Postgres repo (port)
    participant D as PostgreSQL

    C->>M: GET /v1/orgs/{org}/projects/{p}/tasks  (Bearer JWT)
    M->>M: request-id · metrics · rate limit
    M->>M: verify JWT → Actor{user, org_id, role}
    M->>M: tenant guard: path org == token org_id
    M->>H: ctx(Actor)
    H->>U: List(ctx, actor, filter, page)
    U->>U: RBAC: role.Can(task:view)?
    U->>K: Get(cacheKey(org,p))
    alt cache hit
        K-->>U: []Task
    else miss
        U->>R: List(ctx, org_id, filter, page)
        R->>D: SELECT ... WHERE org_id=$1 AND project_id=$2 ...
        D-->>R: rows
        R-->>U: Page[Task]
        U->>K: Set(key, value, ttl)
    end
    U-->>H: Page[Task]
    H-->>C: 200 {data, page}  (span ended, metrics recorded)
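
The cache-aside branch of the sequence above can be sketched as follows. This is an in-memory stand-in under stated assumptions: the Cache interface, the cacheKey format, and the repo type are hypothetical, and a map replaces Redis. It shows the read path (check cache, fall back to the repository, populate) and invalidation on write.

```go
package main

import (
	"fmt"
	"time"
)

// Cache is a hypothetical port mirroring Get/Set from the diagram.
type Cache interface {
	Get(key string) ([]string, bool)
	Set(key string, value []string, ttl time.Duration)
	Delete(key string)
}

type entry struct {
	value   []string
	expires time.Time
}

// memCache is a map-backed stand-in for the Redis adapter.
type memCache struct{ items map[string]entry }

func (c *memCache) Get(key string) ([]string, bool) {
	e, ok := c.items[key]
	if !ok || time.Now().After(e.expires) {
		return nil, false
	}
	return e.value, true
}

func (c *memCache) Set(key string, value []string, ttl time.Duration) {
	c.items[key] = entry{value: value, expires: time.Now().Add(ttl)}
}

func (c *memCache) Delete(key string) { delete(c.items, key) }

// repo counts reads so the demo can show how often the database is hit.
type repo struct {
	tasks map[string][]string
	reads int
}

func (r *repo) List(orgID, projectID string) []string {
	r.reads++
	return append([]string(nil), r.tasks[orgID+"/"+projectID]...)
}

func (r *repo) Create(orgID, projectID, title string) {
	k := orgID + "/" + projectID
	r.tasks[k] = append(r.tasks[k], title)
}

type TaskService struct {
	cache Cache
	repo  *repo
	ttl   time.Duration
}

// cacheKey includes the org id so tenants can never share an entry.
func cacheKey(orgID, projectID string) string { return "tasks:" + orgID + ":" + projectID }

func (s *TaskService) List(orgID, projectID string) []string {
	key := cacheKey(orgID, projectID)
	if v, ok := s.cache.Get(key); ok {
		return v // cache hit
	}
	v := s.repo.List(orgID, projectID) // miss: go to the repository
	s.cache.Set(key, v, s.ttl)
	return v
}

func (s *TaskService) Create(orgID, projectID, title string) {
	s.repo.Create(orgID, projectID, title)
	s.cache.Delete(cacheKey(orgID, projectID)) // invalidate after the write
}

// repoReadsAfterDemo runs miss, hit, write, miss and reports repository reads.
func repoReadsAfterDemo() int {
	r := &repo{tasks: map[string][]string{}}
	s := &TaskService{cache: &memCache{items: map[string]entry{}}, repo: r, ttl: time.Minute}
	s.List("acme", "p1")
	s.List("acme", "p1")
	s.Create("acme", "p1", "Write the README")
	s.List("acme", "p1")
	return r.reads
}

func main() {
	fmt.Println("repository reads:", repoReadsAfterDemo())
}
```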

Key decisions are recorded as ADRs: hexagonal architecture · multi-tenancy · JWT + RBAC · observability & persistence.

Features

  • Multi-tenancy — shared DB + org_id row scoping; tenant id is taken from the verified JWT and asserted against the URL; repositories take tenantID explicitly so the WHERE org_id = $1 predicate can never be forgotten. RLS is available as defense-in-depth (0002_rls.up.sql).
  • Auth — signup/login/refresh/logout; short-lived access JWT (HS256) + opaque, rotating refresh tokens with reuse (theft) detection; bcrypt password hashing.
  • RBAC: owner/admin/member with a single-source-of-truth permission matrix, enforced at the route (fast 403) and in each use-case (authoritative).
  • CRUD — organizations & members, projects, and tasks with filtering and keyset (cursor) pagination.
  • Caching — Redis cache-aside for hot listings with write-through invalidation.
  • Rate limiting — Redis token-bucket (atomic Lua) shared across replicas, with an in-memory fallback; returns 429 + Retry-After.
  • Observability: slog JSON logs, Prometheus RED metrics, OpenTelemetry traces (OTLP → Jaeger), all correlated by request_id.
  • Operability: /healthz + /readyz, graceful shutdown (drain on SIGTERM via signal.NotifyContext + http.Server.Shutdown), timeouts on all I/O, 12-factor config.
  • Delivery — multi-stage distroless Dockerfile, full docker-compose stack, Makefile, and a CI pipeline.
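
The graceful-shutdown item can be illustrated with the standard library alone. The sketch below is an assumption about how such wiring might look, not the project's cmd/api: run, newHandler, and readyzStatus are hypothetical names, the timeouts are arbitrary, and main cancels itself after a short delay only so the example terminates without a signal.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net"
	"net/http"
	"net/http/httptest"
	"os"
	"os/signal"
	"sync/atomic"
	"syscall"
	"time"
)

// newHandler exposes liveness and readiness; readiness follows the flag.
func newHandler(ready *atomic.Bool) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, _ *http.Request) {
		if !ready.Load() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// run serves until ctx is cancelled, then flips readiness and drains.
func run(ctx context.Context, addr string) error {
	var ready atomic.Bool
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	srv := &http.Server{Handler: newHandler(&ready), ReadHeaderTimeout: 5 * time.Second}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()
	ready.Store(true)

	select {
	case err := <-errCh:
		return err // the server failed before any shutdown was requested
	case <-ctx.Done():
	}

	// Stop advertising readiness first. A real deployment might also wait
	// here long enough for the load balancer to notice.
	ready.Store(false)
	shutdownCtx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	if err := srv.Shutdown(shutdownCtx); err != nil {
		return err
	}
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// readyzStatus probes /readyz in-process for a given readiness state.
func readyzStatus(isReady bool) int {
	var ready atomic.Bool
	ready.Store(isReady)
	rec := httptest.NewRecorder()
	newHandler(&ready).ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer stop()
	// Demo only: cancel after a moment so the example exits by itself.
	ctx, cancel := context.WithTimeout(ctx, 200*time.Millisecond)
	defer cancel()
	if err := run(ctx, "127.0.0.1:0"); err != nil {
		log.Println("server error:", err)
		return
	}
	log.Println("drained and stopped cleanly")
}
```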

Tech Stack

Go 1.22 · chi (router) · pgx/pgxpool (Postgres) · golang-migrate (migrations) · go-redis (cache + limiter) · golang-jwt · bcrypt · slog (logs) · prometheus/client_golang (metrics) · OpenTelemetry + otelhttp (traces) · Docker / docker-compose · testcontainers-go (integration).

Getting Started

Prerequisites

  • Go 1.22+
  • Docker + Docker Compose (for PostgreSQL, Redis, Jaeger, Prometheus)
  • golang-migrate CLI (only if running migrations outside compose)

1. Resolve dependencies (required)

cd projects/07-capstone-saas-backend
go mod tidy          # downloads deps and generates go.sum

2. Run the whole stack

cp .env.example .env   # then edit JWT_SIGNING_KEY
make up                # docker compose up --build (api + postgres + redis + prometheus + jaeger)

Migrations run automatically as a one-shot migrate service before the API starts.

3. Run the API directly (Postgres + Redis still needed)

docker compose up -d postgres redis
make migrate-up
make run               # honours .env / environment

4. Demo flow

# Sign up → returns access + refresh tokens and the new org id
curl -s localhost:8080/v1/auth/signup -H 'content-type: application/json' -d '{
  "email":"ada@example.com","password":"supersecret","display_name":"Ada","org_name":"Acme"
}'

TOKEN=...   # access_token from the response
ORG=...     # tenant_id from the response

# Create a project, then a task under it
curl -s localhost:8080/v1/orgs/$ORG/projects -H "authorization: Bearer $TOKEN" \
  -H 'content-type: application/json' -d '{"name":"Launch"}'

PROJECT=...
curl -s localhost:8080/v1/orgs/$ORG/projects/$PROJECT/tasks -H "authorization: Bearer $TOKEN" \
  -H 'content-type: application/json' -d '{"title":"Write the README","priority":2}'

Test

make test     # go test -race -cover ./...   (service + platform unit tests)
make cover    # HTML coverage report for ./internal/...

API

Base path /v1. The active org is the JWT org_id claim; org-scoped paths also carry {org_id}, which must equal the claim or the request is 403. Full contract in api/openapi.yaml.

| Method & Path | Description | Min role |
|---|---|---|
| POST /v1/auth/signup | Create user + first org (creator = owner) | public |
| POST /v1/auth/login | Verify password, issue access + refresh | public |
| POST /v1/auth/refresh | Rotate refresh token, new access token | valid refresh |
| POST /v1/auth/logout | Revoke active refresh token | member |
| GET /v1/orgs | List my organizations | member |
| POST /v1/orgs | Create a new organization | member |
| GET /v1/orgs/{org_id} | Get organization | member |
| PATCH /v1/orgs/{org_id} | Update organization | admin |
| DELETE /v1/orgs/{org_id} | Soft-delete organization | owner |
| GET /v1/orgs/{org_id}/members | List members | member |
| PATCH /v1/orgs/{org_id}/members/{user_id} | Change member role | admin |
| DELETE /v1/orgs/{org_id}/members/{user_id} | Remove member | admin |
| GET /v1/orgs/{org_id}/projects | List projects (paged/filtered) | member |
| POST /v1/orgs/{org_id}/projects | Create project | member |
| GET/PATCH/DELETE …/projects/{project_id} | Read / update / archive | member / admin / admin |
| GET …/projects/{project_id}/tasks | List tasks (filter status/assignee) | member |
| POST …/projects/{project_id}/tasks | Create task | member |
| GET/PATCH/DELETE …/tasks/{task_id} | Read / update / delete | member (delete: own only) |
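
The "Min role" column maps naturally onto a permission matrix. The following is a hedged sketch, not the project's matrix: only task:view appears in the source, so the other permission names are invented for illustration, and the rule that admins and owners may delete any task while members delete only tasks they created is an assumption about what "own only" means.

```go
package main

import "fmt"

type Role string
type Permission string

const (
	RoleOwner  Role = "owner"
	RoleAdmin  Role = "admin"
	RoleMember Role = "member"
)

// Apart from task:view, these permission names are hypothetical.
const (
	PermTaskView      Permission = "task:view"
	PermTaskDelete    Permission = "task:delete"
	PermProjectCreate Permission = "project:create"
	PermOrgUpdate     Permission = "org:update"
	PermOrgDelete     Permission = "org:delete"
)

// matrix is the single source of truth: anything not listed is denied.
var matrix = map[Role]map[Permission]bool{
	RoleMember: {
		PermTaskView: true, PermTaskDelete: true, PermProjectCreate: true,
	},
	RoleAdmin: {
		PermTaskView: true, PermTaskDelete: true, PermProjectCreate: true,
		PermOrgUpdate: true,
	},
	RoleOwner: {
		PermTaskView: true, PermTaskDelete: true, PermProjectCreate: true,
		PermOrgUpdate: true, PermOrgDelete: true,
	},
}

// Can reports whether the role holds the permission. Unknown roles index a
// nil inner map, which yields false.
func (r Role) Can(p Permission) bool { return matrix[r][p] }

// CanDeleteTask layers the ownership rule on top of the matrix. Treating
// "own" as "created by the actor" is an assumption of this sketch.
func CanDeleteTask(role Role, actorID, creatorID string) bool {
	if !role.Can(PermTaskDelete) {
		return false
	}
	if role == RoleMember {
		return actorID == creatorID
	}
	return true
}

func main() {
	fmt.Println(RoleMember.Can(PermTaskView), RoleMember.Can(PermOrgUpdate))
	fmt.Println(RoleAdmin.Can(PermOrgDelete), RoleOwner.Can(PermOrgDelete))
	fmt.Println(CanDeleteTask(RoleMember, "u1", "u2"), CanDeleteTask(RoleAdmin, "u1", "u2"))
}
```

Because both the route middleware and the use-cases would call the same Can method, the two enforcement points cannot drift apart.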

Every non-2xx response shares one envelope:

{ "error": { "code": "forbidden", "message": "…", "request_id": "…", "details": [] } }

code is one of: unauthenticated · forbidden · not_found · validation_failed · conflict · rate_limited · internal.
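
A possible Go shape for this envelope is sketched below. The struct and function names are hypothetical, the element type of details is assumed to be a list of strings, and the code-to-status mapping is an assumption wherever the document does not state it (in particular validation_failed is mapped to 400 here).

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// ErrorBody mirrors the fields of the documented envelope.
type ErrorBody struct {
	Code      string   `json:"code"`
	Message   string   `json:"message"`
	RequestID string   `json:"request_id"`
	Details   []string `json:"details"`
}

type Envelope struct {
	Error ErrorBody `json:"error"`
}

// statusFor maps an error code to an HTTP status. Only 403, 429, and 500
// are implied by the document; the rest are conventional choices.
func statusFor(code string) int {
	switch code {
	case "unauthenticated":
		return http.StatusUnauthorized
	case "forbidden":
		return http.StatusForbidden
	case "not_found":
		return http.StatusNotFound
	case "validation_failed":
		return http.StatusBadRequest
	case "conflict":
		return http.StatusConflict
	case "rate_limited":
		return http.StatusTooManyRequests
	default:
		return http.StatusInternalServerError
	}
}

// renderError serializes the envelope. Details is initialized to an empty
// slice so it encodes as [] rather than null.
func renderError(code, message, requestID string) string {
	b, err := json.Marshal(Envelope{Error: ErrorBody{
		Code: code, Message: message, RequestID: requestID, Details: []string{},
	}})
	if err != nil {
		return `{"error":{"code":"internal","message":"","request_id":"","details":[]}}`
	}
	return string(b)
}

// writeError sends the envelope with the matching status code.
func writeError(w http.ResponseWriter, code, message, requestID string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(statusFor(code))
	_, _ = w.Write([]byte(renderError(code, message, requestID)))
}

func main() {
	fmt.Println(statusFor("forbidden"), renderError("forbidden", "cross-tenant access", "req-1"))
}
```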

Project Layout

07-capstone-saas-backend/
├── cmd/api/                     # composition root: wiring + graceful shutdown
├── internal/
│   ├── domain/                  # entities, RBAC, errors, and PORT interfaces (no I/O)
│   ├── service/                 # use-cases (auth, tenant, project, task) + unit tests
│   ├── adapter/
│   │   ├── http/                # chi router, handlers, middleware (auth/RBAC/ratelimit/…)
│   │   ├── postgres/            # pgx repositories + TxManager + keyset cursor
│   │   ├── redis/               # cache-aside + token-bucket rate limiter
│   │   └── events/              # EventPublisher (logging; swap for a queue worker)
│   └── platform/                # connectors & setup (config, logging, metrics,
│                                #   tracing, pgxpool, redis client, jwt+bcrypt, ratelimit)
├── db/migrations/               # golang-migrate *.up.sql / *.down.sql (+ optional RLS)
├── api/openapi.yaml             # OpenAPI 3 contract
├── deployments/prometheus/      # Prometheus scrape config
├── docs/adr/                    # Architecture Decision Records (Nygard)
├── Dockerfile · docker-compose.yml · Makefile · .env.example · .golangci.yml
└── go.mod

Testing Strategy

  • Unit tests (core, fast, no infra): the entire service layer is tested table-driven against hand-written in-memory fakes of every port (internal/service/*_test.go). They cover signup/login, refresh rotation & reuse detection, the RBAC matrix, tenant isolation (tenant A cannot read/write tenant B), cache-aside hit/miss/invalidation, and task delete-ownership rules. The JWT issuer and the in-memory rate limiter also have pure unit tests.
  • Integration tests (require Docker): repositories, migrations, the cache-aside path, the Redis token-bucket limiter, and the refresh-token store are intended to be exercised against real Postgres + Redis via testcontainers-go (spun up per suite in TestMain, torn down after). These catch what mocks cannot: SQL correctness, the 23505 → ErrConflict mapping, index-backed pagination, and Lua atomicity.
  • Everything under -race, with a coverage gate on internal/... in CI.
go test -race -cover ./...
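
An in-memory rate limiter is easy to unit-test when the clock is passed in rather than read from time.Now. The sketch below assumes the in-memory fallback is also a token bucket, which the document does not confirm; Bucket, NewBucket, and the helper functions are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"time"
)

// Bucket is a token bucket: it holds up to capacity tokens and refills at
// rate tokens per second. Each allowed request consumes one token.
type Bucket struct {
	mu       sync.Mutex
	capacity float64
	rate     float64
	tokens   float64
	last     time.Time
}

func NewBucket(capacity int, ratePerSec float64, now time.Time) *Bucket {
	return &Bucket{
		capacity: float64(capacity),
		rate:     ratePerSec,
		tokens:   float64(capacity),
		last:     now,
	}
}

// Allow reports whether a request at time now may proceed and, if not, how
// long the caller should wait for the next token.
func (b *Bucket) Allow(now time.Time) (bool, time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens = math.Min(b.capacity, b.tokens+elapsed*b.rate)
		b.last = now
	}
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	missing := 1 - b.tokens
	return false, time.Duration(missing / b.rate * float64(time.Second))
}

// retryAfterSeconds rounds up, as a Retry-After header carries whole seconds.
func retryAfterSeconds(d time.Duration) int { return int(math.Ceil(d.Seconds())) }

// burstAllowed fires requests at a single instant and counts the successes.
func burstAllowed(capacity, requests int) int {
	start := time.Unix(0, 0)
	b := NewBucket(capacity, 1, start)
	allowed := 0
	for i := 0; i < requests; i++ {
		if ok, _ := b.Allow(start); ok {
			allowed++
		}
	}
	return allowed
}

// allowedAfter drains a one-token bucket (1 token/s), waits waitMillis, and
// tries again, returning the decision and the Retry-After value in seconds.
func allowedAfter(waitMillis int) (bool, int) {
	start := time.Unix(0, 0)
	b := NewBucket(1, 1, start)
	b.Allow(start)
	ok, wait := b.Allow(start.Add(time.Duration(waitMillis) * time.Millisecond))
	return ok, retryAfterSeconds(wait)
}

func main() {
	fmt.Println(burstAllowed(3, 5))
	fmt.Println(allowedAfter(400))
	fmt.Println(allowedAfter(1000))
}
```

The Redis variant would need the same refill-then-consume step to run atomically on the server, which is what the Lua script is for.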

Observability

  • Logs: slog (JSON in prod). One structured line per request: method, route, status, duration, request_id, and user_id/tenant_id/role when authenticated.
  • Metrics: /metrics (port 9090) exposes RED metrics (taskly_http_requests_total, taskly_http_request_duration_seconds, taskly_http_in_flight_requests) labelled by the chi route template (low cardinality), plus Go/process collectors.
  • Traces — every request is an otelhttp server span; spans export over OTLP/gRPC to Jaeger. Tracing is a no-op when OTEL_EXPORTER_OTLP_ENDPOINT is unset.
  • Correlation — the request_id ties a log line to its trace; given only a request_id you can pivot logs → trace → the slow span.
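
The per-request log line might be produced by a middleware along these lines. This is a sketch under assumptions: the X-Request-ID header name, the helper names, and the choice to log the raw path instead of the route template are all illustrative, and the authenticated fields (user_id, tenant_id, role) are left out for brevity.

```go
package main

import (
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
	"time"
)

// statusRecorder remembers the status code written by the wrapped handler.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// newRequestID returns 16 random hex characters.
func newRequestID() string {
	b := make([]byte, 8)
	_, _ = rand.Read(b)
	return hex.EncodeToString(b)
}

// logging reuses an inbound request id or mints one, echoes it on the
// response, and emits a single structured line once the handler returns.
func logging(logger *slog.Logger, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-ID", id)
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		start := time.Now()
		next.ServeHTTP(rec, r)
		logger.Info("request",
			slog.String("method", r.Method),
			slog.String("path", r.URL.Path),
			slog.Int("status", rec.status),
			slog.Duration("duration", time.Since(start)),
			slog.String("request_id", id),
		)
	})
}

// logOneRequest sends one request through the middleware and returns the
// decoded JSON log line so callers can inspect individual fields.
func logOneRequest(requestID string, status int) map[string]any {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	h := logging(logger, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(status)
	}))
	req := httptest.NewRequest(http.MethodGet, "/healthz", nil)
	req.Header.Set("X-Request-ID", requestID)
	h.ServeHTTP(httptest.NewRecorder(), req)

	var line map[string]any
	if err := json.Unmarshal(buf.Bytes(), &line); err != nil {
		panic(err)
	}
	return line
}

func main() {
	line := logOneRequest("req-42", http.StatusNotFound)
	fmt.Println(line["request_id"], line["status"], line["method"])
}
```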

Security Notes

  • Tenant id is sourced from the verified token, never request input; the URL {org_id} is asserted to equal it. Optional RLS is a database-level backstop.
  • Rotating refresh tokens (hashed at rest) with reuse detection; pinned HS256, validated iss/aud/exp; bcrypt (cost ≥ 12); vague auth errors to avoid user enumeration.
  • Strict JSON decoding with DisallowUnknownFields and a 1 MiB body cap; parameterized SQL everywhere; panics recovered into 500s.
  • No secrets in code — 12-factor env config; JWT_SIGNING_KEY must be ≥ 32 bytes or the process refuses to start.
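
Strict decoding with a body cap can be done with the standard library alone. The sketch below is one possible arrangement, not the project's helper: decodeStrict and createProject are hypothetical names, and the choice of 413 for oversized bodies and 400 for everything else is an assumption.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

const maxBodyBytes = 1 << 20 // 1 MiB

// createProject is a hypothetical request payload.
type createProject struct {
	Name string `json:"name"`
}

// decodeStrict caps the body size, rejects unknown fields, and requires the
// body to contain exactly one JSON value. It returns the HTTP status the
// caller should use.
func decodeStrict(w http.ResponseWriter, r *http.Request, dst any) (int, error) {
	r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields()

	if err := dec.Decode(dst); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			return http.StatusRequestEntityTooLarge, err
		}
		return http.StatusBadRequest, err
	}
	// A clean body ends here; anything other than EOF is trailing data.
	if err := dec.Decode(&struct{}{}); !errors.Is(err, io.EOF) {
		return http.StatusBadRequest, errors.New("body must contain a single JSON value")
	}
	return http.StatusOK, nil
}

// statusForBody runs decodeStrict against an in-memory request.
func statusForBody(body string) int {
	req := httptest.NewRequest(http.MethodPost, "/v1/orgs/acme/projects", strings.NewReader(body))
	var in createProject
	status, _ := decodeStrict(httptest.NewRecorder(), req, &in)
	return status
}

// oversizedStatus sends a syntactically valid body just over the cap.
func oversizedStatus() int {
	return statusForBody(`{"name":"` + strings.Repeat("a", maxBodyBytes) + `"}`)
}

func main() {
	fmt.Println(statusForBody(`{"name":"Launch"}`))
	fmt.Println(statusForBody(`{"name":"Launch","role":"owner"}`))
	fmt.Println(statusForBody(`{"name":"a"} {"name":"b"}`))
	fmt.Println(oversizedStatus())
}
```

Rejecting unknown fields matters for a multi-tenant API because it stops clients from smuggling fields such as an org id into a payload and hoping a handler picks them up.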

Lessons Learned

  • Ports earned their keep where a second implementation existed (Redis vs in-memory limiter; cache vs no-cache) — the service layer never changed. Where there was exactly one implementation, the interface was ceremony, so I kept ports minimal.
  • Passing the Actor explicitly (instead of pulling identity from context inside use-cases) made the service tests trivial and the tenant/RBAC rules obvious to read.
  • Cache-aside is easy to get subtly wrong. Caching only the hot, unfiltered first page kept invalidation exact; caching every filter/cursor permutation would have made correct invalidation far harder than it was worth.
  • TxManager via context kept atomic signup clean without leaking pgx.Tx into the use-case signatures.
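
The TxManager idea can be shown without a database. In this sketch an in-memory Store buffers writes inside a transaction object carried by the context; a real adapter would presumably stash a pgx.Tx there instead. All names (TxManager, WithinTx, Store, signup) are illustrative assumptions.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type txKey struct{}

// tx buffers writes until commit; it stands in for a database transaction.
type tx struct{ pending []string }

// Store is a toy table of rows.
type Store struct{ rows []string }

// Insert writes into the ambient transaction if the context carries one,
// otherwise directly into the store.
func (s *Store) Insert(ctx context.Context, row string) {
	if t, ok := ctx.Value(txKey{}).(*tx); ok {
		t.pending = append(t.pending, row)
		return
	}
	s.rows = append(s.rows, row)
}

// TxManager is the port the use-case sees: no driver types in its signature.
type TxManager struct{ store *Store }

// WithinTx runs fn with a transaction in the context. An error discards the
// buffered writes (rollback); success applies them together (commit).
func (m *TxManager) WithinTx(ctx context.Context, fn func(ctx context.Context) error) error {
	t := &tx{}
	if err := fn(context.WithValue(ctx, txKey{}, t)); err != nil {
		return err
	}
	m.store.rows = append(m.store.rows, t.pending...)
	return nil
}

var errOrgNameRequired = errors.New("org name required")

// signup creates a user, an organization, and an owner membership as one
// atomic unit. The validation deliberately fails after the first insert to
// show that the rollback discards it.
func signup(ctx context.Context, tm *TxManager, s *Store, email, orgName string) error {
	return tm.WithinTx(ctx, func(ctx context.Context) error {
		s.Insert(ctx, "user:"+email)
		if orgName == "" {
			return errOrgNameRequired
		}
		s.Insert(ctx, "org:"+orgName)
		s.Insert(ctx, "membership:"+email+":"+orgName+":owner")
		return nil
	})
}

// rowsAfterSignup reports how many rows were committed.
func rowsAfterSignup(email, orgName string) int {
	s := &Store{}
	tm := &TxManager{store: s}
	_ = signup(context.Background(), tm, s, email, orgName)
	return len(s.rows)
}

func main() {
	fmt.Println(rowsAfterSignup("ada@example.com", "Acme"))
	fmt.Println(rowsAfterSignup("ada@example.com", ""))
}
```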

Future Improvements

  • gRPC surface sharing the same use-cases (interceptors mirroring the HTTP chain).
  • Real async worker (cmd/worker) consuming a Redis/asynq queue for invitation emails, notifications, and webhooks (the EventPublisher seam already exists).
  • Invitations (tokenized accept flow) and richer member management.
  • Turn on RLS end-to-end (SET LOCAL app.current_org plumbing).
  • sqlc-generated queries; transactional outbox for reliable domain events.
  • k6 load script asserting p99 < 150 ms; Grafana dashboards.

🎒 Portfolio

Résumé bullets

  • Built a production-grade multi-tenant SaaS backend in Go with a hexagonal architecture and strict org_id tenant isolation (verified-JWT scoping + optional Postgres RLS), validated by dedicated tenant-isolation tests.
  • Implemented end-to-end auth & authorization: short-lived access JWTs + rotating refresh tokens with reuse detection, bcrypt hashing, and a single-source-of-truth RBAC matrix enforced at both the edge and the use-case layer.
  • Delivered the full observability triad (slog + Prometheus RED metrics + OpenTelemetry traces, correlated by request_id), Redis cache-aside + a shared token-bucket rate limiter, graceful shutdown, and a multi-stage distroless image with docker-compose and CI.

Interview talking points

  • Multi-tenancy — why shared-DB + org_id over schema/DB-per-tenant, how the predicate is made un-forgettable (explicit tenantID params), and where RLS fits (ADR-0002).
  • Hexagonal + DI — domain at the center, consumer-defined ports, adapters at the edge, one composition root; what made the service layer unit-testable in milliseconds (ADR-0001).
  • Auth — token design, rotation, reuse detection ("present a revoked refresh token → revoke the whole family"), and what a leaked signing key does and doesn't expose (ADR-0003).
  • Observability — when a log vs a metric vs a trace answers the question, and how request_id correlation makes an incident debuggable (ADR-0004).
  • Caching & rate limiting — cache-aside with precise write-through invalidation, and an atomic Lua token bucket correct across replicas.
  • Operations — graceful drain on SIGTERM behind a load balancer, readiness flipping during shutdown, and zero-downtime expand/contract migrations.
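
The reuse-detection rule quoted in the Auth point ("present a revoked refresh token → revoke the whole family") can be sketched with an in-memory store. This is an illustration, not the project's implementation: the record layout, the family identifier, and SHA-256 as the at-rest hash are assumptions, and the store is not safe for concurrent use.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

var (
	ErrInvalidToken = errors.New("invalid refresh token")
	ErrTokenReuse   = errors.New("refresh token reuse detected")
)

// record is what the store keeps per token; the token itself is never stored.
type record struct {
	family  string
	userID  string
	revoked bool
}

type Store struct{ byHash map[string]*record }

func hashToken(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// randomHex returns an opaque 32-byte value as hex.
func randomHex() string {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

func (s *Store) issueInFamily(userID, family string) string {
	token := randomHex()
	s.byHash[hashToken(token)] = &record{family: family, userID: userID}
	return token
}

// Issue starts a new token family, for example at login.
func (s *Store) Issue(userID string) string {
	return s.issueInFamily(userID, randomHex())
}

// Rotate exchanges a valid token for a new one in the same family. If the
// presented token was already rotated, every token in the family is revoked.
func (s *Store) Rotate(token string) (string, error) {
	rec, ok := s.byHash[hashToken(token)]
	if !ok {
		return "", ErrInvalidToken
	}
	if rec.revoked {
		for _, r := range s.byHash {
			if r.family == rec.family {
				r.revoked = true
			}
		}
		return "", ErrTokenReuse
	}
	rec.revoked = true
	return s.issueInFamily(rec.userID, rec.family), nil
}

// reuseScenario rotates once, replays the old token, then tries the newest
// token, which should now be rejected because its family was revoked.
func reuseScenario() (rotated, replayDetected, newestRejected bool) {
	s := &Store{byHash: map[string]*record{}}
	first := s.Issue("user-1")
	second, err := s.Rotate(first)
	rotated = err == nil
	_, err = s.Rotate(first)
	replayDetected = errors.Is(err, ErrTokenReuse)
	_, err = s.Rotate(second)
	newestRejected = errors.Is(err, ErrTokenReuse)
	return
}

func main() {
	fmt.Println(reuseScenario())
}
```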

System-design stories this unlocks

  1. Design a multi-tenant SaaS — isolation strategies and the trade-offs chosen.
  2. Design an auth system — access/refresh, rotation, RBAC, hashing.
  3. Make this service observable — the triad and incident debugging.
  4. Add caching & rate limiting — invalidation and the token bucket.
  5. Decouple a slow side-effect — the EventPublisher seam → a worker queue.
