07 — Multi-Tenant SaaS Backend (Capstone)

Level: Production capstone  ·  Domain: Multi-tenant Project/Task management SaaS  ·  Mantra: Organizations → Projects → Tasks → Members, isolated, observable, and shippable.

Overview

This is the flagship project of the Learn Go track. You will build Taskly — a production-grade, multi-tenant Software-as-a-Service backend for project and task management. Multiple independent organizations (tenants) share one deployment, yet each tenant's data is strictly isolated. A single organization contains projects, projects contain tasks, and people join organizations as members with a role (owner, admin, member).

The point of the capstone is breadth and depth: it deliberately exercises essentially everything a senior Go backend engineer is expected to know and defend in an interview — clean hexagonal (ports & adapters) architecture, dual REST (chi) + gRPC surfaces, PostgreSQL with pgx/sqlc/golang-migrate, Redis for caching and rate limiting, JWT access/refresh auth with RBAC, multi-tenant isolation, the full observability triad (structured logs, Prometheus metrics, OpenTelemetry traces), graceful shutdown, Docker/compose, CI/CD via GitHub Actions, an async job queue for side-effects, testcontainers-based integration tests, and an ADR log.

By the end you should be able to walk an interviewer through this single repository and unlock four to six distinct system-design stories from it.

Learning Objectives

By completing this project you will be able to:

  • Design and defend a hexagonal / ports-and-adapters architecture in Go, with the domain at the center and all I/O at the edges via interfaces (ports).
  • Implement multi-tenancy with a shared database + org_id row scoping, propagate a tenant context through every layer, and reason about Row-Level Security (RLS) as an alternative.
  • Build secure JWT auth with short-lived access tokens and rotating refresh tokens, password hashing (bcrypt/argon2), and RBAC permission checks.
  • Expose the same use-cases over REST (chi) and gRPC, sharing one application core.
  • Use PostgreSQL idiomatically with pgxpool, type-safe queries via sqlc, and versioned, zero-downtime golang-migrate migrations.
  • Add Redis caching (cache-aside) and a Redis token-bucket rate limiter.
  • Wire the full observability triad: log/slog structured logs, prometheus/client_golang metrics, and OpenTelemetry traces exported to Jaeger/Tempo.
  • Apply 12-factor config, constructor-based dependency injection (optionally google/wire), and graceful shutdown with context/signal.NotifyContext.
  • Write a comprehensive test suite: unit tests against mocked ports, integration tests with testcontainers-go (Postgres + Redis), and explicit tenant-isolation tests.
  • Ship it: multi-stage Dockerfiles, docker-compose full stack, and a GitHub Actions pipeline (lint → test → build → push → deploy), documented with an ADR log.

Requirements

Functional

  • Authentication & accounts
  • POST /auth/signup — create a user and a first organization (the creator becomes owner).
  • POST /auth/login — verify password (bcrypt/argon2), issue an access JWT + refresh token.
  • POST /auth/refresh — rotate refresh token, issue a new access token; old refresh token is revoked.
  • POST /auth/logout — revoke the active refresh token (delete from store / Redis blacklist).
  • Organizations & membership
  • Create / read / update / delete (soft-delete) an organization.
  • List the organizations the current user belongs to.
  • Manage members: list members, change a member's role, remove a member.
  • Invitations
  • owner/admin invite an email to an org with a role; an async side-effect emails the invite.
  • Invitee accepts via a tokenized link, creating a membership.
  • RBAC-protected project/task CRUD
  • Projects: create, list (paginated/filtered), read, update, archive — all scoped to an org.
  • Tasks: create, list (filter by status/assignee/project), read, update, transition status, delete.
  • Every operation is gated by the caller's role permissions (see RBAC table).
  • Tenant-scoped data — every query is scoped by org_id; no endpoint can return another tenant's rows, ever.
  • Pagination & filtering — cursor (keyset) or limit/offset pagination, with status, assignee_id, project_id, and q (text) filters where relevant.
  • Async side-effect — an outbound job queue (reuse project 06's queue ideas, e.g. a Redis/asynq-style queue) processes invitation emails, task-assignment notifications, and outbound webhooks via a separate worker binary.

Non-Functional

  • Multi-tenant isolation — strict org_id scoping; a tenant-isolation test asserts tenant A cannot read or mutate tenant B's data. RLS available as a defense-in-depth option.
  • Performance — target p99 < 150 ms for read endpoints under nominal load; cache hot reads in Redis; index every tenant-scoped access path.
  • Horizontal scalability / statelessness — API and worker hold no in-process session state; all shared state lives in Postgres/Redis so instances scale out behind a load balancer.
  • Observability — every request carries a request_id and a trace; structured JSON logs, RED/USE Prometheus metrics, and OTel spans across REST → use-case → repo → DB.
  • Security — JWT access tokens (~15 min) + rotating refresh tokens (~30 days); passwords hashed with bcrypt (cost ≥ 12) or argon2id; per-IP and per-tenant rate limiting; secrets via env/secret manager; TLS terminated at the edge; input validation on every adapter.
  • Reliability: graceful shutdown (drain in-flight requests, close pools), health/readiness probes, zero-downtime migrations (expand/contract), and idempotent job handlers.
  • Quality: > 70% coverage on the core (internal/*/domain and internal/*/app), -race clean, golangci-lint clean, coverage gate enforced in CI.

Architecture

Hexagonal: the application use-cases depend only on ports (interfaces). Inbound adapters (REST, gRPC) drive the use-cases; outbound adapters (Postgres, Redis, queue, email) implement the ports. A tenant context is resolved at the edge and propagated through context.Context to every layer.

flowchart TB
    subgraph Clients
        WEB[Web / SPA]
        MOB[Mobile]
        SVC[Internal Services]
    end

    subgraph Edge["API Gateway / Middleware chain"]
        direction TB
        RID[request-id] --> OTEL[otel trace start] --> RL[rate limit · Redis] --> AUTH[JWT auth] --> TEN[tenant resolution<br/>org_id → ctx] --> RBAC[RBAC check]
    end

    subgraph Inbound["Inbound adapters"]
        REST[REST adapter<br/>chi router]
        GRPC[gRPC adapter<br/>grpc-go]
    end

    subgraph Core["Application core (hexagon)"]
        UC["Use-cases / services<br/>(auth, org, project, task)"]
        PORTS{{"Ports (interfaces):<br/>Repo · Cache · QueuePublisher · Mailer · TokenIssuer"}}
        DOMAIN["Domain entities + rules<br/>(pure Go, no I/O)"]
        UC --> DOMAIN
        UC --> PORTS
    end

    subgraph Outbound["Outbound adapters"]
        PG[Postgres repo<br/>pgxpool + sqlc]
        RC[Redis cache<br/>go-redis]
        QP[Queue publisher]
        ML[Email/webhook sender]
    end

    subgraph Infra["Infrastructure"]
        PGDB[(PostgreSQL)]
        REDIS[(Redis)]
        JQ[[Job queue]]
        JAEGER[(Jaeger / Tempo)]
        PROM[(Prometheus)]
    end

    WEB & MOB & SVC --> Edge
    Edge --> REST & GRPC
    REST & GRPC --> UC
    PORTS -. implemented by .-> PG & RC & QP & ML
    PG --> PGDB
    RC --> REDIS
    QP --> JQ
    OTEL -. spans .-> JAEGER
    Edge -. metrics .-> PROM

    WORKER["cmd/worker"] --> JQ
    WORKER --> UC

A second view — the lifecycle of an authenticated, tenant-scoped GET .../tasks request:

sequenceDiagram
    autonumber
    participant C as Client
    participant M as Middleware chain
    participant H as chi Handler (inbound)
    participant U as TaskService (use-case)
    participant K as Redis cache (port)
    participant R as Postgres repo (port)
    participant D as PostgreSQL

    C->>M: GET /v1/orgs/{org}/projects/{p}/tasks  (Bearer access JWT)
    M->>M: request-id, start OTel span
    M->>M: rate limit (Redis token bucket)
    M->>M: verify JWT → claims{sub, org_id, role}
    M->>M: tenant resolve: assert path org == claims.org_id → ctx
    M->>M: RBAC: role may list tasks?
    M->>H: ctx(tenant, user, role, traceID)
    H->>U: ListTasks(ctx, filter)
    U->>K: Get(ctx, cacheKey(org,p,filter))
    alt cache hit
        K-->>U: []Task
    else miss
        U->>R: FindTasks(ctx, org_id, filter)
        R->>D: SELECT ... WHERE org_id=$1 AND project_id=$2 ...
        D-->>R: rows
        R-->>U: []Task
        U->>K: Set(ctx, key, value, ttl)
    end
    U-->>H: []Task
    H-->>C: 200 {data, page}  (span ended, metrics recorded)

Suggested Project Layout

Follows golang-standards/project-layout, with bounded contexts under internal/ each split into domain / app / adapters.

07-capstone-saas-backend/
├── cmd/
│   ├── api/                      # REST + gRPC server entrypoint (composition root / DI wiring)
│   │   └── main.go
│   ├── worker/                   # async job queue consumer
│   │   └── main.go
│   └── migrate/                  # golang-migrate runner / CLI wrapper
│       └── main.go
├── api/
│   ├── proto/                    # .proto definitions (source of truth for gRPC)
│   │   └── taskly/v1/
│   │       ├── task.proto
│   │       └── org.proto
│   └── openapi/
│       └── openapi.yaml          # OpenAPI 3 spec for the REST surface
├── internal/
│   ├── auth/                     # bounded context: authentication & tokens
│   │   ├── domain/               # User, Credential, RefreshToken, password rules
│   │   ├── app/                  # use-cases: Signup, Login, Refresh, Logout + ports
│   │   └── adapters/
│   │       ├── postgres/         # user & refresh_token repositories (sqlc)
│   │       ├── jwt/              # TokenIssuer/Verifier (golang-jwt)
│   │       └── http/             # auth REST handlers
│   ├── org/                      # bounded context: organizations & membership
│   │   ├── domain/               # Organization, Membership, Role, Invitation
│   │   ├── app/                  # use-cases + ports
│   │   └── adapters/{postgres,http,grpc}/
│   ├── project/                  # bounded context: projects
│   │   ├── domain/
│   │   ├── app/
│   │   └── adapters/{postgres,http,grpc}/
│   ├── task/                     # bounded context: tasks
│   │   ├── domain/
│   │   ├── app/
│   │   └── adapters/{postgres,http,grpc}/
│   ├── middleware/               # request-id, recover, auth, tenant, rbac, ratelimit, otel, logging
│   └── platform/                 # cross-cutting infrastructure (shared adapters)
│       ├── config/               # envconfig/viper loader, 12-factor
│       ├── logging/              # slog setup, context injectors
│       ├── postgres/             # pgxpool factory, txn helpers, health check
│       ├── redis/                # go-redis client factory, cache + token-bucket
│       ├── otel/                 # tracer + meter providers, exporters
│       ├── httpserver/           # chi server, graceful shutdown, health/ready
│       ├── grpcserver/           # grpc-go server, interceptors, graceful stop
│       └── queue/                # job queue publisher + consumer abstractions
├── db/
│   ├── migrations/               # golang-migrate *.up.sql / *.down.sql
│   └── queries/                  # sqlc .sql source queries
├── deployments/
│   ├── docker-compose.yml        # full stack: api, worker, postgres, redis, jaeger, prometheus, grafana
│   ├── Dockerfile.api            # multi-stage build for api
│   ├── Dockerfile.worker         # multi-stage build for worker
│   ├── prometheus/prometheus.yml
│   ├── grafana/                  # dashboards + provisioning
│   └── k8s/                      # (stretch) Deployment/Service/HPA/Ingress manifests
├── .github/
│   └── workflows/
│       ├── ci.yml                # lint → test (race+cover) → build
│       └── release.yml           # docker build & push → deploy
├── docs/
│   ├── adr/                      # Architecture Decision Records (0001-*.md ...)
│   ├── runbook.md
│   └── architecture.md
├── test/                         # integration & e2e suites, testcontainers helpers, load tests
├── sqlc.yaml                     # sqlc codegen config
├── buf.yaml                      # buf lint/breaking config
├── buf.gen.yaml                  # buf code generation config
├── Makefile                      # build, test, lint, migrate, sqlc, buf, compose targets
├── .golangci.yml
├── go.mod
└── README.md

Data Model / Database

Tenant isolation strategy

Chosen strategy: shared database, shared schema, org_id row scoping. Every tenant-owned table carries an org_id column. Every query carries a mandatory WHERE org_id = $1 predicate, and the org_id is taken from the verified JWT claim / tenant context, never from arbitrary client input. This keeps operations simple (one schema, one migration set, one connection pool) and scales to many small tenants — the right default for a SaaS like this.

Defense-in-depth: enforcement currently relies on the repository layer always scoping by org_id. A repository helper (e.g. scopedQuery(ctx)) reads org_id from context so a developer cannot "forget" the predicate. As a stretch / alternative, enable PostgreSQL Row-Level Security (RLS): a per-request SET LOCAL app.current_org = '<uuid>' plus CREATE POLICY clauses make the database itself reject cross-tenant access even if application code has a bug. (Pure isolation alternatives — schema-per-tenant or database-per-tenant — are noted in ADR-0002 and rejected for operational overhead at this scale.)

Schema (key migrations)

-- 0001_init.up.sql

CREATE EXTENSION IF NOT EXISTS "pgcrypto";   -- gen_random_uuid()
CREATE EXTENSION IF NOT EXISTS "citext";     -- case-insensitive email

-- Global identity (users are not tenant-scoped; they belong to orgs via memberships)
CREATE TABLE users (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email          CITEXT NOT NULL UNIQUE,
    password_hash  TEXT   NOT NULL,
    display_name   TEXT   NOT NULL DEFAULT '',
    created_at     TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at     TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Tenants
CREATE TABLE organizations (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        TEXT NOT NULL,
    slug        CITEXT NOT NULL UNIQUE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    deleted_at  TIMESTAMPTZ
);

CREATE TYPE member_role AS ENUM ('owner', 'admin', 'member');

-- Membership join: which user has which role in which org
CREATE TABLE memberships (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id     UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    user_id    UUID NOT NULL REFERENCES users(id)         ON DELETE CASCADE,
    role       member_role NOT NULL DEFAULT 'member',
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (org_id, user_id)
);
CREATE INDEX idx_memberships_user ON memberships (user_id);
CREATE INDEX idx_memberships_org  ON memberships (org_id, role);

-- Invitations (drive the async email side-effect)
CREATE TABLE invitations (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id      UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    email       CITEXT NOT NULL,
    role        member_role NOT NULL DEFAULT 'member',
    token_hash  TEXT NOT NULL,
    invited_by  UUID NOT NULL REFERENCES users(id),
    expires_at  TIMESTAMPTZ NOT NULL,
    accepted_at TIMESTAMPTZ,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (org_id, email)
);
CREATE INDEX idx_invitations_org ON invitations (org_id);

-- Tenant-scoped: projects
CREATE TABLE projects (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id      UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    name        TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    status      TEXT NOT NULL DEFAULT 'active',  -- active | archived
    created_by  UUID NOT NULL REFERENCES users(id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Composite index: every list is scoped by (org_id, ...)
CREATE INDEX idx_projects_org_status ON projects (org_id, status, created_at DESC);

CREATE TYPE task_status AS ENUM ('todo', 'in_progress', 'done', 'cancelled');

-- Tenant-scoped: tasks (carry org_id directly for cheap scoping + RLS)
CREATE TABLE tasks (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id      UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
    project_id  UUID NOT NULL REFERENCES projects(id)      ON DELETE CASCADE,
    title       TEXT NOT NULL,
    description TEXT NOT NULL DEFAULT '',
    status      task_status NOT NULL DEFAULT 'todo',
    priority    SMALLINT NOT NULL DEFAULT 3,
    assignee_id UUID REFERENCES users(id),
    due_at      TIMESTAMPTZ,
    created_by  UUID NOT NULL REFERENCES users(id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Composite indexes on the hot, tenant-scoped access paths
CREATE INDEX idx_tasks_org_project       ON tasks (org_id, project_id, status);
CREATE INDEX idx_tasks_org_assignee      ON tasks (org_id, assignee_id) WHERE assignee_id IS NOT NULL;
CREATE INDEX idx_tasks_org_created       ON tasks (org_id, created_at DESC);

-- Auth: rotating refresh tokens (store only the hash)
CREATE TABLE refresh_tokens (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id     UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    token_hash  TEXT NOT NULL UNIQUE,
    expires_at  TIMESTAMPTZ NOT NULL,
    revoked_at  TIMESTAMPTZ,
    replaced_by UUID,                    -- points at the rotated successor token
    user_agent  TEXT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_refresh_tokens_user ON refresh_tokens (user_id) WHERE revoked_at IS NULL;

Row-Level Security (alternative / stretch)

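-- 0002_rls.up.sql  (optional defense-in-depth)
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks    ENABLE ROW LEVEL SECURITY;
-- Table owners bypass RLS unless it is forced; keep these if the app connects as the owning role.
ALTER TABLE projects FORCE ROW LEVEL SECURITY;
ALTER TABLE tasks    FORCE ROW LEVEL SECURITY;

CREATE POLICY org_isolation_projects ON projects
    USING (org_id = current_setting('app.current_org')::uuid);
CREATE POLICY org_isolation_tasks ON tasks
    USING (org_id = current_setting('app.current_org')::uuid);

-- Middleware/repo issues, per transaction:  SET LOCAL app.current_org = '<org uuid>';
-- SET cannot take bind parameters; the parameterized equivalent is:
--   SELECT set_config('app.current_org', $1, true);   -- true = transaction-local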

API Design

REST surface (chi)

Base path /v1. The active organization is taken from the JWT org_id claim; org-scoped resources also include {org_id} in the path for clarity and to allow org switching. The path org_id must equal the token's org_id claim; otherwise the request is rejected with 403.

| Method & Path | Description | Min role |
|---|---|---|
| POST /v1/auth/signup | Create user + first org (creator = owner) | public |
| POST /v1/auth/login | Verify password, issue access+refresh | public |
| POST /v1/auth/refresh | Rotate refresh token, new access token | public (valid refresh) |
| POST /v1/auth/logout | Revoke active refresh token | member |
| GET /v1/orgs | List my organizations | member |
| POST /v1/orgs | Create a new organization | member |
| GET /v1/orgs/{org_id} | Get organization | member |
| PATCH /v1/orgs/{org_id} | Update organization | admin |
| DELETE /v1/orgs/{org_id} | Soft-delete organization | owner |
| GET /v1/orgs/{org_id}/members | List members | member |
| PATCH /v1/orgs/{org_id}/members/{user_id} | Change member role | admin |
| DELETE /v1/orgs/{org_id}/members/{user_id} | Remove member | admin |
| POST /v1/orgs/{org_id}/invitations | Invite by email (async send) | admin |
| POST /v1/invitations/accept | Accept invite via token | public (valid token) |
| GET /v1/orgs/{org_id}/projects | List projects (paged/filtered) | member |
| POST /v1/orgs/{org_id}/projects | Create project | member |
| GET/PATCH/DELETE /v1/orgs/{org_id}/projects/{project_id} | Read/update/archive | member/member/admin |
| GET /v1/orgs/{org_id}/projects/{project_id}/tasks | List tasks (filter status/assignee) | member |
| POST /v1/orgs/{org_id}/projects/{project_id}/tasks | Create task | member |
| GET/PATCH/DELETE /v1/orgs/{org_id}/.../tasks/{task_id} | Read/update/delete | member |

Error envelope

All non-2xx responses share one shape:

{
  "error": {
    "code": "forbidden",
    "message": "you do not have permission to modify members",
    "request_id": "01J9Z3K8Q7XM2C9V0YF4B7N1AE",
    "details": [{ "field": "role", "issue": "must be one of owner|admin|member" }]
  }
}

code is a stable machine string (unauthenticated, forbidden, not_found, validation_failed, rate_limited, conflict, internal). HTTP status mirrors it.

Pagination convention

Keyset (cursor) pagination by default:

GET /v1/orgs/{org}/projects/{p}/tasks?limit=50&cursor=<opaque>&status=in_progress&assignee_id=<uuid>

200 OK
{
  "data": [ /* ...items... */ ],
  "page": { "limit": 50, "next_cursor": "eyJpZCI6Li4ufQ==", "has_more": true }
}

Sample JWT claims (access token)

{
  "sub": "9b1c0e7a-2c4d-4f0a-9a11-7e0d6b5f2c10",
  "org_id": "1f4a2d88-0c2b-4d6e-9f3a-55b1c2e8a7d4",
  "role": "admin",
  "scope": "access",
  "iss": "taskly",
  "aud": "taskly-api",
  "iat": 1750896000,
  "exp": 1750896900,
  "jti": "01J9Z3K8Q7XM2C9V0YF4B7N1AE"
}

Refresh tokens are opaque, random, single-use, and stored hashed in refresh_tokens; on /auth/refresh the presented token is revoked and replaced_by is set (rotation + reuse detection).
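
Rotation with reuse detection can be sketched against an in-memory stand-in for the refresh_tokens table. The Store type and its methods are hypothetical; expiry checks are omitted, and a real implementation would perform the lookup and update inside one database transaction.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// refreshRow mirrors the relevant refresh_tokens columns.
type refreshRow struct {
	userID     string
	revoked    bool
	replacedBy string // hash of the successor token, empty if none
}

// Store is keyed by token hash; the raw token is never stored.
type Store struct{ rows map[string]*refreshRow }

var (
	ErrUnknownToken = errors.New("unknown refresh token")
	ErrReuse        = errors.New("refresh token reuse detected; chain revoked")
)

func hashToken(tok string) string {
	sum := sha256.Sum256([]byte(tok))
	return hex.EncodeToString(sum[:])
}

// Issue creates a random opaque token and stores only its hash.
func (s *Store) Issue(userID string) (string, error) {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(b)
	s.rows[hashToken(tok)] = &refreshRow{userID: userID}
	return tok, nil
}

// Rotate revokes the presented token and returns its successor.
// Presenting an already-revoked token revokes every descendant in the chain.
func (s *Store) Rotate(presented string) (string, error) {
	row, ok := s.rows[hashToken(presented)]
	if !ok {
		return "", ErrUnknownToken
	}
	if row.revoked {
		for next := row.replacedBy; next != ""; {
			succ, ok := s.rows[next]
			if !ok {
				break
			}
			succ.revoked = true
			next = succ.replacedBy
		}
		return "", ErrReuse
	}
	tok, err := s.Issue(row.userID)
	if err != nil {
		return "", err
	}
	row.revoked, row.replacedBy = true, hashToken(tok)
	return tok, nil
}

func main() {
	s := &Store{rows: map[string]*refreshRow{}}
	t1, _ := s.Issue("user-1")
	t2, _ := s.Rotate(t1)
	_, replayErr := s.Rotate(t1) // replay of the already-rotated token
	_, victimErr := s.Rotate(t2) // the legitimate successor is now revoked too
	fmt.Println(replayErr)
	fmt.Println(victimErr)
}
```

Revoking the whole chain forces both the attacker and the legitimate user to log in again, which is the intended outcome when a stolen token is replayed.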

gRPC (internal surface)

For service-to-service calls and the worker, expose the same use-cases over gRPC.

syntax = "proto3";
package taskly.v1;
option go_package = "github.com/you/taskly/api/proto/taskly/v1;tasklyv1";

import "google/protobuf/timestamp.proto";

service TaskService {
  rpc CreateTask (CreateTaskRequest) returns (Task);
  rpc GetTask    (GetTaskRequest)    returns (Task);
  rpc ListTasks  (ListTasksRequest)  returns (ListTasksResponse);
  rpc UpdateTask (UpdateTaskRequest) returns (Task);
  rpc DeleteTask (DeleteTaskRequest) returns (DeleteTaskResponse);
}

enum TaskStatus {
  TASK_STATUS_UNSPECIFIED = 0;
  TASK_STATUS_TODO        = 1;
  TASK_STATUS_IN_PROGRESS = 2;
  TASK_STATUS_DONE        = 3;
  TASK_STATUS_CANCELLED   = 4;
}

message Task {
  string id          = 1;
  string org_id      = 2;   // populated from the authenticated context, not trusted from input
  string project_id  = 3;
  string title       = 4;
  string description = 5;
  TaskStatus status  = 6;
  int32  priority    = 7;
  string assignee_id = 8;
  google.protobuf.Timestamp due_at     = 9;
  google.protobuf.Timestamp created_at = 10;
}

message CreateTaskRequest { string project_id = 1; string title = 2; string description = 3; int32 priority = 4; string assignee_id = 5; }
message GetTaskRequest    { string id = 1; }
message ListTasksRequest  { string project_id = 1; TaskStatus status = 2; string assignee_id = 3; int32 limit = 4; string cursor = 5; }
message ListTasksResponse { repeated Task tasks = 1; string next_cursor = 2; bool has_more = 3; }
message UpdateTaskRequest { string id = 1; string title = 2; string description = 3; TaskStatus status = 4; int32 priority = 5; string assignee_id = 6; }
message DeleteTaskRequest { string id = 1; }
message DeleteTaskResponse{ bool deleted = 1; }

A unary server interceptor mirrors the REST middleware chain: extract bearer token from metadata → verify → resolve tenant → RBAC → inject context, plus otelgrpc and metrics.

RBAC matrix (role → allowed operations)

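| Operation | owner | admin | member |
|---|---|---|---|
| View org / members | ✅ | ✅ | ✅ |
| Update org settings | ✅ | ✅ | ❌ |
| Delete org | ✅ | ❌ | ❌ |
| Invite member | ✅ | ✅ | ❌ |
| Change member role | ✅ | ✅* | ❌ |
| Remove member | ✅ | ✅* | ❌ |
| Create / view project | ✅ | ✅ | ✅ |
| Archive / delete project | ✅ | ✅ | ❌ |
| Create / update / view task | ✅ | ✅ | ✅ |
| Delete task | ✅ | ✅ | own only |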

* admins cannot modify owners; only an owner can promote/demote owners or transfer ownership.
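
A policy like this can be expressed as a minimum-role table plus two special cases. The sketch below derives its thresholds from the "Min role" column of the REST table and the footnote above; the Op names are hypothetical, and reading "own only" as "created by the caller" is an assumption.

```go
package main

import "fmt"

// Role values are ordered so that a higher role includes the lower ones.
type Role int

const (
	Member Role = iota + 1
	Admin
	Owner
)

type Op string

const (
	OpViewOrg        Op = "org.view"
	OpUpdateOrg      Op = "org.update"
	OpDeleteOrg      Op = "org.delete"
	OpInvite         Op = "member.invite"
	OpChangeRole     Op = "member.change_role"
	OpRemoveMember   Op = "member.remove"
	OpCreateProject  Op = "project.create"
	OpArchiveProject Op = "project.archive"
	OpWriteTask      Op = "task.write"
)

// minRole is the lowest role allowed to perform each operation.
var minRole = map[Op]Role{
	OpViewOrg:        Member,
	OpUpdateOrg:      Admin,
	OpDeleteOrg:      Owner,
	OpInvite:         Admin,
	OpChangeRole:     Admin,
	OpRemoveMember:   Admin,
	OpCreateProject:  Member,
	OpArchiveProject: Admin,
	OpWriteTask:      Member,
}

// Can denies unknown operations and unknown (zero) roles.
func Can(actor Role, op Op) bool {
	need, ok := minRole[op]
	return ok && actor >= need
}

// CanModifyMember applies the footnote: admins cannot modify owners.
func CanModifyMember(actor, target Role) bool {
	return Can(actor, OpChangeRole) && (actor == Owner || target != Owner)
}

// CanDeleteTask lets members delete only tasks they created (assumed meaning of "own only").
func CanDeleteTask(actor Role, actorID, createdBy string) bool {
	switch {
	case actor >= Admin:
		return true
	case actor == Member:
		return actorID == createdBy
	default:
		return false
	}
}

func main() {
	fmt.Println(Can(Member, OpInvite), Can(Admin, OpInvite))
	fmt.Println(CanModifyMember(Admin, Owner), CanModifyMember(Owner, Owner))
	fmt.Println(CanDeleteTask(Member, "u1", "u2"), CanDeleteTask(Member, "u1", "u1"))
}
```

Keeping the policy in the domain layer as pure functions makes it straightforward to drive every cell of the matrix from a table-driven test.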

Tech Stack

| Concern | Library / tool (import path) |
|---|---|
| HTTP router | github.com/go-chi/chi/v5 |
| Postgres driver/pool | github.com/jackc/pgx/v5, github.com/jackc/pgx/v5/pgxpool |
| Type-safe queries | sqlc (github.com/sqlc-dev/sqlc) |
| Migrations | github.com/golang-migrate/migrate/v4 |
| Redis client | github.com/redis/go-redis/v9 |
| JWT | github.com/golang-jwt/jwt/v5 |
| Password hashing | golang.org/x/crypto/bcrypt and/or golang.org/x/crypto/argon2 |
| Structured logging | log/slog (stdlib) |
| Metrics | github.com/prometheus/client_golang/prometheus + .../promhttp |
| Tracing | go.opentelemetry.io/otel, .../sdk/trace, .../exporters/otlp/otlptrace/otlptracegrpc |
| HTTP/gRPC instrumentation | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp, .../google.golang.org/grpc/otelgrpc |
| gRPC | google.golang.org/grpc, google.golang.org/protobuf |
| Proto tooling | buf (github.com/bufbuild/buf), protoc-gen-go, protoc-gen-go-grpc |
| Config (12-factor) | github.com/kelseyhightower/envconfig (or github.com/spf13/viper) |
| Dependency injection | constructor injection; optional github.com/google/wire |
| Job queue | github.com/hibiken/asynq (Redis-backed) or project-06 queue |
| UUIDs | github.com/google/uuid |
| Testing | github.com/stretchr/testify, stdlib testing |
| Integration containers | github.com/testcontainers/testcontainers-go (+ modules/postgres, modules/redis) |
| Linting | golangci-lint (github.com/golangci/golangci-lint) |
| Load testing | k6 (Grafana) or vegeta |
| Release / images | optional goreleaser or ko |

Implementation Milestones

Phase 0 — Foundation, config, logging

  • Initialize module, Makefile, .golangci.yml, repo layout, README skeleton.
  • internal/platform/config: envconfig/viper loader, validation, .env.example.
  • internal/platform/logging: slog JSON handler, level from config, context fields helper.
  • internal/platform/httpserver — chi server, /healthz + /readyz, graceful shutdown via signal.NotifyContext.
  • internal/platform/postgres: pgxpool factory, ping/health, sqlc.yaml, first migration runner (cmd/migrate).

Phase 1 — Auth & JWT

  • User domain + password hashing (bcrypt/argon2id) with timing-safe verify.
  • golang-jwt token issuer/verifier; access (15m) + opaque refresh tokens (hashed at rest).
  • signup, login, refresh (with rotation + reuse detection), logout use-cases + handlers.
  • Auth middleware: bearer extraction → verify → inject user_id/claims into context.

Phase 2 — Multi-tenancy & RBAC

  • Organizations, memberships, roles domain + repositories.
  • Tenant-resolution middleware: derive org_id from claim, assert against path, inject into context.
  • scopedQuery repo helper enforcing WHERE org_id = $ctx.
  • RBAC middleware/policy: role → permission checks per route; tenant-isolation guard.

Phase 3 — Core domain CRUD

  • Projects use-cases + repo + REST handlers (list/create/get/update/archive), paginated.
  • Tasks use-cases + repo + REST handlers (CRUD + status transition), filtered + paginated.
  • Invitations: create (queues email job) + accept flow.

Phase 4 — Caching & rate limiting

  • internal/platform/redis client; cache-aside for hot reads (project/task lists) with TTL + invalidation on writes.
  • Redis token-bucket rate limiter middleware (per-IP and per-tenant) with 429 + Retry-After.

Phase 5 — gRPC

  • buf config + TaskService proto + generated stubs.
  • gRPC server with auth/tenant/RBAC + otelgrpc interceptors, sharing the same use-cases.

Phase 6 — Observability

  • Prometheus metrics: RED (rate/errors/duration) per route, DB pool, cache hit ratio, queue depth; /metrics.
  • OpenTelemetry tracer provider → OTLP → Jaeger/Tempo; spans across middleware → use-case → repo → DB.
  • Correlate request_id/trace_id into every log line.

Phase 7 — Async jobs

  • cmd/worker consuming the queue; idempotent handlers for invitation email, task-assignment notification, webhook.
  • Publisher port wired into use-cases; retries + dead-letter handling.

Phase 8 — Testing

  • Unit tests on domain/app against mocked ports (> 70% core coverage).
  • Integration tests with testcontainers-go (Postgres + Redis): repos, migrations, cache, rate limiter.
  • Tenant-isolation tests (A cannot read/write B), RBAC matrix tests, auth/token rotation tests.
  • -race in CI; k6/vegeta load script.

Phase 9 — CI/CD

  • GitHub Actions ci.yml: lint → go test -race -cover → build, with coverage gate.
  • release.yml: build multi-stage images → push to registry → deploy step.

Phase 10 — Docs & ADRs

  • OpenAPI spec, generated proto docs, README architecture overview, runbook, Postman/Bruno collection.
  • Write the ADR log (see Documentation Deliverables).

Phase 11 — Hardening

  • Security pass (headers, input validation, secret handling), zero-downtime migration drill, load test + p99 verification, graceful-shutdown drain test.

Testing Strategy

  • Unit tests (core): internal/*/domain and internal/*/app tested in isolation with hand-written or testify/mock fakes for every port (Repo, Cache, QueuePublisher, Mailer, TokenIssuer). Pure, fast, deterministic; gate > 70% coverage here.
  • Integration tests (testcontainers-go): spin up real Postgres and Redis containers per suite (shared via TestMain), run migrations, and exercise repositories, the cache-aside path, the rate limiter, and the refresh-token store against the real engines. Containers are torn down after the suite.
  • API / contract tests: boot the chi router (and gRPC server) against the test DB; assert status codes, the error envelope, pagination, and request validation end-to-end. Optionally validate responses against the OpenAPI schema.
  • RBAC tests: drive every cell of the role→operation matrix and assert allow/deny.
  • Tenant-isolation tests (critical): seed org A and org B; authenticate as a member of A and assert that every read and write against B's projects/tasks returns 403/404 and never leaks rows — both at the repo layer and the HTTP layer.
  • Auth / token tests: login, access-token expiry, refresh rotation, refresh-reuse detection (reusing a rotated token revokes the chain), logout revocation.
  • Concurrency: run the full suite under go test -race.
  • Load test: k6/vegeta script hitting hot read/write endpoints; assert p99 < 150 ms and capture throughput; verify rate limiter returns 429 under burst.
  • CI gates: lint clean, race clean, and coverage threshold on core packages enforced in ci.yml.

Deployment

  • Images: one multi-stage Dockerfile per binary (Dockerfile.api, Dockerfile.worker) — build stage on golang:1.x, final stage on gcr.io/distroless/static or alpine, non-root user, static binary, HEALTHCHECK.
# Dockerfile.api (sketch)
FROM golang:1.23 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/api ./cmd/api

FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/api /api
USER nonroot:nonroot
EXPOSE 8080 9090
ENTRYPOINT ["/api"]
  • docker-compose (full stack): api, worker, postgres, redis, jaeger, prometheus, grafana. The compose file wires env, health checks, depends_on with condition: service_healthy, volumes for Postgres/Redis, and Prometheus scrape config pointing at the api /metrics. Jaeger receives OTLP traces; Grafana provisions a dashboard from Prometheus.
  • GitHub Actions pipeline: lint (golangci-lint) → test (go test -race -coverprofile) → build → docker build & push (on tag/main) → deploy. Coverage gate fails the job below threshold; proto and sqlc generation are verified up-to-date (git diff --exit-code).
  • Zero-downtime migrations: expand/contract — additive, backward-compatible migrations first (new nullable columns/tables/indexes via CREATE INDEX CONCURRENTLY), deploy code that writes both old+new, backfill, then a later contract migration drops the old. Migrations run as a separate cmd/migrate job/init container, never inside the request path.
  • Health & readiness: /healthz (process alive) and /readyz (DB + Redis reachable) back k8s liveness/readiness probes; readiness flips false during graceful drain.
  • 12-factor config: strictly env-driven (DATABASE_URL, REDIS_URL, JWT_SIGNING_KEY, OTEL_EXPORTER_OTLP_ENDPOINT, LOG_LEVEL, ...); no config in the image.
  • Secrets: injected via environment / secret manager (never committed); signing keys rotatable via kid in JWT header.
  • Kubernetes (stretch): deployments/k8s with Deployment + Service + HPA (CPU/RPS) + Ingress + a migration Job, plus PodDisruptionBudget for safe rollouts.

Documentation Deliverables

  • README — what/why, quickstart (make up), architecture overview with the mermaid diagrams, environment variables table, and the demo flow (signup → create project → create task).
  • OpenAPI spec (api/openapi/openapi.yaml) — complete REST contract; optionally serve a Swagger/Redoc UI.
  • Generated proto docs — buf + protoc-gen-doc HTML/Markdown for the gRPC services.
  • ADR log (docs/adr/) — example records:
      • 0001 — Adopt hexagonal (ports & adapters) architecture.
      • 0002 — Multi-tenancy strategy: shared DB + org_id scoping (vs schema/db-per-tenant; RLS option).
      • 0003 — REST (chi) + gRPC split: public REST, internal gRPC.
      • 0004 — Auth & token strategy: short-lived access JWT + rotating refresh tokens with reuse detection.
      • 0005 — Caching strategy: Redis cache-aside with TTL + write-through invalidation.
      • 0006 — Rate limiting: Redis token bucket, per-IP and per-tenant.
      • 0007 — Observability stack: slog + Prometheus + OpenTelemetry/Jaeger.
      • 0008 — Persistence & migrations: pgx + sqlc + golang-migrate, expand/contract.
  • Runbook (docs/runbook.md) — deploy/rollback, run migrations, common alerts and remedies, scaling, secret rotation, dashboards and trace lookup by request_id.
  • Postman / Bruno collection — the full request set with auth flow and env variables, ready to import.

Stretch Goals / Future Improvements

  • Billing — Stripe subscriptions, plan-based seat limits, metered usage, webhooks.
  • Feature flags — per-tenant flags (e.g. Unleash/OpenFeature) to ship gradually.
  • Audit log — append-only record of who-did-what-when, per tenant.
  • Realtime — WebSockets/SSE for live task updates and presence.
  • Event-driven — transactional outbox + Kafka/NATS for reliable domain events.
  • Per-tenant quotas — rate limits and storage/seat quotas enforced per plan.
  • RLS enforcement — turn on PostgreSQL Row-Level Security for hard DB-level isolation.
  • Blue-green / canary deploys with automated rollback on SLO breach.
  • SLOs & alerting — error-budget burn alerts on the p99/error-rate SLOs in Prometheus/Grafana.

Lessons-Learned Prompts

  1. Architecture: Where did the hexagonal boundary pay off, and where did it feel like ceremony? Which port was hardest to keep free of leaking infrastructure details?
  2. Multi-tenancy: How confident are you that no query can cross tenants? What would change if one tenant grew 100× larger than the rest, and when would you move to RLS or db-per-tenant?
  3. Security: Walk through the refresh-token rotation and reuse-detection flow. What attacks does it stop, and what is still exposed if the signing key leaks?
  4. Observability: Given a user-reported slow request and only its request_id, trace it end to end. Which signal (log, metric, or trace) answered which question?
  5. Performance: Which indexes and which cache entries actually moved p99? How did you measure, and what was the cache hit ratio under load?
  6. Testing: What did testcontainers catch that mocked unit tests could not? Which test gives you the most confidence before a deploy?
  7. Operations: Describe a zero-downtime schema change you made with expand/contract. What would you do differently for a column that needs a type change?
  8. Trade-offs: If you had one more week, what is the single highest-leverage improvement and why?

Portfolio & Resume

Resume Bullets

  • Built a production-grade multi-tenant SaaS backend in Go (hexagonal architecture, REST + gRPC) serving strictly isolated tenant data, sustaining p99 < 150 ms with a Redis cache-aside layer, validated under k6 load tests.
  • Implemented end-to-end security and reliability: JWT access + rotating refresh tokens with reuse detection, RBAC (owner/admin/member), Redis token-bucket rate limiting, graceful shutdown, and zero-downtime expand/contract migrations (pgx + sqlc + golang-migrate).
  • Delivered full observability and CI/CD: structured slog logs, Prometheus RED metrics, and OpenTelemetry traces to Jaeger, with a GitHub Actions pipeline (lint → race tests → build → image push) and testcontainers integration tests holding > 70% core coverage.

Interview Talking Points

  • Multi-tenancy isolation — shared DB + org_id scoping driven from verified JWT claims, a scopedQuery guardrail, tenant-isolation tests, and RLS as defense-in-depth.
  • Hexagonal architecture + DI — domain at the center, ports as interfaces, adapters at the edge, constructor injection at a single composition root (optionally google/wire).
  • JWT / RBAC — token design, rotation, reuse detection, and a role→operation permission matrix.
  • Observability triad — when logs vs metrics vs traces each answer the question, and how request_id/trace_id correlation makes incidents debuggable.
  • Integration testing with testcontainers — real Postgres + Redis in tests, what it catches over mocks, and how it stays fast.
  • CI/CD & graceful shutdown — the pipeline gates, image strategy, and how the server drains in-flight work on SIGTERM behind a load balancer.
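
As a talking aid for the role→operation permission matrix mentioned above, here is a small Go sketch. The three roles come from the project description; the operation names and the specific grants are illustrative assumptions, not the project's actual policy.

```go
package main

import "fmt"

// Role is an organization-level role, as listed in the project overview.
type Role string

const (
	RoleOwner  Role = "owner"
	RoleAdmin  Role = "admin"
	RoleMember Role = "member"
)

// Operation names are hypothetical; a real service would define its own set.
type Operation string

const (
	OpTaskWrite     Operation = "task:write"
	OpProjectCreate Operation = "project:create"
	OpMemberInvite  Operation = "member:invite"
	OpOrgDelete     Operation = "org:delete"
)

// permissions is the role→operation matrix. Anything absent is denied.
// The grants below are assumptions chosen for illustration.
var permissions = map[Role]map[Operation]bool{
	RoleOwner: {
		OpTaskWrite: true, OpProjectCreate: true, OpMemberInvite: true, OpOrgDelete: true,
	},
	RoleAdmin: {
		OpTaskWrite: true, OpProjectCreate: true, OpMemberInvite: true,
	},
	RoleMember: {
		OpTaskWrite: true,
	},
}

// Can reports whether role may perform op. Unknown roles and unknown
// operations fall through to false, so the check is deny-by-default.
func Can(role Role, op Operation) bool {
	return permissions[role][op]
}

func main() {
	fmt.Println(Can(RoleOwner, OpOrgDelete))     // true
	fmt.Println(Can(RoleAdmin, OpOrgDelete))     // false
	fmt.Println(Can(RoleMember, OpMemberInvite)) // false
	fmt.Println(Can(Role("guest"), OpTaskWrite)) // false: unknown role is denied
}
```

Holding the matrix in one table keeps the policy reviewable in a single place and makes it easy to cover exhaustively with a table-driven test.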

System Design Stories

This single repository unlocks six distinct interview stories:

  1. "Design a multi-tenant SaaS" — isolation strategies and the trade-offs you chose.
  2. "Design an auth system" — access/refresh tokens, rotation, RBAC, password hashing.
  3. "Make this service observable" — the logs/metrics/traces triad and incident debugging.
  4. "Add caching and rate limiting" — Redis cache-aside invalidation and the token-bucket limiter.
  5. "Evolve the schema with zero downtime" — expand/contract migrations on a live tenant DB.
  6. "Decouple a slow side-effect" — moving email/webhooks to an idempotent async worker queue.