Table of Contents
- 07 — Multi-Tenant SaaS Backend (Capstone)
- Overview
- Learning Objectives
- Requirements
- Architecture
- Suggested Project Layout
- Data Model / Database
- API Design
- Tech Stack
- Implementation Milestones
- Testing Strategy
- Deployment
- Documentation Deliverables
- Stretch Goals / Future Improvements
- Lessons-Learned Prompts
- Portfolio & Resume
07 — Multi-Tenant SaaS Backend (Capstone)
Level: Production capstone · Domain: Multi-tenant Project/Task management SaaS · Mantra: Organizations → Projects → Tasks → Members, isolated, observable, and shippable.
Overview
This is the flagship project of the Learn Go track. You will build Taskly — a
production-grade, multi-tenant Software-as-a-Service backend for project and task
management. Multiple independent organizations (tenants) share one deployment, yet each
tenant's data is strictly isolated. A single organization contains projects, projects
contain tasks, and people join organizations as members with a role (owner, admin,
member).
The point of the capstone is breadth and depth: it deliberately exercises essentially
everything a senior Go backend engineer is expected to know and defend in an interview —
clean hexagonal (ports & adapters) architecture, dual REST (chi) + gRPC surfaces,
PostgreSQL with pgx/sqlc/golang-migrate, Redis for caching and rate limiting,
JWT access/refresh auth with RBAC, multi-tenant isolation, the full
observability triad (structured logs, Prometheus metrics, OpenTelemetry traces),
graceful shutdown, Docker/compose, CI/CD via GitHub Actions, an async job
queue for side-effects, testcontainers-based integration tests, and an ADR log.
By the end you should be able to walk an interviewer through this single repository and draw four to six distinct system-design stories from it.
Learning Objectives
By completing this project you will be able to:
- Design and defend a hexagonal / ports-and-adapters architecture in Go, with the domain at the center and all I/O at the edges via interfaces (ports).
- Implement multi-tenancy with a shared database + org_id row scoping, propagate a tenant context through every layer, and reason about Row-Level Security (RLS) as an alternative.
- Build secure JWT auth with short-lived access tokens and rotating refresh tokens, password hashing (bcrypt/argon2), and RBAC permission checks.
- Expose the same use-cases over REST (chi) and gRPC, sharing one application core.
- Use PostgreSQL idiomatically with pgxpool, type-safe queries via sqlc, and versioned, zero-downtime golang-migrate migrations.
- Add Redis caching (cache-aside) and a Redis token-bucket rate limiter.
- Wire the full observability triad: log/slog structured logs, prometheus/client_golang metrics, and OpenTelemetry traces exported to Jaeger/Tempo.
- Apply 12-factor config, constructor-based dependency injection (optionally google/wire), and graceful shutdown with os/signal's signal.NotifyContext.
- Write a comprehensive test suite: unit tests against mocked ports, integration tests with testcontainers-go (Postgres + Redis), and explicit tenant-isolation tests.
- Ship it: multi-stage Dockerfiles, docker-compose full stack, and a GitHub Actions pipeline (lint → test → build → push → deploy), documented with an ADR log.
Requirements
Functional
- Authentication & accounts
  - POST /auth/signup: create a user and a first organization (the creator becomes owner).
  - POST /auth/login: verify the password (bcrypt/argon2) and issue an access JWT + refresh token.
  - POST /auth/refresh: rotate the refresh token and issue a new access token; the old refresh token is revoked.
  - POST /auth/logout: revoke the active refresh token (delete it from the store / add it to a Redis blacklist).
- Organizations & membership
  - Create / read / update / delete (soft-delete) an organization.
  - List the organizations the current user belongs to.
  - Manage members: list members, change a member's role, remove a member.
- Invitations
  - An owner/admin invites an email address to an org with a role; an async side-effect emails the invite.
  - The invitee accepts via a tokenized link, which creates a membership.
- RBAC-protected project/task CRUD
  - Projects: create, list (paginated/filtered), read, update, archive, all scoped to an org.
  - Tasks: create, list (filter by status/assignee/project), read, update, transition status, delete.
  - Every operation is gated by the caller's role permissions (see the RBAC matrix).
- Tenant-scoped data: every query is scoped by org_id; no endpoint can ever return another tenant's rows.
- Pagination & filtering: cursor (keyset) or limit/offset pagination, with status, assignee_id, project_id, and q (text) filters where relevant.
- Async side-effect: an outbound job queue (reusing project 06's queue ideas, e.g. a Redis/asynq-style queue) processes invitation emails, task-assignment notifications, and outbound webhooks via a separate worker binary.
Non-Functional
- Multi-tenant isolation: strict org_id scoping; a tenant-isolation test asserts tenant A cannot read or mutate tenant B's data. RLS is available as a defense-in-depth option.
- Performance: target p99 < 150 ms for read endpoints under nominal load; cache hot reads in Redis; index every tenant-scoped access path.
- Horizontal scalability / statelessness: API and worker hold no in-process session state; all shared state lives in Postgres/Redis so instances scale out behind a load balancer.
- Observability: every request carries a request_id and a trace; structured JSON logs, RED/USE Prometheus metrics, and OTel spans across REST → use-case → repo → DB.
- Security: JWT access tokens (~15 min) + rotating refresh tokens (~30 days); passwords hashed with bcrypt (cost ≥ 12) or argon2id; per-IP and per-tenant rate limiting; secrets via env/secret manager; TLS terminated at the edge; input validation on every adapter.
- Reliability: graceful shutdown (drain in-flight requests, close pools), health/readiness probes, zero-downtime migrations (expand/contract), and idempotent job handlers.
- Quality: > 70% coverage on the core (internal/*/domain and internal/*/app), -race clean, golangci-lint clean, coverage gate enforced in CI.
Architecture
Hexagonal: the application use-cases depend only on ports (interfaces). Inbound
adapters (REST, gRPC) drive the use-cases; outbound adapters (Postgres, Redis, queue, email)
implement the ports. A tenant context is resolved at the edge and propagated through
context.Context to every layer.
flowchart TB
subgraph Clients
WEB[Web / SPA]
MOB[Mobile]
SVC[Internal Services]
end
subgraph Edge["API Gateway / Middleware chain"]
direction TB
RID[request-id] --> OTEL[otel trace start] --> RL[rate limit · Redis] --> AUTH[JWT auth] --> TEN[tenant resolution<br/>org_id → ctx] --> RBAC[RBAC check]
end
subgraph Inbound["Inbound adapters"]
REST[REST adapter<br/>chi router]
GRPC[gRPC adapter<br/>grpc-go]
end
subgraph Core["Application core (hexagon)"]
UC["Use-cases / services<br/>(auth, org, project, task)"]
PORTS{{"Ports (interfaces):<br/>Repo · Cache · QueuePublisher · Mailer · TokenIssuer"}}
DOMAIN["Domain entities + rules<br/>(pure Go, no I/O)"]
UC --> DOMAIN
UC --> PORTS
end
subgraph Outbound["Outbound adapters"]
PG[Postgres repo<br/>pgxpool + sqlc]
RC[Redis cache<br/>go-redis]
QP[Queue publisher]
ML[Email/webhook sender]
end
subgraph Infra["Infrastructure"]
PGDB[(PostgreSQL)]
REDIS[(Redis)]
JQ[[Job queue]]
JAEGER[(Jaeger / Tempo)]
PROM[(Prometheus)]
end
WEB & MOB & SVC --> Edge
Edge --> REST & GRPC
REST & GRPC --> UC
PORTS -. implemented by .-> PG & RC & QP & ML
PG --> PGDB
RC --> REDIS
QP --> JQ
OTEL -. spans .-> JAEGER
Edge -. metrics .-> PROM
WORKER["cmd/worker"] --> JQ
WORKER --> UC
A second view — the lifecycle of an authenticated, tenant-scoped GET .../tasks request:
sequenceDiagram
autonumber
participant C as Client
participant M as Middleware chain
participant H as chi Handler (inbound)
participant U as TaskService (use-case)
participant K as Redis cache (port)
participant R as Postgres repo (port)
participant D as PostgreSQL
C->>M: GET /v1/orgs/{org}/projects/{p}/tasks (Bearer access JWT)
M->>M: request-id, start OTel span
M->>M: rate limit (Redis token bucket)
M->>M: verify JWT → claims{sub, org_id, role}
M->>M: tenant resolve: assert path org == claims.org_id → ctx
M->>M: RBAC: role may list tasks?
M->>H: ctx(tenant, user, role, traceID)
H->>U: ListTasks(ctx, filter)
U->>K: Get(ctx, cacheKey(org,p,filter))
alt cache hit
K-->>U: []Task
else miss
U->>R: FindTasks(ctx, org_id, filter)
R->>D: SELECT ... WHERE org_id=$1 AND project_id=$2 ...
D-->>R: rows
R-->>U: []Task
U->>K: Set(ctx, key, value, ttl)
end
U-->>H: []Task
H-->>C: 200 {data, page} (span ended, metrics recorded)
Suggested Project Layout
Follows golang-standards/project-layout,
with bounded contexts under internal/ each split into domain / app / adapters.
07-capstone-saas-backend/
├── cmd/
│ ├── api/ # REST + gRPC server entrypoint (composition root / DI wiring)
│ │ └── main.go
│ ├── worker/ # async job queue consumer
│ │ └── main.go
│ └── migrate/ # golang-migrate runner / CLI wrapper
│ └── main.go
├── api/
│ ├── proto/ # .proto definitions (source of truth for gRPC)
│ │ └── taskly/v1/
│ │ ├── task.proto
│ │ └── org.proto
│ └── openapi/
│ └── openapi.yaml # OpenAPI 3 spec for the REST surface
├── internal/
│ ├── auth/ # bounded context: authentication & tokens
│ │ ├── domain/ # User, Credential, RefreshToken, password rules
│ │ ├── app/ # use-cases: Signup, Login, Refresh, Logout + ports
│ │ └── adapters/
│ │ ├── postgres/ # user & refresh_token repositories (sqlc)
│ │ ├── jwt/ # TokenIssuer/Verifier (golang-jwt)
│ │ └── http/ # auth REST handlers
│ ├── org/ # bounded context: organizations & membership
│ │ ├── domain/ # Organization, Membership, Role, Invitation
│ │ ├── app/ # use-cases + ports
│ │ └── adapters/{postgres,http,grpc}/
│ ├── project/ # bounded context: projects
│ │ ├── domain/
│ │ ├── app/
│ │ └── adapters/{postgres,http,grpc}/
│ ├── task/ # bounded context: tasks
│ │ ├── domain/
│ │ ├── app/
│ │ └── adapters/{postgres,http,grpc}/
│ ├── middleware/ # request-id, recover, auth, tenant, rbac, ratelimit, otel, logging
│ └── platform/ # cross-cutting infrastructure (shared adapters)
│ ├── config/ # envconfig/viper loader, 12-factor
│ ├── logging/ # slog setup, context injectors
│ ├── postgres/ # pgxpool factory, txn helpers, health check
│ ├── redis/ # go-redis client factory, cache + token-bucket
│ ├── otel/ # tracer + meter providers, exporters
│ ├── httpserver/ # chi server, graceful shutdown, health/ready
│ ├── grpcserver/ # grpc-go server, interceptors, graceful stop
│ └── queue/ # job queue publisher + consumer abstractions
├── db/
│ ├── migrations/ # golang-migrate *.up.sql / *.down.sql
│ └── queries/ # sqlc .sql source queries
├── deployments/
│ ├── docker-compose.yml # full stack: api, worker, postgres, redis, jaeger, prometheus, grafana
│ ├── Dockerfile.api # multi-stage build for api
│ ├── Dockerfile.worker # multi-stage build for worker
│ ├── prometheus/prometheus.yml
│ ├── grafana/ # dashboards + provisioning
│ └── k8s/ # (stretch) Deployment/Service/HPA/Ingress manifests
├── .github/
│ └── workflows/
│ ├── ci.yml # lint → test (race+cover) → build
│ └── release.yml # docker build & push → deploy
├── docs/
│ ├── adr/ # Architecture Decision Records (0001-*.md ...)
│ ├── runbook.md
│ └── architecture.md
├── test/ # integration & e2e suites, testcontainers helpers, load tests
├── sqlc.yaml # sqlc codegen config
├── buf.yaml # buf lint/breaking config
├── buf.gen.yaml # buf code generation config
├── Makefile # build, test, lint, migrate, sqlc, buf, compose targets
├── .golangci.yml
├── go.mod
└── README.md
Data Model / Database
Tenant isolation strategy
Chosen strategy: shared database, shared schema, org_id row scoping. Every tenant-owned
table carries an org_id column. Every query carries a mandatory WHERE org_id = $1 predicate,
and the org_id is taken from the verified JWT claim / tenant context, never from arbitrary
client input. This keeps operations simple (one schema, one migration set, one connection pool)
and scales to many small tenants — the right default for a SaaS like this.
Defense-in-depth: enforcement currently relies on the repository layer always scoping by org_id.
A repository helper (e.g. scopedQuery(ctx)) reads org_id from context so a developer cannot
"forget" the predicate. As a stretch / alternative, enable PostgreSQL Row-Level Security
(RLS): a per-transaction SET LOCAL app.current_org = '<uuid>' plus CREATE POLICY clauses make the
database itself reject cross-tenant access even if application code has a bug. (Physically isolated
alternatives, schema-per-tenant or database-per-tenant, are noted in ADR-0002 and rejected for
operational overhead at this scale.)
Schema (key migrations)
-- 0001_init.up.sql
CREATE EXTENSION IF NOT EXISTS "pgcrypto"; -- gen_random_uuid()
CREATE EXTENSION IF NOT EXISTS "citext"; -- case-insensitive email
-- Global identity (users are not tenant-scoped; they belong to orgs via memberships)
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email CITEXT NOT NULL UNIQUE,
password_hash TEXT NOT NULL,
display_name TEXT NOT NULL DEFAULT '',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Tenants
CREATE TABLE organizations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name TEXT NOT NULL,
slug CITEXT NOT NULL UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
deleted_at TIMESTAMPTZ
);
CREATE TYPE member_role AS ENUM ('owner', 'admin', 'member');
-- Membership join: which user has which role in which org
CREATE TABLE memberships (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
role member_role NOT NULL DEFAULT 'member',
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (org_id, user_id)
);
CREATE INDEX idx_memberships_user ON memberships (user_id);
CREATE INDEX idx_memberships_org ON memberships (org_id, role);
-- Invitations (drive the async email side-effect)
CREATE TABLE invitations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
email CITEXT NOT NULL,
role member_role NOT NULL DEFAULT 'member',
token_hash TEXT NOT NULL,
invited_by UUID NOT NULL REFERENCES users(id),
expires_at TIMESTAMPTZ NOT NULL,
accepted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (org_id, email)
);
CREATE INDEX idx_invitations_org ON invitations (org_id);
-- Tenant-scoped: projects
CREATE TABLE projects (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
name TEXT NOT NULL,
description TEXT NOT NULL DEFAULT '',
status TEXT NOT NULL DEFAULT 'active' CHECK (status IN ('active', 'archived')),
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (org_id, id) -- target of the tenant-safe composite FK from tasks
);
-- Composite index: every list is scoped by (org_id, ...)
CREATE INDEX idx_projects_org_status ON projects (org_id, status, created_at DESC);
CREATE TYPE task_status AS ENUM ('todo', 'in_progress', 'done', 'cancelled');
-- Tenant-scoped: tasks (carry org_id directly for cheap scoping + RLS)
CREATE TABLE tasks (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
org_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
project_id UUID NOT NULL,
title TEXT NOT NULL,
description TEXT NOT NULL DEFAULT '',
status task_status NOT NULL DEFAULT 'todo',
priority SMALLINT NOT NULL DEFAULT 3,
assignee_id UUID REFERENCES users(id),
due_at TIMESTAMPTZ,
created_by UUID NOT NULL REFERENCES users(id),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
-- Composite FK: a task can only reference a project in the same org
FOREIGN KEY (org_id, project_id) REFERENCES projects (org_id, id) ON DELETE CASCADE
);
-- Composite indexes on the hot, tenant-scoped access paths
CREATE INDEX idx_tasks_org_project ON tasks (org_id, project_id, status);
CREATE INDEX idx_tasks_org_assignee ON tasks (org_id, assignee_id) WHERE assignee_id IS NOT NULL;
CREATE INDEX idx_tasks_org_created ON tasks (org_id, created_at DESC);
-- Auth: rotating refresh tokens (store only the hash)
CREATE TABLE refresh_tokens (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
token_hash TEXT NOT NULL UNIQUE,
expires_at TIMESTAMPTZ NOT NULL,
revoked_at TIMESTAMPTZ,
replaced_by UUID, -- points at the rotated successor token
user_agent TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX idx_refresh_tokens_user ON refresh_tokens (user_id) WHERE revoked_at IS NULL;
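Row-Level Security (alternative / stretch)
-- 0002_rls.up.sql (optional defense-in-depth)
ALTER TABLE projects ENABLE ROW LEVEL SECURITY;
ALTER TABLE tasks ENABLE ROW LEVEL SECURITY;
-- Without FORCE, the table owner (often the app's own role) bypasses every policy;
-- superusers and BYPASSRLS roles bypass RLS regardless.
ALTER TABLE projects FORCE ROW LEVEL SECURITY;
ALTER TABLE tasks FORCE ROW LEVEL SECURITY;
CREATE POLICY org_isolation_projects ON projects
USING (org_id = current_setting('app.current_org')::uuid);
CREATE POLICY org_isolation_tasks ON tasks
USING (org_id = current_setting('app.current_org')::uuid);
-- Middleware/repo issues, per transaction: SET LOCAL app.current_org = '<org uuid>';
-- SET cannot take a bind parameter, so from Go use: SELECT set_config('app.current_org', $1, true);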
API Design
REST surface (chi)
Base path /v1. The active organization is taken from the JWT org_id claim; org-scoped
resources also include {org_id} in the path for clarity and to allow org switching. The path
org_id must equal the token's org_id claim, or the request is rejected with 403.
| Method & Path | Description | Min role |
|---|---|---|
| POST /v1/auth/signup | Create user + first org (creator = owner) | public |
| POST /v1/auth/login | Verify password, issue access+refresh | public |
| POST /v1/auth/refresh | Rotate refresh token, new access token | public (valid refresh) |
| POST /v1/auth/logout | Revoke active refresh token | member |
| GET /v1/orgs | List my organizations | member |
| POST /v1/orgs | Create a new organization | member |
| GET /v1/orgs/{org_id} | Get organization | member |
| PATCH /v1/orgs/{org_id} | Update organization | admin |
| DELETE /v1/orgs/{org_id} | Soft-delete organization | owner |
| GET /v1/orgs/{org_id}/members | List members | member |
| PATCH /v1/orgs/{org_id}/members/{user_id} | Change member role | admin |
| DELETE /v1/orgs/{org_id}/members/{user_id} | Remove member | admin |
| POST /v1/orgs/{org_id}/invitations | Invite by email (async send) | admin |
| POST /v1/invitations/accept | Accept invite via token | public (valid token) |
| GET /v1/orgs/{org_id}/projects | List projects (paged/filtered) | member |
| POST /v1/orgs/{org_id}/projects | Create project | member |
| GET/PATCH/DELETE /v1/orgs/{org_id}/projects/{project_id} | Read/update/archive | member/member/admin |
| GET /v1/orgs/{org_id}/projects/{project_id}/tasks | List tasks (filter status/assignee) | member |
| POST /v1/orgs/{org_id}/projects/{project_id}/tasks | Create task | member |
| GET/PATCH/DELETE /v1/orgs/{org_id}/.../tasks/{task_id} | Read/update/delete | member/member/member (own only) |
Error envelope
All non-2xx responses share one shape:
{
"error": {
"code": "forbidden",
"message": "you do not have permission to modify members",
"request_id": "01J9Z3K8Q7XM2C9V0YF4B7N1AE",
"details": [{ "field": "role", "issue": "must be one of owner|admin|member" }]
}
}
code is a stable machine string (unauthenticated, forbidden, not_found,
validation_failed, rate_limited, conflict, internal), and the HTTP status mirrors it (for example, forbidden → 403, rate_limited → 429).
Pagination convention
Keyset (cursor) pagination by default:
GET /v1/orgs/{org}/projects/{p}/tasks?limit=50&cursor=<opaque>&status=in_progress&assignee_id=<uuid>
200 OK
{
"data": [ /* ...items... */ ],
"page": { "limit": 50, "next_cursor": "eyJpZCI6Li4ufQ==", "has_more": true }
}
Sample JWT claims (access token)
{
"sub": "9b1c0e7a-2c4d-4f0a-9a11-7e0d6b5f2c10",
"org_id": "1f4a2d88-0c2b-4d6e-9f3a-55b1c2e8a7d4",
"role": "admin",
"scope": "access",
"iss": "taskly",
"aud": "taskly-api",
"iat": 1750896000,
"exp": 1750896900,
"jti": "01J9Z3K8Q7XM2C9V0YF4B7N1AE"
}
Refresh tokens are opaque, random, single-use, and stored hashed in refresh_tokens; on
/auth/refresh the presented token is revoked and replaced_by is set (rotation + reuse
detection).
gRPC (internal surface)
For service-to-service calls and the worker, expose the same use-cases over gRPC.
syntax = "proto3";
package taskly.v1;
option go_package = "github.com/you/taskly/api/proto/taskly/v1;tasklyv1";
import "google/protobuf/timestamp.proto";
service TaskService {
rpc CreateTask (CreateTaskRequest) returns (Task);
rpc GetTask (GetTaskRequest) returns (Task);
rpc ListTasks (ListTasksRequest) returns (ListTasksResponse);
rpc UpdateTask (UpdateTaskRequest) returns (Task);
rpc DeleteTask (DeleteTaskRequest) returns (DeleteTaskResponse);
}
enum TaskStatus {
TASK_STATUS_UNSPECIFIED = 0;
TASK_STATUS_TODO = 1;
TASK_STATUS_IN_PROGRESS = 2;
TASK_STATUS_DONE = 3;
TASK_STATUS_CANCELLED = 4;
}
message Task {
string id = 1;
string org_id = 2; // populated from the authenticated context, not trusted from input
string project_id = 3;
string title = 4;
string description = 5;
TaskStatus status = 6;
int32 priority = 7;
string assignee_id = 8;
google.protobuf.Timestamp due_at = 9;
google.protobuf.Timestamp created_at = 10;
}
message CreateTaskRequest { string project_id = 1; string title = 2; string description = 3; int32 priority = 4; string assignee_id = 5; }
message GetTaskRequest { string id = 1; }
message ListTasksRequest { string project_id = 1; TaskStatus status = 2; string assignee_id = 3; int32 limit = 4; string cursor = 5; }
message ListTasksResponse { repeated Task tasks = 1; string next_cursor = 2; bool has_more = 3; }
message UpdateTaskRequest { string id = 1; optional string title = 2; optional string description = 3; TaskStatus status = 4; optional int32 priority = 5; optional string assignee_id = 6; } // unset fields = no change; TASK_STATUS_UNSPECIFIED = keep status
message DeleteTaskRequest { string id = 1; }
message DeleteTaskResponse { bool deleted = 1; }
A unary server interceptor mirrors the REST middleware chain: extract bearer token from
metadata → verify → resolve tenant → RBAC → inject context, plus otelgrpc and metrics.
RBAC matrix (role → allowed operations)
| Operation | owner | admin | member |
|---|---|---|---|
| View org / members | ✅ | ✅ | ✅ |
| Update org settings | ✅ | ✅ | ❌ |
| Delete org | ✅ | ❌ | ❌ |
| Invite member | ✅ | ✅ | ❌ |
| Change member role | ✅ | ✅* | ❌ |
| Remove member | ✅ | ✅* | ❌ |
| Create / view project | ✅ | ✅ | ✅ |
| Archive / delete project | ✅ | ✅ | ❌ |
| Create / update / view task | ✅ | ✅ | ✅ |
| Delete task | ✅ | ✅ | own only |
* admins cannot modify owners; only an owner can promote/demote owners or transfer ownership.
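The matrix and its footnote translate into a small, pure-Go policy function that both the REST middleware and the gRPC interceptor could call. The operation names and Subject fields are hypothetical, and "own only" is assumed to mean the task's created_by; if it should mean the assignee, only that comparison changes.

```go
// Sketch only: Op names, Subject fields, and Can are hypothetical.
package main

import "fmt"

// Role is ordered so that a higher value includes the lower ones.
type Role int

const (
	Member Role = iota + 1
	Admin
	Owner
)

// Op names a row of the RBAC matrix.
type Op string

const (
	ViewOrg        Op = "org:view"
	UpdateOrg      Op = "org:update"
	DeleteOrg      Op = "org:delete"
	InviteMember   Op = "member:invite"
	ChangeRole     Op = "member:change_role"
	RemoveMember   Op = "member:remove"
	WriteProject   Op = "project:write"   // create / view
	ArchiveProject Op = "project:archive" // archive / delete
	WriteTask      Op = "task:write"      // create / update / view
	DeleteTask     Op = "task:delete"
)

// minRole is the matrix itself: the lowest role allowed for each operation.
var minRole = map[Op]Role{
	ViewOrg: Member, UpdateOrg: Admin, DeleteOrg: Owner,
	InviteMember: Admin, ChangeRole: Admin, RemoveMember: Admin,
	WriteProject: Member, ArchiveProject: Admin,
	WriteTask: Member, DeleteTask: Member,
}

// Subject carries the per-request facts the footnote rules need.
type Subject struct {
	ActorID     string
	ActorRole   Role
	TargetRole  Role   // current role of the member being changed or removed
	NewRole     Role   // requested role for ChangeRole
	TaskCreator string // created_by of the task being deleted
}

// Can reports whether the subject may perform op; unknown ops are denied.
func Can(s Subject, op Op) bool {
	need, ok := minRole[op]
	if !ok || s.ActorRole < need {
		return false
	}
	switch op {
	case ChangeRole, RemoveMember:
		// Only an owner may modify an owner or make someone an owner.
		return s.ActorRole == Owner || (s.TargetRole != Owner && s.NewRole != Owner)
	case DeleteTask:
		// Members may delete only their own tasks.
		return s.ActorRole >= Admin || (s.ActorID != "" && s.TaskCreator == s.ActorID)
	}
	return true
}

func main() {
	admin := Subject{ActorID: "u1", ActorRole: Admin, TargetRole: Owner}
	fmt.Println(Can(admin, RemoveMember)) // false: admins cannot modify owners
	member := Subject{ActorID: "u2", ActorRole: Member, TaskCreator: "u2"}
	fmt.Println(Can(member, DeleteTask)) // true: own task
}
```

Keeping the matrix as data makes the RBAC tests a straightforward loop over every role and operation.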
Tech Stack
| Concern | Library / tool (import path) |
|---|---|
| HTTP router | github.com/go-chi/chi/v5 |
| Postgres driver/pool | github.com/jackc/pgx/v5, github.com/jackc/pgx/v5/pgxpool |
| Type-safe queries | sqlc (github.com/sqlc-dev/sqlc) |
| Migrations | github.com/golang-migrate/migrate/v4 |
| Redis client | github.com/redis/go-redis/v9 |
| JWT | github.com/golang-jwt/jwt/v5 |
| Password hashing | golang.org/x/crypto/bcrypt and/or golang.org/x/crypto/argon2 |
| Structured logging | log/slog (stdlib) |
| Metrics | github.com/prometheus/client_golang/prometheus + .../promhttp |
| Tracing | go.opentelemetry.io/otel, .../sdk/trace, .../exporters/otlp/otlptrace/otlptracegrpc |
| HTTP/gRPC instrumentation | go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp, .../google.golang.org/grpc/otelgrpc |
| gRPC | google.golang.org/grpc, google.golang.org/protobuf |
| Proto tooling | buf (github.com/bufbuild/buf), protoc-gen-go, protoc-gen-go-grpc |
| Config (12-factor) | github.com/kelseyhightower/envconfig (or github.com/spf13/viper) |
| Dependency injection | constructor injection; optional github.com/google/wire |
| Job queue | github.com/hibiken/asynq (Redis-backed) or project-06 queue |
| UUIDs | github.com/google/uuid |
| Testing | github.com/stretchr/testify, stdlib testing |
| Integration containers | github.com/testcontainers/testcontainers-go (+ modules/postgres, modules/redis) |
| Linting | golangci-lint (github.com/golangci/golangci-lint) |
| Load testing | k6 (Grafana) or vegeta |
| Release / images | optional goreleaser or ko |
Implementation Milestones
Phase 0 — Foundation, config, logging
- Initialize the module, Makefile, .golangci.yml, repo layout, and README skeleton.
- internal/platform/config: envconfig/viper loader, validation, .env.example.
- internal/platform/logging: slog JSON handler, level from config, context-fields helper.
- internal/platform/httpserver: chi server, /healthz + /readyz, graceful shutdown via signal.NotifyContext.
- internal/platform/postgres: pgxpool factory, ping/health, sqlc.yaml, and the first migration runner (cmd/migrate).
Phase 1 — Auth & JWT
- User domain + password hashing (bcrypt/argon2id) with timing-safe verify.
- golang-jwt token issuer/verifier; access (15m) + opaque refresh tokens (hashed at rest).
- signup, login, refresh (with rotation + reuse detection), and logout use-cases + handlers.
- Auth middleware: bearer extraction → verify → inject user_id/claims into context.
Phase 2 — Multi-tenancy & RBAC
- Organizations, memberships, roles domain + repositories.
- Tenant-resolution middleware: derive org_id from the claim, assert it against the path, inject it into context.
- scopedQuery repo helper enforcing WHERE org_id = $1, with the value always taken from the context.
- RBAC middleware/policy: role → permission checks per route; tenant-isolation guard.
Phase 3 — Core domain CRUD
- Projects use-cases + repo + REST handlers (list/create/get/update/archive), paginated.
- Tasks use-cases + repo + REST handlers (CRUD + status transition), filtered + paginated.
- Invitations: create (queues email job) + accept flow.
Phase 4 — Caching & rate limiting
- internal/platform/redis client; cache-aside for hot reads (project/task lists) with TTL + invalidation on writes.
- Redis token-bucket rate limiter middleware (per-IP and per-tenant) returning 429 + Retry-After.
Phase 5 — gRPC
- buf config + TaskService proto + generated stubs.
- gRPC server with auth/tenant/RBAC + otelgrpc interceptors, sharing the same use-cases.
Phase 6 — Observability
- Prometheus metrics: RED (rate/errors/duration) per route, DB pool, cache hit ratio, queue depth, exposed on /metrics.
- OpenTelemetry tracer provider → OTLP → Jaeger/Tempo; spans across middleware → use-case → repo → DB.
- Inject request_id/trace_id into every log line for correlation.
Phase 7 — Async jobs
- cmd/worker consuming the queue; idempotent handlers for invitation email, task-assignment notification, and webhook delivery.
- Publisher port wired into use-cases; retries + dead-letter handling.
Phase 8 — Testing
- Unit tests on domain/app against mocked ports (> 70% core coverage).
- Integration tests with testcontainers-go (Postgres + Redis): repos, migrations, cache, rate limiter.
- Tenant-isolation tests (A cannot read/write B), RBAC matrix tests, auth/token rotation tests.
- -race in CI; k6/vegeta load script.
Phase 9 — CI/CD
- GitHub Actions ci.yml: lint → go test -race -cover → build, with a coverage gate.
- release.yml: build multi-stage images → push to registry → deploy step.
Phase 10 — Docs & ADRs
- OpenAPI spec, generated proto docs, README architecture overview, runbook, Postman/Bruno collection.
- Write the ADR log (see Documentation Deliverables).
Phase 11 — Hardening
- Security pass (headers, input validation, secret handling), zero-downtime migration drill, load test + p99 verification, graceful-shutdown drain test.
Testing Strategy
- Unit tests (core): internal/*/domain and internal/*/app tested in isolation with hand-written or testify/mock fakes for every port (Repo, Cache, QueuePublisher, Mailer, TokenIssuer). Pure, fast, deterministic; gate > 70% coverage here.
- Integration tests (testcontainers-go): spin up real Postgres and Redis containers per suite (shared via TestMain), run migrations, and exercise repositories, the cache-aside path, the rate limiter, and the refresh-token store against the real engines. Containers are torn down after the suite.
- API / contract tests: boot the chi router (and gRPC server) against the test DB; assert status codes, the error envelope, pagination, and request validation end to end. Optionally validate responses against the OpenAPI schema.
- RBAC tests: drive every cell of the role→operation matrix and assert allow/deny.
- Tenant-isolation tests (critical): seed org A and org B; authenticate as a member of A and assert that every read and write against B's projects/tasks returns 403/404 and never leaks rows, both at the repo layer and the HTTP layer.
- Auth / token tests: login, access-token expiry, refresh rotation, refresh-reuse detection (reusing a rotated token revokes the chain), logout revocation.
- Concurrency: run the full suite under go test -race.
- Load test: a k6/vegeta script hitting hot read/write endpoints; assert p99 < 150 ms and capture throughput; verify the rate limiter returns 429 under burst.
- CI gates: lint clean, race clean, and a coverage threshold on core packages enforced in ci.yml.
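Deployment
- Images: one multi-stage Dockerfile per binary (Dockerfile.api, Dockerfile.worker), with the build stage on golang:1.x and the final stage on gcr.io/distroless/static or alpine: non-root user, static binary, HEALTHCHECK. (Distroless ships no shell or curl, so a HEALTHCHECK there needs a health subcommand built into the binary; Kubernetes ignores HEALTHCHECK and relies on its own probes.)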
# Dockerfile.api (sketch)
FROM golang:1.23 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -trimpath -ldflags="-s -w" -o /out/api ./cmd/api
FROM gcr.io/distroless/static:nonroot
COPY --from=build /out/api /api
USER nonroot:nonroot
EXPOSE 8080 9090
ENTRYPOINT ["/api"]
- docker-compose (full stack): api, worker, postgres, redis, jaeger, prometheus, grafana. The compose file wires env, health checks, depends_on with condition: service_healthy, volumes for Postgres/Redis, and a Prometheus scrape config pointing at the api /metrics. Jaeger receives OTLP traces; Grafana provisions a dashboard from Prometheus.
- GitHub Actions pipeline: lint (golangci-lint) → test (go test -race -coverprofile) → build → docker build & push (on tag/main) → deploy. The coverage gate fails the job below threshold; proto and sqlc generation are verified up to date (git diff --exit-code).
- Zero-downtime migrations (expand/contract): ship additive, backward-compatible migrations first (new nullable columns/tables, indexes via CREATE INDEX CONCURRENTLY), deploy code that writes both old and new, backfill, then drop the old in a later contract migration. CREATE INDEX CONCURRENTLY cannot run inside a transaction block, so keep each one in a migration file of its own. Migrations run as a separate cmd/migrate job/init container, never inside the request path.
- Health & readiness: /healthz (process alive) and /readyz (DB + Redis reachable) back k8s liveness/readiness probes; readiness flips false during graceful drain.
- 12-factor config: strictly env-driven (DATABASE_URL, REDIS_URL, JWT_SIGNING_KEY, OTEL_EXPORTER_OTLP_ENDPOINT, LOG_LEVEL, ...); no config in the image.
- Secrets: injected via environment / secret manager (never committed); signing keys are rotatable via the kid in the JWT header.
- Kubernetes (stretch): deployments/k8s with Deployment + Service + HPA (CPU/RPS) + Ingress + a migration Job, plus a PodDisruptionBudget for safe rollouts.
Documentation Deliverables
- README: what/why, quickstart (make up), architecture overview with the mermaid diagrams, environment variables table, and the demo flow (signup → create project → create task).
- OpenAPI spec (api/openapi/openapi.yaml): the complete REST contract; optionally serve a Swagger/Redoc UI.
- Generated proto docs: buf + protoc-gen-doc HTML/Markdown for the gRPC services.
- ADR log (docs/adr/), with example records:
  - 0001: Adopt hexagonal (ports & adapters) architecture.
  - 0002: Multi-tenancy strategy: shared DB + org_id scoping (vs schema/db-per-tenant; RLS option).
  - 0003: REST (chi) + gRPC split: public REST, internal gRPC.
  - 0004: Auth & token strategy: short-lived access JWT + rotating refresh tokens with reuse detection.
  - 0005: Caching strategy: Redis cache-aside with TTL + invalidation on writes.
  - 0006: Rate limiting: Redis token bucket, per-IP and per-tenant.
  - 0007: Observability stack: slog + Prometheus + OpenTelemetry/Jaeger.
  - 0008: Persistence & migrations: pgx + sqlc + golang-migrate, expand/contract.
- Runbook (docs/runbook.md): deploy/rollback, running migrations, common alerts and remedies, scaling, secret rotation, dashboards, and trace lookup by request_id.
- Postman / Bruno collection: the full request set with auth flow and env variables, ready to import.
Stretch Goals / Future Improvements
- Billing — Stripe subscriptions, plan-based seat limits, metered usage, webhooks.
- Feature flags — per-tenant flags (e.g. Unleash/OpenFeature) to ship gradually.
- Audit log — append-only record of who-did-what-when, per tenant.
- Realtime — WebSockets/SSE for live task updates and presence.
- Event-driven — transactional outbox + Kafka/NATS for reliable domain events.
- Per-tenant quotas — rate limits and storage/seat quotas enforced per plan.
- RLS enforcement — turn on PostgreSQL Row-Level Security for hard DB-level isolation.
- Blue-green / canary deploys with automated rollback on SLO breach.
- SLOs & alerting — error-budget burn alerts on the p99/error-rate SLOs in Prometheus/Grafana.
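For the SLO alerting goal, the burn rate is the observed error ratio divided by the error budget (1 − SLO target); a burn rate of 1 spends the budget exactly over the SLO window. The sketch follows the multi-window pattern described in Google's SRE Workbook: page only when both a long and a short window exceed the threshold. The 99.9% target, 30-day window, and "2% of the budget in one hour" threshold are illustrative assumptions; in Prometheus the same ratios would come from recording rules rather than Go code.

```go
package main

import (
	"fmt"
	"time"
)

// BurnRate is how fast the error budget is being spent: 1.0 means the whole
// budget would be used up exactly at the end of the SLO window.
func BurnRate(errorRatio, sloTarget float64) float64 {
	return errorRatio / (1 - sloTarget)
}

// BurnRateThreshold is the burn rate at which budgetFraction of the budget
// is consumed within alertWindow, given the SLO window length.
func BurnRateThreshold(budgetFraction float64, alertWindow, sloWindow time.Duration) float64 {
	return budgetFraction * sloWindow.Hours() / alertWindow.Hours()
}

// ShouldPage applies a two-window check: the long window shows the burn is
// significant, and the short window confirms it is still happening.
func ShouldPage(longWindowErrRatio, shortWindowErrRatio, sloTarget, threshold float64) bool {
	return BurnRate(longWindowErrRatio, sloTarget) > threshold &&
		BurnRate(shortWindowErrRatio, sloTarget) > threshold
}

func main() {
	const slo = 0.999 // 99.9% success over 30 days (illustrative target)
	threshold := BurnRateThreshold(0.02, time.Hour, 30*24*time.Hour)
	fmt.Printf("page threshold: %.1fx\n", threshold)
	// 2% errors over the last hour and 3% over the last 5 minutes.
	fmt.Println("page:", ShouldPage(0.02, 0.03, slo, threshold))
}
```

Here the threshold works out to 0.02 × 720 h / 1 h = 14.4, and both windows burn at roughly 20× and 30×, so the program prints `page threshold: 14.4x` followed by `page: true`. The short window lets the alert clear quickly once the error rate recovers.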
Lessons-Learned Prompts
- Architecture: Where did the hexagonal boundary pay off, and where did it feel like ceremony? Which port was hardest to keep free of leaking infrastructure details?
- Multi-tenancy: How confident are you that no query can cross tenants? What would change if one tenant grew 100× larger than the rest, and when would you move to RLS or db-per-tenant?
- Security: Walk through the refresh-token rotation and reuse-detection flow. What attacks does it stop, and what is still exposed if the signing key leaks?
- Observability: Given a user-reported slow request and only its `request_id`, trace it end to end. Which signal (log, metric, or trace) answered which question?
- Performance: Which indexes and which cache entries actually moved p99? How did you measure, and what was the cache hit ratio under load?
- Testing: What did testcontainers catch that mocked unit tests could not? Which test gives you the most confidence before a deploy?
- Operations: Describe a zero-downtime schema change you made with expand/contract. What would you do differently for a column that needs a type change?
- Trade-offs: If you had one more week, what is the single highest-leverage improvement and why?
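The Security prompt asks for a walkthrough of refresh-token rotation with reuse detection. Below is a minimal in-memory sketch, assuming one token family per login and a map standing in for a Postgres table (the `Store`, `Login`, and `Refresh` names are hypothetical). Every refresh marks the presented token as used and issues a successor; presenting an already-used token revokes the entire family, which logs out both the attacker and the legitimate client.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

var (
	ErrInvalidToken = errors.New("invalid refresh token")
	ErrTokenReuse   = errors.New("refresh token reuse detected; session revoked")
)

// record is what the store keeps per issued refresh token.
type record struct {
	family string // shared by every token descended from one login
	used   bool   // set once the token has been exchanged
}

// Store holds refresh tokens keyed by SHA-256 digest, never in plain text.
type Store struct {
	mu      sync.Mutex
	tokens  map[string]*record // digest -> record
	revoked map[string]bool    // family -> revoked
}

func NewStore() *Store {
	return &Store{tokens: map[string]*record{}, revoked: map[string]bool{}}
}

func digest(tok string) string {
	sum := sha256.Sum256([]byte(tok))
	return hex.EncodeToString(sum[:])
}

// randomToken returns 32 bytes of CSPRNG output, hex-encoded.
func randomToken() string {
	b := make([]byte, 32)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no safe fallback if the OS RNG fails
	}
	return hex.EncodeToString(b)
}

// Login starts a new token family and returns its first refresh token.
func (s *Store) Login() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	tok := randomToken()
	s.tokens[digest(tok)] = &record{family: randomToken()}
	return tok
}

// Refresh exchanges tok for its successor. A token that was already exchanged
// means two parties hold the same family, so the whole family is revoked.
func (s *Store) Refresh(tok string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, ok := s.tokens[digest(tok)]
	if !ok || s.revoked[rec.family] {
		return "", ErrInvalidToken
	}
	if rec.used {
		s.revoked[rec.family] = true
		return "", ErrTokenReuse
	}
	rec.used = true
	next := randomToken()
	s.tokens[digest(next)] = &record{family: rec.family}
	return next, nil
}

func main() {
	s := NewStore()
	t1 := s.Login()
	t2, _ := s.Refresh(t1)  // legitimate rotation: t1 is now spent
	_, err := s.Refresh(t1) // a replay of a stolen copy of t1
	fmt.Println("replay:", err)
	_, err = s.Refresh(t2) // the real client's newer token is revoked too
	fmt.Println("client:", err)
}
```

Storing only digests means a leaked table does not hand out usable tokens. A production version would also persist expiry times and run the lookup-and-mark step inside one transaction (for example with `SELECT ... FOR UPDATE`), which is the role the mutex plays here, so two concurrent refreshes of the same token cannot both succeed.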
Portfolio & Resume
Resume Bullets
- Built a production-grade multi-tenant SaaS backend in Go (hexagonal architecture, REST + gRPC) serving strictly isolated tenant data, sustaining p99 < 150 ms with a Redis cache-aside layer and validated under k6 load tests.
- Implemented end-to-end security and reliability: JWT access + rotating refresh tokens with reuse detection, RBAC (owner/admin/member), Redis token-bucket rate limiting, graceful shutdown, and zero-downtime expand/contract migrations (pgx + sqlc + golang-migrate).
- Delivered full observability and CI/CD: structured `slog` logs, Prometheus RED metrics, and OpenTelemetry traces to Jaeger, with a GitHub Actions pipeline (lint → race tests → build → image push) and testcontainers integration tests holding > 70% core coverage.
Interview Talking Points
- Multi-tenancy isolation — shared DB + `org_id` scoping driven from verified JWT claims, a `scopedQuery` guardrail, tenant-isolation tests, and RLS as defense-in-depth.
- Hexagonal architecture + DI — domain at the center, ports as interfaces, adapters at the edge, constructor injection at a single composition root (optionally `google/wire`).
- JWT / RBAC — token design, rotation, reuse detection, and a role→operation permission matrix.
- Observability triad — which questions logs, metrics, and traces each answer best, and how `request_id`/`trace_id` correlation makes incidents debuggable.
- Integration testing with testcontainers — real Postgres + Redis in tests, what it catches over mocks, and how it stays fast.
- CI/CD & graceful shutdown — the pipeline gates, image strategy, and how the server drains in-flight work on SIGTERM behind a load balancer.
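The RBAC and multi-tenancy talking points meet in a single authorization check. This is a sketch under assumptions: the operation names and the exact cells of the owner/admin/member matrix are illustrative, and `Claims` stands for fields already extracted from a verified JWT. The tenant check runs before the role check, so even an owner is refused outside their own `org_id`.

```go
package main

import (
	"errors"
	"fmt"
)

type Role string

const (
	Owner  Role = "owner"
	Admin  Role = "admin"
	Member Role = "member"
)

// Operation names are hypothetical examples, not a fixed Taskly API.
type Operation string

const (
	ReadTask      Operation = "task:read"
	WriteTask     Operation = "task:write"
	DeleteProject Operation = "project:delete"
	ManageMembers Operation = "members:manage"
	DeleteOrg     Operation = "org:delete"
)

// permissions is the role→operation matrix. Keeping it as data makes it easy
// to review, to render in docs, and to cover with a table-driven test.
var permissions = map[Role]map[Operation]bool{
	Owner:  {ReadTask: true, WriteTask: true, DeleteProject: true, ManageMembers: true, DeleteOrg: true},
	Admin:  {ReadTask: true, WriteTask: true, DeleteProject: true, ManageMembers: true},
	Member: {ReadTask: true, WriteTask: true},
}

// Claims is the subset of verified JWT claims the check needs.
type Claims struct {
	UserID string
	OrgID  string
	Role   Role
}

var (
	ErrCrossTenant = errors.New("resource belongs to another organization")
	ErrForbidden   = errors.New("role does not permit this operation")
)

// Authorize enforces tenant isolation first, then the role check. Unknown
// roles look up a nil inner map, which yields false, so they get nothing.
func Authorize(c Claims, resourceOrgID string, op Operation) error {
	if c.OrgID == "" || c.OrgID != resourceOrgID {
		return ErrCrossTenant
	}
	if !permissions[c.Role][op] {
		return ErrForbidden
	}
	return nil
}

func main() {
	c := Claims{UserID: "u-42", OrgID: "org-a", Role: Admin}
	fmt.Println(Authorize(c, "org-a", DeleteProject))
	fmt.Println(Authorize(c, "org-b", ReadTask))
	fmt.Println(Authorize(c, "org-a", DeleteOrg))
}
```

Returning distinct errors lets the REST layer map them to different responses (for example 404 for cross-tenant access, so resource existence is not leaked, and 403 for a role denial); that mapping is a design choice, not something the check itself dictates.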
System Design Stories
This single repository unlocks distinct interview stories:
- "Design a multi-tenant SaaS" — isolation strategies and the trade-offs you chose.
- "Design an auth system" — access/refresh tokens, rotation, RBAC, password hashing.
- "Make this service observable" — the logs/metrics/traces triad and incident debugging.
- "Add caching and rate limiting" — Redis cache-aside invalidation and the token-bucket limiter.
- "Evolve the schema with zero downtime" — expand/contract migrations on a live tenant DB.
- "Decouple a slow side-effect" — moving email/webhooks to an idempotent async worker queue.
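The caching story (ADR 0005) follows the cache-aside pattern. The sketch uses an in-process map with a per-entry TTL as a stand-in for Redis; the `Cache` type, the key format, and the injected monotonic `now` are assumptions for illustration. Two details carry over to the real thing: every key includes `org_id`, and the entry is deleted after a successful write so the next read reloads from Postgres.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type entry struct {
	val     string
	expires time.Duration
}

// Cache is an in-process stand-in for Redis: a map with a per-entry TTL.
// now is injected as a monotonic offset to keep the sketch deterministic.
type Cache struct {
	mu   sync.Mutex
	ttl  time.Duration
	data map[string]entry
}

func NewCache(ttl time.Duration) *Cache { return &Cache{ttl: ttl, data: map[string]entry{}} }

// key scopes every entry by org_id so one tenant can never read another's cached data.
func key(orgID, projectID string) string { return "org:" + orgID + ":project:" + projectID }

// GetProject implements cache-aside: try the cache, and on a miss or expiry
// load from the source of truth and repopulate the cache with a fresh TTL.
func (c *Cache) GetProject(orgID, projectID string, now time.Duration, load func() (string, error)) (string, error) {
	k := key(orgID, projectID)
	c.mu.Lock()
	e, ok := c.data[k]
	c.mu.Unlock()
	if ok && now < e.expires {
		return e.val, nil
	}
	v, err := load()
	if err != nil {
		return "", err // never cache failures
	}
	c.mu.Lock()
	c.data[k] = entry{val: v, expires: now + c.ttl}
	c.mu.Unlock()
	return v, nil
}

// InvalidateProject is called after a successful DB write so the next read reloads fresh data.
func (c *Cache) InvalidateProject(orgID, projectID string) {
	c.mu.Lock()
	delete(c.data, key(orgID, projectID))
	c.mu.Unlock()
}

func main() {
	c := NewCache(time.Minute)
	start := time.Now()
	load := func() (string, error) {
		fmt.Println("  (cache miss: loading from Postgres)")
		return "Q3 Roadmap", nil
	}
	for i := 0; i < 2; i++ {
		v, _ := c.GetProject("org-a", "p1", time.Since(start), load)
		fmt.Println("project:", v)
	}
	// After an UPDATE succeeds in Postgres, drop the cached copy.
	c.InvalidateProject("org-a", "p1")
}
```

Deleting rather than updating the entry on write keeps the logic simple. A concurrent reader that loaded the old row just before the write committed can still put it back after the delete, which is why the TTL stays in place as an upper bound on staleness.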