SQL Architecture & Database Standards
Targets PostgreSQL 18 as the primary engine; MySQL 9 and SQLite 3.53 noted where they differ. See STACK.md for pinned tool versions.
1. Schema design
- Primary key:
id UUID PRIMARY KEY DEFAULT uuidv7()on every table. UUID v7 is sortable, distributed-friendly, and doesn't leak counts via URLs. PG 18 has nativeuuidv7(); on earlier engines use an extension or app-generated UUID v7. - Natural identifiers stay UNIQUE: the surrogate
idis for joins; domain meaning lives inUNIQUEconstraints (email,slug,iso_code). - Foreign keys: always declared at the DB level (
REFERENCES other(id) ON DELETE RESTRICTby default). Never enforce relationships in app code alone. - NOT NULL by default. Make NULL an explicit, justified choice — every nullable column should have a documented reason.
- CHECK constraints: push invariants into the DB (
CHECK (price >= 0),CHECK (status IN (...))). They survive bad code paths. - Audit columns (opt-in):
created_at+updated_at(timestamptz NOT NULL DEFAULT now()) on tables whose rows change after creation. Skip on pure lookup tables (countries,currencies) and event/log tables (whereevent_atalone is enough). - Soft delete (default):
deleted_at timestamptz NULL. Filter via partial indexes (CREATE INDEX ... WHERE deleted_at IS NULL) so queries stay fast. Use hard delete only when GDPR/compliance requires it, or for append-only event tables where deletion never makes sense.
2. Naming conventions
- Tables:
snake_case, plural (users,order_items). - Columns:
snake_case, singular (email,created_at). - PK: always
id. - FKs:
<reftable_singular>_id(user_id,order_id). - Indexes:
ix_<table>_<cols>; uniqueux_<table>_<cols>; partialix_<table>_<cols>_active. - Constraints:
<table>_<purpose>_check,<table>_<purpose>_fkey. - Sequences: PG generates them; rename only if necessary, never reference manually — UUID v7 PKs avoid the need.
3. Indexing strategy
- Index every FK column — JOINs without indexes are silent killers.
- B-tree is the default. GIN for arrays / JSONB containment. GiST for ranges / geometry. BRIN for huge append-only tables sorted by time.
- Composite index column order: most-selective column first, then by query pattern. The same composite index can serve
WHERE a = ?andWHERE a = ? AND b = ?— but notWHERE b = ?alone. - Partial indexes for soft-deleted rows (
... WHERE deleted_at IS NULL) and other common filters. - Don't over-index. Each index slows writes and consumes RAM. Add one only when
EXPLAIN ANALYZEproves a real query needs it. - Drop unused indexes. PG: query
pg_stat_user_indexesperiodically; remove any withidx_scan = 0.
4. Query patterns
- Parameter binding always.
WHERE id = $1— never string-concatenate user input. - Explicit columns.
SELECT id, email, created_at FROM users— neverSELECT *in production code (breaks on schema additions, returns extra bytes). RETURNINGon writes.INSERT ... RETURNING id, created_atsaves a round-trip and surfaces server defaults.ON CONFLICTfor upserts.INSERT ... ON CONFLICT (email) DO UPDATE SET ...— atomic, no race.- Pagination = cursor, not OFFSET.
WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT 20. OFFSET is O(N) on every page. - CTEs (
WITH) for readability, but be aware PG ≤ 11 materializes them (perf cliff). PG 12+ inlines unlessMATERIALIZEDis forced.
5. JOINs & N+1 prevention
- N+1 is the single most common perf bug. One query that fetches parents followed by N queries for children = always wrong.
- Detect it: log every SQL statement in dev with timings; any list endpoint emitting more than 2–3 queries per request is suspect.
- Fix it: single JOIN, or batched
WHERE id IN (...)for the children, or DataLoader-style batchers if behind an API layer. - JOIN vs subquery: different shapes, similar plans in modern PG — let
EXPLAIN ANALYZEdecide. Prefer JOINs when both sides return many rows; correlated subqueries when the inner table is small or you want a scalar. LEFT JOINonly when you genuinely need rows from the left without a match. ALEFT JOINfiltered byWHERE right.x = ?silently becomes an INNER JOIN — clearer to write it that way.
6. Transactions & isolation
- Wrap related writes in a transaction. Anything that must succeed or fail together.
- Isolation default:
READ COMMITTED(PG default). Bump toREPEATABLE READfor multi-statement reads that need a consistent snapshot, andSERIALIZABLEwhen reads-then-writes need true linearizability — handle40001(serialization failure) by retrying. - Keep transactions short. Holding a transaction across a network call or user input is a deadlock waiting to happen.
- Advisory locks (
pg_advisory_lock) for distributed coordination that doesn't fit a row-level lock — singleton workers, leader election, idempotent jobs. - Roll back explicitly on error. Application drivers should bracket every transaction in begin/commit/rollback — never leak open transactions.
7. Migrations
- Forward-only. No
downmigrations in production. Revert by writing a new forward migration. Down migrations rot fast and lie about what they restore. - Idempotent where possible:
CREATE TABLE IF NOT EXISTS,ADD COLUMN IF NOT EXISTS. Lets a half-applied migration re-run safely. - One change per migration. Easier to review, easier to revert, atomic in CI.
- Naming:
NNNNNN_snake_case_description.sql(e.g.000023_add_orders_paid_at_index.sql). Number monotonically; pad enough to last. - Large-table changes: add column as nullable first, backfill in batches (separate migration or async job), then add
NOT NULL/ index. SingleALTER TABLE ... NOT NULLon a billion-row table will lock writes for minutes. - Tools:
golang-migrate(Go) andalembic(Python) — both run raw.sqlfiles. STACK.md pins versions.
8. Security
- Parameter binding always. Restated because it's the single most common SQL injection vector.
- Least-privilege roles. The app's runtime role can
SELECT/INSERT/UPDATE/DELETEbut should notCREATE/DROP/ALTER. Migrations run as a separate, elevated role. - Row-Level Security (PG) for multi-tenant:
ENABLE ROW LEVEL SECURITY+ policies keyed on a session variable holdingtenant_id. Defense in depth against app-layer mistakes. - No secrets in connection strings stored in repo. Pull from env / secret manager. PG supports
~/.pgpassfor local dev. - Encrypted at rest is a deployment concern, but flag it: enable at the volume / disk layer.
9. Performance & EXPLAIN
EXPLAIN ANALYZEis the only honest perf tool. Don't optimize without it.- Read the plan top-to-bottom. Look for
Seq Scanon large tables (missing index?), highactual rowsvsplan rows(stale stats — runANALYZE), nested loops over huge inputs (wrong index). - Slow query logging:
log_min_duration_statement = 200(ms) in dev. Tighter in prod, but at least log the 99th percentile offenders. - Connection pooling. Postgres pays a real cost per connection. Use
pgbouncer(transaction-mode for OLTP) in front; in-app pools (e.g.pgx,psycopg) for finer control. VACUUM ANALYZEruns automatically (autovacuum) — be aware of bloat on heavyUPDATE/DELETEtables and tune autovacuum thresholds for them.
10. JSON columns — when (and when not)
- Use JSONB only for genuinely schemaless payloads: raw webhook bodies, opaque third-party event blobs, user-defined opaque config.
- Never for structured domain data. If you find yourself querying
data->'order'->>'status'repeatedly, it's a column, not a JSON field. Normalize. - Index with GIN if you must query JSO