When this skill is activated, always start your first response with the 🧢 emoji.

Database Engineering

A disciplined framework for designing, optimizing, and evolving relational databases in production. This skill covers schema design, indexing strategies, query optimization, safe migrations, and operational concerns like connection pooling and partitioning. It is opinionated about PostgreSQL but most principles apply to any SQL database. The goal is to help you make the right trade-off at each decision point, not just hand you a syntax reference.

When to use this skill

Trigger this skill when the user:

Designs a database schema or needs normalization guidance
Asks about creating or tuning indexes (composite, partial, covering)
Wants to understand or optimize a slow query or EXPLAIN plan
Plans a database migration (adding columns, renaming, dropping, backfilling)
Implements soft deletes, audit trails, or temporal data patterns
Sets up connection pooling (PgBouncer, application-level pools)
Partitions a large table by time, hash, or range
Chooses between replication strategies (read replicas, logical replication)
Investigates deadlocks, connection exhaustion, or lock contention

Do NOT trigger this skill for:

NoSQL / document store design (MongoDB, DynamoDB) - different trade-off space
ORM-specific configuration questions unrelated to the underlying SQL

Key principles

Normalize first, then denormalize with a documented reason - Start in third normal form. Every denormalization must be a conscious decision backed by a measured performance requirement, not a guess. Write a comment explaining why.
Index for your queries, not your tables - An index that does not serve a query is write overhead and bloat. Before adding an index, write out the query it serves and confirm with EXPLAIN ANALYZE that it is actually used.
Migrations must be reversible - Every schema change should have a rollback path. Use the expand-contract pattern for breaking changes: add the new shape, migrate data, deprecate the old shape, then drop it in a later release.
Measure before optimizing - EXPLAIN ANALYZE is the ground truth. Never tune a query without first reading the plan. A query that looks slow may be fast; a query that looks fast may be causing invisible downstream load.
Plan for growth at schema design time - Ask: "What happens at 100x rows? At 10x write throughput?" Identify which columns will need indexes, which tables might need partitioning, and which joins will become expensive before the schema is locked.
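The expand-contract pattern mentioned above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the `users` table, its `name` and `full_name` columns, and the batch size are all hypothetical, and the application is assumed to write to both columns between the expand and contract steps.

```sql
-- Hypothetical goal: rename users.name to users.full_name without downtime.

-- Release 1 (expand): add the new column. It is nullable with no default,
-- so existing rows are not rewritten.
ALTER TABLE users ADD COLUMN full_name TEXT;

-- Release 1 (migrate): backfill in small batches to keep row locks short.
-- Re-run this statement until it reports "UPDATE 0".
UPDATE users
SET full_name = name
WHERE id IN (
  SELECT id
  FROM users
  WHERE full_name IS NULL
    AND name IS NOT NULL
  ORDER BY id
  LIMIT 10000
);

-- Release 2 (contract): only after no deployed code reads or writes
-- the old column.
ALTER TABLE users DROP COLUMN name;

-- Rollback path before the contract step:
-- ALTER TABLE users DROP COLUMN full_name;
```

Until the final `DROP COLUMN` runs, the change can be undone by dropping the new column, which is what keeps the migration reversible.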

Core concepts

Normalization forms

Form	What it eliminates	When to stop here
1NF	Repeating groups, non-atomic columns	Almost never - baseline only
2NF	Partial dependencies on composite keys	Rare - get to 3NF
3NF	Transitive dependencies	Default target for OLTP schemas
BCNF	Remaining anomalies in 3NF edge cases	When you have overlapping candidate keys

Denormalize (with intent) for read-heavy aggregations, pre-computed summaries, or when JOINs across normalized tables are measured to be a bottleneck.
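One way to keep a pre-computed summary while leaving the base tables normalized is a materialized view. The sketch below assumes `orders` and `order_items` tables shaped like the e-commerce schema in the Common tasks section; the view name, column names, and comment text are illustrative only.

```sql
-- Assumed tables: orders(id, customer_id), order_items(order_id, quantity, unit_price_cents)
CREATE MATERIALIZED VIEW customer_order_totals AS
SELECT o.customer_id,
       count(DISTINCT o.id)                   AS order_count,
       sum(oi.quantity * oi.unit_price_cents) AS lifetime_cents
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.customer_id;

-- REFRESH ... CONCURRENTLY requires a unique index on the view.
CREATE UNIQUE INDEX idx_customer_order_totals_customer
  ON customer_order_totals(customer_id);

-- Rebuild the summary without blocking readers of the view.
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_order_totals;

-- Record why the denormalization exists.
COMMENT ON MATERIALIZED VIEW customer_order_totals IS
  'Denormalized summary for the customer dashboard; JOIN measured as a bottleneck.';
```

The view is only as fresh as its last refresh, so this fits summaries that tolerate some staleness.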

Index types

Type	Structure	Best for
B-tree	Balanced tree	Equality, range, ORDER BY, IS NULL - the default
Hash	Hash table	Equality-only lookups (rarely faster than B-tree in Postgres)
GIN	Inverted index	JSONB keys, full-text search, array containment
GiST	Generalized search tree	Geometric data, range types, nearest-neighbor
BRIN	Block range index	Very large append-only tables sorted by a natural order (e.g. timestamps)

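Composite B-tree indexes follow the leftmost prefix rule: an index on (a, b, c) efficiently serves queries filtering on a, (a, b), or (a, b, c) - but not (b, c) alone. Without a condition on a, PostgreSQL would have to scan the entire index, which is rarely better than a sequential scan.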
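A quick way to see the leftmost prefix rule on your own data is to compare plans for different filters. The `events` table and its columns here are hypothetical placeholders. On a small or empty table the planner may choose a sequential scan for every query, so load representative data and run ANALYZE before reading the plans.

```sql
-- Hypothetical table with a three-column composite index.
CREATE TABLE events (
  a       INT NOT NULL,
  b       INT NOT NULL,
  c       INT NOT NULL,
  payload TEXT
);

CREATE INDEX idx_events_a_b_c ON events(a, b, c);

-- Filters that include the leading column can use the index efficiently.
EXPLAIN SELECT * FROM events WHERE a = 1;
EXPLAIN SELECT * FROM events WHERE a = 1 AND b = 2;

-- No condition on the leading column: compare this plan with the two above.
EXPLAIN SELECT * FROM events WHERE b = 2 AND c = 3;
```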

ACID and WAL

ACID (Atomicity, Consistency, Isolation, Durability) guarantees that transactions are all-or-nothing, maintain invariants, are isolated from each other, and survive crashes. PostgreSQL provides isolation through MVCC (Multi-Version Concurrency Control): each transaction works against a snapshot, so plain reads do not block writes and writes do not block reads. Writers can still block other writers that touch the same rows.

WAL (Write-Ahead Log) is the mechanism for durability and replication. Every change is written to the WAL before it reaches the data files. Streaming replication sends WAL records to replicas as they are generated. Logical replication decodes WAL into row-level change events.

Connection pooling

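Each PostgreSQL connection is a separate forked OS process (~5-10 MB RAM). With hundreds of direct connections, memory use and context switching start to crowd out query work. PgBouncer in transaction mode is the standard solution - it multiplexes many application connections onto a small pool of server connections. Transaction mode does not preserve session-level state (SET, advisory locks, LISTEN/NOTIFY), so check that the application does not depend on it. Size the server-side pool at a small multiple of the core count as a starting point (the PostgreSQL wiki's rule of thumb is about 2 x cores plus the number of disks), then adjust from measurements.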
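Before adding a pooler, it helps to see how many connections exist and what they are doing. This sketch uses only the built-in `pg_stat_activity` view. The `backend_type` column assumes PostgreSQL 10 or later.

```sql
-- Configured connection ceiling for this server.
SHOW max_connections;

-- Client connections grouped by state. A large "idle" count suggests
-- the application is holding connections it is not using.
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;
```

Many idle connections alongside only a few active ones is the usual sign that transaction-mode pooling would help.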

Read replicas

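Streaming replicas receive WAL in near-real-time. Lag is usually small on a healthy system but can grow to seconds or more under heavy write load or when long-running queries on the replica delay replay. Use replicas to offload analytics, reporting, and read-heavy background jobs. Because replication is asynchronous by default, replicas can return stale data - never send a read that must see the caller's own just-committed write to a replica.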
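Replication lag can be measured rather than assumed. The queries below are a sketch using built-in views and functions. The `replay_lag` column assumes PostgreSQL 10 or later, and the replica-side figure overstates lag when the primary is idle, because no new transactions arrive to replay.

```sql
-- On the primary: one row per connected replica.
SELECT application_name, client_addr, state, replay_lag
FROM pg_stat_replication;

-- On a replica: time since the last replayed transaction committed.
SELECT now() - pg_last_xact_replay_timestamp() AS approximate_lag;
```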

Common tasks

Design a normalized schema

Start from an e-commerce domain. Identify entities, attributes, and relationships before writing DDL.

-- 1. Core entities in 3NF
CREATE TABLE customers (
  id          BIGSERIAL PRIMARY KEY,
  email       TEXT        NOT NULL UNIQUE,
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE products (
  id          BIGSERIAL PRIMARY KEY,
  sku         TEXT        NOT NULL UNIQUE,
  name        TEXT        NOT NULL,
  price_cents INT         NOT NULL CHECK (price_cents >= 0)
);

-- 2. Orders reference customers - foreign key with index
CREATE TABLE orders (
  id          BIGSERIAL PRIMARY KEY,
  customer_id BIGINT      NOT NULL REFERENCES customers(id),
  status      TEXT        NOT NULL DEFAULT 'pending'
                          CHECK (status IN ('pending','confirmed','shipped','cancelled')),
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_orders_customer_id ON orders(customer_id);
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);

-- 3. Junction table for order line items
CREATE TABLE order_items (
  id          BIGSERIAL PRIMARY KEY,
  order_id    BIGINT      NOT NULL REFERENCES orders(id),
  product_id  BIGINT      NOT NULL REFERENCES products(id),
  quantity    INT         NOT NULL CHECK (quantity > 0),
  unit_price_cents INT    NOT NULL
);

CREATE INDEX idx_order_items_order_id ON order_items(order_id);

unit_price_cents is intentionally denormalized from products.price_cents. Prices change over time; the order must record what the customer was charged.
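A sketch of how the price snapshot might be captured at write time, assuming the tables above. The order id, SKU, and quantity are made-up values for illustration.

```sql
-- Copy the current product price into the line item when the order is placed.
-- Later changes to products.price_cents do not affect this row.
INSERT INTO order_items (order_id, product_id, quantity, unit_price_cents)
SELECT 1,              -- hypothetical existing orders.id
       p.id,
       2,              -- hypothetical quantity
       p.price_cents
FROM products p
WHERE p.sku = 'SKU-123';  -- hypothetical SKU
```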

Create effective indexes

-- Composite index: filter first on equality columns, then range/sort
-- Serves: WHERE customer_id = ? AND status = ? ORDER BY created_at DESC
CREATE INDEX idx_orders_customer_status_created
  ON orders(customer_id, status, created_at DESC);

-- Partial index: only index the rows you actually query
-- Saves space and stays small even as the table grows
CREATE INDEX idx_orders_pending
  ON orders(customer_id, created_at)
  WHERE status = 'pending';

-- Covering index: include non-filter columns to avoid heap fetch
-- The query can be answered entirely from the index (index-only scan)
CREATE INDEX idx_products_sku_covering
  ON products(sku)
  INCLUDE (name, price_cents);

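-- Check index usage - indexes with few or zero scans are candidates to drop
-- (idx_scan counts since the last statistics reset; check replicas too)
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;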

Read and optimize EXPLAIN plans

-- Use EXPLAIN (ANALYZE, BUFFERS) for real execution data.
-- ANALYZE actually runs the query, so wrap writes in a transaction and roll back.
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, c.email, sum(oi.quantity * oi.unit_price_cents) AS total_cents
FROM orders o
JOIN customers c ON c.id = o.customer_id
JOIN order_items oi ON oi.order_id = o.id
WHERE o.status = 'pending'
GROUP BY o.id, c.email;
