Database Schema Design
When to use this skill
Lists specific situations where this skill should be triggered:
- New Project: Database schema design for a new application
- Schema Refactoring: Redesigning an existing schema for performance or scalability
- Relationship Definition: Implementing 1:1, 1:N, N:M relationships between tables
- Migration: Safely applying schema changes
- Performance Issues: Index and schema optimization to resolve slow queries
Input Format
The required and optional input information to collect from the user:
Required Information
- Database Type: PostgreSQL, MySQL, MongoDB, SQLite, etc.
- Domain Description: What data will be stored (e.g., e-commerce, blog, social media)
- Key Entities: Core data objects (e.g., User, Product, Order)
Optional Information
- Expected Data Volume: Small (<10K rows), Medium (10K-1M), Large (>1M) (default: Medium)
- Read/Write Ratio: Read-heavy, Write-heavy, Balanced (default: Balanced)
- Transaction Requirements: Whether ACID is required (default: true)
- Sharding/Partitioning: Whether large data distribution is needed (default: false)
Input Example
Design a database for an e-commerce platform:
- DB: PostgreSQL
- Entities: User, Product, Order, Review
- Relationships:
- A User can have multiple Orders
- An Order contains multiple Products (N:M)
- A Review is linked to a User and a Product
- Expected data: 100,000 users, 10,000 products
- Read-heavy (frequent product lookups)
Instructions
Specifies the step-by-step task sequence to follow precisely.
Step 1: Define Entities and Attributes
Identify core data objects and their attributes.
Tasks:
- Extract nouns from business requirements → entities
- List each entity's attributes (columns)
- Determine data types (VARCHAR, INTEGER, TIMESTAMP, JSON, etc.)
- Designate Primary Keys (UUID vs Auto-increment ID)
Example (E-commerce):
Users
- id: UUID PRIMARY KEY
- email: VARCHAR(255) UNIQUE NOT NULL
- username: VARCHAR(50) UNIQUE NOT NULL
- password_hash: VARCHAR(255) NOT NULL
- created_at: TIMESTAMP DEFAULT NOW()
- updated_at: TIMESTAMP DEFAULT NOW()
Products
- id: UUID PRIMARY KEY
- name: VARCHAR(255) NOT NULL
- description: TEXT
- price: DECIMAL(10, 2) NOT NULL
- stock: INTEGER DEFAULT 0
- category_id: UUID REFERENCES Categories(id)
- created_at: TIMESTAMP DEFAULT NOW()
Orders
- id: UUID PRIMARY KEY
- user_id: UUID REFERENCES Users(id)
- total_amount: DECIMAL(10, 2) NOT NULL
- status: VARCHAR(20) DEFAULT 'pending'
- created_at: TIMESTAMP DEFAULT NOW()
OrderItems (Junction table)
- id: UUID PRIMARY KEY
- order_id: UUID REFERENCES Orders(id) ON DELETE CASCADE
- product_id: UUID REFERENCES Products(id)
- quantity: INTEGER NOT NULL
- price: DECIMAL(10, 2) NOT NULL
Step 2: Design Relationships and Normalization
Define relationships between tables and apply normalization.
Tasks:
- 1:1 relationship: Foreign Key + UNIQUE constraint
- 1:N relationship: Foreign Key
- N:M relationship: Create junction table
- Determine normalization level (1NF ~ 3NF)
Decision Criteria:
- OLTP systems → normalize to 3NF (data integrity)
- OLAP/analytics systems → denormalization allowed (query performance)
- Read-heavy → minimize JOINs with partial denormalization
- Write-heavy → full normalization to eliminate redundancy
Example (ERD Mermaid):
erDiagram
Users ||--o{ Orders : places
Orders ||--|{ OrderItems : contains
Products ||--o{ OrderItems : "ordered in"
Categories ||--o{ Products : categorizes
Users ||--o{ Reviews : writes
Products ||--o{ Reviews : "reviewed by"
Users {
uuid id PK
string email UK
string username UK
string password_hash
timestamp created_at
}
Products {
uuid id PK
string name
decimal price
int stock
uuid category_id FK
}
Orders {
uuid id PK
uuid user_id FK
decimal total_amount
string status
timestamp created_at
}
OrderItems {
uuid id PK
uuid order_id FK
uuid product_id FK
int quantity
decimal price
}
Step 3: Establish Indexing Strategy
Design indexes for query performance.
Tasks:
- Primary Keys automatically create indexes
- Columns frequently used in WHERE clauses → add indexes
- Foreign Keys used in JOINs → indexes
- Consider composite indexes (WHERE col1 = ? AND col2 = ?)
- UNIQUE indexes (email, username, etc.)
Checklist:
- Indexes on frequently queried columns
- Indexes on Foreign Key columns
- Composite index order optimized (high selectivity columns first)
- Avoid excessive indexes (degrades INSERT/UPDATE performance)
Example (PostgreSQL):
-- Primary Keys (auto-indexed)
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email VARCHAR(255) UNIQUE NOT NULL, -- UNIQUE = auto-indexed
username VARCHAR(50) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Foreign Keys + explicit indexes
CREATE TABLE orders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
total_amount DECIMAL(10, 2) NOT NULL,
status VARCHAR(20) DEFAULT 'pending',
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- Composite index (status and created_at frequently queried together)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Products table
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
stock INTEGER DEFAULT 0 CHECK (stock >= 0),
category_id UUID REFERENCES categories(id),
created_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_products_category ON products(category_id);
CREATE INDEX idx_products_price ON products(price); -- price range search
CREATE INDEX idx_products_name ON products(name); -- product name search
-- Full-text search (PostgreSQL)
CREATE INDEX idx_products_name_fts ON products USING GIN(to_tsvector('english', name));
CREATE INDEX idx_products_description_fts ON products USING GIN(to_tsvector('english', description));
Step 4: Set Up Constraints and Triggers
Add constraints to ensure data integrity.
Tasks:
- NOT NULL: required columns
- UNIQUE: columns that must be unique
- CHECK: value range constraints (e.g., price >= 0)
- Foreign Key + CASCADE option
- Set default values
Example:
CREATE TABLE products (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2) NOT NULL CHECK (price >= 0),
stock INTEGER DEFAULT 0 CHECK (stock >= 0),
discount_percent INTEGER CHECK (discount_percent >= 0 AND discount_percent <= 100),
category_id UUID REFERENCES categories(id) ON DELETE SET NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Trigger: auto-update updated_at
CREATE OR REPLACE FUNCTION update_updated_at_column()
RETURNS TRIGGER AS $$
BEGIN
NEW.updated_at = NOW();
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER update_products_updated_at
BEFORE UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION update_updated_at_column();
Step 5: Write Migration Scripts
Write migrations that safely apply schema changes.
Tasks:
- UP migration: apply changes
- DOWN migration: rollback
- Wrap in transactions
- Prevent data loss (use ALTER TABLE carefully)
Example (SQL migration):
-- migrations/001_create_initial_schema.up.sql
BEGIN;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE TABLE