Databases: Production Configuration & Operations
Configure, tune, design schemas, migrate, back up, and review database engines - from single-node dev setups to PCI-compliant production clusters. The goal is correct, performant, durable databases that survive failures, pass audits, and don't wake you up at 3am.
Target versions (May 2026):
- PostgreSQL 18.4 (EOL 2030-11; May 2026 security release), previous: 17.10, 16.14
- MongoDB 8.0.20 (GA, EOL 2029-10); rapid lane (8.3 latest) is Atlas-only with a short window - verify live before pinning
- MariaDB 11.8.7 (LTS, EOL 2028-06); 12.x rolling GA is quarterly and EOLs at each successor (12.2 reached EOL 2026-05), with 12.3 the next yearly LTS - verify live
- MySQL 8.4.8 (LTS); innovation lane (9.x) has a short support window - verify live
- SQL Server 2025 RTM + CU3 (GA 2025-11-18)
- PgBouncer 1.25.1, Pgpool-II 4.7.1, ProxySQL 3.0.6
This skill covers six domains depending on context:
- Configuration - engine settings, authentication, TLS, tuning parameters
- Schema design - indexing strategy, partitioning, normalization, type selection
- Migration - cross-engine migration, zero-downtime DDL, ORM migration tooling
- Operations - backup/restore, replication, connection pooling, monitoring
- Performance - query plan analysis, index optimization, vacuum/maintenance
- Compliance - PCI-DSS 4.0 encryption, audit logging, key management, data masking
When to use
- Configuring database engines (postgresql.conf, mongod.conf, my.cnf, MSSQL settings)
- Designing or reviewing database schemas (indexes, partitioning, types, constraints)
- Planning or executing cross-engine migrations (MySQL -> PostgreSQL, etc.)
- Setting up backup/restore strategies and PITR
- Configuring replication (streaming, logical, replica sets, GTID)
- Tuning connection pooling (PgBouncer, ProxySQL, application-side pools)
- Analyzing query performance (EXPLAIN, slow query logs, index usage)
- Database-level PCI-DSS 4.0 compliance (encryption, audit logging, access control)
- Evaluating managed vs self-hosted database decisions
- Setting up database monitoring and alerting
When NOT to use
- Deploying databases on Kubernetes (StatefulSets, PVCs, operators) - use kubernetes
- Provisioning managed databases (RDS, Cloud SQL, Atlas) via IaC - use terraform
- Docker Compose for database containers - use docker
- Database-related Ansible playbooks and roles - use ansible
- Application-level database bugs (N+1, transaction misuse, ORM pitfalls) - use code-review
- SQL injection detection, connection string secrets in code - use security-audit
- CI/CD pipelines that run migrations - use ci-cd
- Redis/Valkey (cache/KV stores) and other non-relational engines (Cassandra, DynamoDB, ClickHouse, etc.) - outside this skill's four primary engines; use the relevant platform skill or general guidance
AI Self-Check
AI tools consistently produce the same database mistakes. Before returning any generated SQL, schema, migration, or config, verify against this list:
Migrations
-
IF NOT EXISTS/IF EXISTSguards onADD COLUMN/DROP COLUMN(bare DDL crashes on re-run) - Adding
NOT NULLcolumn includes aDEFAULTvalue (or two-step: add nullable, backfill, alter NOT NULL) - Index creation uses
CONCURRENTLYon PostgreSQL (prevents full table lock) - Large table changes run in batches, not a single transaction (lock escalation, OOM risk)
- Migration is backward-compatible (old app version can still run against new schema)
- No
DROP TABLEorDROP DATABASEwithout explicit user confirmation - Migration is idempotent - can be run twice without error
- Rollback/down migration exists and is tested
- PG enum changes use
ALTER TYPE ... ADD VALUEoutside a transaction (PG 12+ allows it inside a transaction, but the new value cannot be used until that transaction commits - run it standalone to use the value immediately)
Schema
- New timestamp columns use
timestamptz(PG),DATETIME2(MSSQL),DATETIME(MySQL -TIMESTAMPhas the 2038 problem) - Character set is
utf8mb4for MySQL (notutf8which is 3-byte only),UTF8for PG - Identity columns use
GENERATED ALWAYS AS IDENTITYoverSERIALin PG 10+ (non-bypassable) - Foreign keys have explicit
ON DELETEbehavior (don't rely on engine defaults) - Indexes exist on all foreign key columns (PG does NOT auto-create these, unlike MySQL InnoDB)
- Composite index column order follows: equality columns first, range column last, selectivity-ordered
Configuration
- No
trustormd5authentication inpg_hba.conf(usescram-sha-256) - MySQL
sql_modeincludesSTRICT_TRANS_TABLES(prevents silent data truncation) - TLS enforced for all connections (not just "available")
-
max_connectionsis sized for the actual workload, not left at default - Password authentication uses modern hashing (SCRAM-SHA-256 for PG,
caching_sha2_passwordfor MySQL) - No default/example passwords in config files
Bulk Operations
- Bulk inserts are chunked - never pass an unbounded array into a single
INSERT ... VALUESstatement - Chunk size computed as
floor(max_host_parameters / columns_per_row), where host-parameter limits are: PostgreSQL 65,535, MySQL 65,535, SQLite 32,766 (defaultSQLITE_MAX_VARIABLE_NUMBER), MSSQL 2,100 - When the app targets multiple backends, chunk size uses the lowest limit across all supported engines
- Chunked writes stay inside one transaction when replace-all semantics are required (DELETE + INSERT pattern)
- Tests assert that multiple insert calls fire when row count crosses the safe chunk threshold
General
- All SQL uses parameterized queries / prepared statements (never string concatenation)
- Connection pool settings don't exceed
max_connectionsacross all app instances - Backup strategy tested by actually restoring (backup without restore test = hope, not a strategy)
- Secrets (passwords, connection strings) injected via env vars or secret managers, not config files
- Current source checked: dated versions, CLI flags, API names, and support windows are verified against primary docs before repeating them
- Hidden state identified: local config, credentials, caches, contexts, branches, cluster targets, or previous runs are made explicit before acting
- Verification is real: final checks exercise the actual runtime, parser, service, or integration point instead of only linting prose or happy paths
- Routing overlap checked: overlapping skills, trigger terms, and "When NOT to use" boundaries are checked before returning guidance
- Spec claims verified: claims about tool behavior, output contracts, or repo conventions are checked against current docs, scripts, or skill files
- Engine/version checked: SQL syntax, index options, and replication advice match the named engine and major version
- Data-loss path gated: migrations, deletes, reindexes, and failovers include backup, dry-run, rollback, or maintenance-window guidance
Performance
- Use
EXPLAIN/EXPLAIN ANALYZEwith realistic parameters before adding indexes. - Tune connection pools to database capacity; more app connections can reduce throughput.
- Batch writes and migrations in bounded chunks to avoid lock escalation, replication lag, and runaway transactions.
Best Practices
- Take restorable backups before schema changes and verify restore procedures periodically.
- Separate online, background, and analytical workloads where query shape or latency differs.
- Prefer additive migrations with backfills and compatibility windows for zero-downtime services.
Workflow
Step 1: Determine the domain
Based on the request:
- "Configure PostgreSQL / tune settings" -> Configuration
- **"