Designing Distributed Systems
Design scalable, reliable, and fault-tolerant distributed systems using proven patterns and consistency models.
Purpose
Distributed systems are the foundation of modern cloud-native applications. Understanding fundamental trade-offs (CAP theorem, PACELC), consistency models, replication patterns, and resilience strategies is essential for building systems that scale globally while maintaining correctness and availability.
When to Use This Skill
Apply when:
- Designing microservices architectures with multiple services
- Building systems that must scale across multiple datacenters or regions
- Choosing between consistency vs availability during network partitions
- Selecting replication strategies (single-leader, multi-leader, leaderless)
- Implementing distributed transactions (saga pattern, event sourcing, CQRS)
- Designing partition-tolerant systems with proper consistency guarantees
- Building resilient services with circuit breakers, bulkheads, retries
- Implementing service discovery and inter-service communication
Core Concepts
CAP Theorem Fundamentals
CAP Theorem: In a distributed system experiencing a network partition, choose between Consistency (C) or Availability (A). Partition tolerance (P) is mandatory.
Network partitions WILL occur → Always design for P
During partition:
├─ CP (Consistency + Partition Tolerance)
│ Use when: Financial transactions, inventory, seat booking
│ Trade-off: System unavailable during partition
│ Examples: HBase, MongoDB (default), etcd
│
└─ AP (Availability + Partition Tolerance)
Use when: Social media, caching, analytics, shopping carts
Trade-off: Stale reads possible, conflicts need resolution
Examples: Cassandra, DynamoDB, Riak
PACELC: Extends CAP to consider normal operations (no partition).
- If Partition: Choose Availability (A) or Consistency (C)
- Else (normal): Choose Latency (L) or Consistency (C)
Consistency Models Spectrum
Strong Consistency ◄─────────────────────► Eventual Consistency
│ │ │
Linearizable Causal Consistency Convergent
(Slowest, (Middle Ground, (Fastest,
Most Consistent) Causally Ordered) Eventually Consistent)
Strong Consistency (Linearizability):
- All operations appear atomically in sequential order
- Reads always return most recent write
- Use for: Bank balances, inventory stock, seat booking
- Trade-off: Higher latency, reduced availability
Eventual Consistency:
- If no new updates, all replicas eventually converge
- Use for: Social feeds, product catalogs, user profiles, DNS
- Trade-off: Stale reads possible, conflict resolution needed
Causal Consistency:
- Causally related operations seen in same order by all nodes
- Use for: Chat apps, collaborative editing, comment threads
- Trade-off: More complex than eventual, requires causality tracking
Bounded Staleness:
- Staleness bounded by time or version count
- Use for: Real-time dashboards, leaderboards, monitoring
- Trade-off: Must monitor lag, more complex than eventual
Replication Patterns
1. Leader-Follower (Single-Leader):
- All writes to leader, replicated to followers
- Followers handle reads (load distribution)
- Synchronous: Wait for follower ACK (strong consistency, higher latency)
- Asynchronous: Don't wait (eventual consistency, possible data loss)
- Use for: Most common pattern, strong consistency with sync replication
2. Multi-Leader:
- Multiple leaders accept writes in different datacenters
- Leaders replicate to each other
- Conflict resolution required: Last-Write-Wins, application merge, vector clocks
- Use for: Multi-datacenter, low write latency, geo-distributed users
- Trade-off: Conflict resolution complexity
3. Leaderless (Dynamo-style):
- No single leader, quorum-based reads/writes
- Quorum rule: W + R > N (W=write quorum, R=read quorum, N=replicas)
- Example: N=5, W=3, R=2 → Strong consistency (overlap guaranteed)
- Use for: Maximum availability, partition tolerance
- Trade-off: Complexity, read repair needed
Partitioning Strategies
Hash Partitioning (Consistent Hashing):
- Key → Hash(Key) → Partition assignment
- Even distribution, minimal rebalancing when nodes added/removed
- Use for: Point queries by ID, even distribution critical
- Examples: Cassandra, DynamoDB, Redis Cluster
Range Partitioning:
- Key ranges assigned to partitions (A-F, G-M, N-S, T-Z)
- Enables range queries, ordered data
- Risk: Hot spots if data skewed
- Use for: Time-series data, leaderboards, range scans
- Examples: HBase, Bigtable
Geographic Partitioning:
- Partition by location (US-East, EU-West, APAC)
- Use for: Data locality, GDPR compliance, low latency
- Examples: Spanner, Cosmos DB
Resilience Patterns
Circuit Breaker:
[Closed] → Normal operation
│ (failures exceed threshold)
▼
[Open] → Fail fast (don't call failing service)
│ (timeout expires)
▼
[Half-Open] → Try single request
│ success → [Closed]
│ failure → [Open]
- Prevents cascading failures
- Fast-fail instead of waiting for timeout
- See references/resilience-patterns.md
Bulkhead Isolation:
- Isolate resources (thread pools, connection pools)
- Failure in one partition doesn't affect others
- Like ship compartments preventing total flooding
Timeout and Retry:
- Timeout: Set deadlines, fail fast if exceeded
- Retry: Exponential backoff with jitter
- Idempotency: Ensure safe retry (critical)
Rate Limiting and Backpressure:
- Protect services from overload
- Token bucket, leaky bucket algorithms
- Backpressure: Signal upstream to slow down
Transaction Patterns
Saga Pattern:
- Coordinate distributed transactions across services
- No distributed 2PC (two-phase commit)
Choreography: Services react to events
Order Service → OrderCreated event
Payment Service → listens → PaymentProcessed event
Inventory Service → listens → InventoryReserved event
(Compensating: if payment fails → InventoryReleased event)
Orchestration: Central coordinator
Saga Orchestrator:
1. Call Order Service
2. Call Payment Service
3. Call Inventory Service
(If step fails → call compensating transactions in reverse)
Event Sourcing:
- Store state changes as immutable events
- Rebuild state by replaying events
- Audit trail, time travel, debugging
- Trade-off: Query complexity, snapshot optimization
CQRS (Command Query Responsibility Segregation):
- Separate read and write models
- Write model: Normalized, transactional
- Read model: Denormalized, cached, optimized
- Use for: Different read/write patterns, high read:write ratio (10:1+)
- Often paired with Event Sourcing
Service Discovery
Client-Side Discovery:
- Client queries service registry (Consul, etcd, Eureka)
- Client load balances and calls service directly
- Pro: No proxy overhead
- Con: Client complexity
Server-Side Discovery:
- Client calls load balancer
- Load balancer queries registry and routes
- Pro: Simple clients
- Con: Load balancer single point of failure
Service Mesh:
- Sidecar proxies handle discovery, routing, retry, circuit breaking
- Examples: Istio, Linkerd
- Pro: Decouples communication logic from services
- Con: Operational complexity
Caching Strategies
Cache-Aside (Lazy Loading):
Read:
1. Check cache → hit? return
2. Miss? Query database
3. Store in cache, return
Write-Through:
Write:
1. Write to cache
2. Cache writes to database synchronously
3. Return success
Write-Behind (Write-Back):
Write:
1. Write to cache
2. Return success
3. Cache writes to database asynchronously (batched)
Cache Invalidation:
- TTL (Time-To-Live): Expire after duration
- Event-based: Invalidate on data change
- Manual: Explicit invalidation on update
Decision Frameworks
Choosing Consistency Model
Decisio