Network Architecture
Design secure, scalable cloud network architectures using proven patterns across AWS, GCP, and Azure. This skill provides decision frameworks for VPC design, subnet strategy, zero trust implementation, and hybrid connectivity.
When to Use This Skill
Invoke this skill when:
- Designing VPC/VNet topology for new cloud environments
- Implementing network segmentation and security controls
- Planning multi-VPC or multi-cloud connectivity
- Establishing hybrid cloud connectivity (on-premises to cloud)
- Migrating from a flat network to a segmented, multi-tier architecture
- Implementing zero trust network principles
- Optimizing network costs and performance
Core Network Architecture Patterns
Pattern 1: Flat (Single VPC) Architecture
Use When: Small applications, single environment, simple security requirements, team < 10 engineers
Characteristics:
- All resources in one VPC with subnet-level segmentation
- Public, private, and database subnet tiers
- Simplest to understand and manage
- No inter-VPC routing complexity
Tradeoffs:
- ✓ Lowest cost, fastest to set up
- ✗ Poor isolation, difficult to scale, the entire VPC is the blast radius
Pattern 2: Multi-VPC (Isolated) Architecture
Use When: Multiple environments (dev/staging/prod), strong isolation requirements, compliance mandates separation
Characteristics:
- Separate VPCs per environment or workload
- No direct connectivity without explicit setup
- Independent CIDR ranges
Tradeoffs:
- ✓ Strong blast radius containment, clear security boundaries
- ✗ Management overhead, duplicate infrastructure, higher costs
Pattern 3: Hub-and-Spoke (Transit Gateway) Architecture
Use When: 5+ VPCs need communication, centralized security inspection required, hybrid connectivity, multi-account setup
Characteristics:
- Central hub VPC/Transit Gateway
- Spoke VPCs connect to hub
- All inter-VPC traffic routes through hub
Tradeoffs:
- ✓ Simplified routing, centralized security, scales easily (100+ VPCs)
- ✗ Transit Gateway costs (~$0.05/hour per attachment + $0.02/GB processed), increased latency (extra hop through the hub)
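To make the Transit Gateway cost tradeoff concrete, here is a minimal sketch of a monthly estimate built from the approximate rates quoted above. It assumes the hourly rate is charged per VPC attachment and that a month has about 730 hours. Actual pricing varies by region and should be checked against the current price list. The function name `tgw_monthly_cost` is hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def tgw_monthly_cost(
    attachments: int,
    gb_processed: float,
    hourly_rate: float = 0.05,   # approximate rate from the text, assumed per attachment
    per_gb_rate: float = 0.02,   # approximate data processing rate from the text
) -> float:
    """Rough monthly Transit Gateway cost in USD (illustrative only)."""
    if attachments < 0 or gb_processed < 0:
        raise ValueError("attachments and gb_processed must be non-negative")
    attachment_cost = attachments * hourly_rate * HOURS_PER_MONTH
    data_cost = gb_processed * per_gb_rate
    return attachment_cost + data_cost


if __name__ == "__main__":
    # Five spoke VPCs pushing 1 TB (1,000 GB) through the hub per month.
    print(f"${tgw_monthly_cost(5, 1000):.2f}")
```

With these assumed rates, the attachment fee dominates at low traffic volumes, which is why very small environments often avoid the hub.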
Pattern 4: Full Mesh (VPC Peering) Architecture
Use When: Small number of VPCs (< 5), low latency critical, no centralized inspection needed
Characteristics:
- Every VPC directly connected via peering
- Direct VPC-to-VPC communication
Tradeoffs:
- ✓ Lowest latency, no Transit Gateway costs
- ✗ Management complexity scales as O(n²), doesn't scale beyond ~10 VPCs
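The O(n²) growth comes from the number of VPC pairs: a full mesh of n VPCs needs n(n-1)/2 peering connections, while a hub needs one attachment per VPC. A short sketch (function names are hypothetical) makes the gap visible:

```python
def peering_connections(vpc_count: int) -> int:
    """Peering connections needed for a full mesh: one per unordered VPC pair."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count * (vpc_count - 1) // 2


def hub_attachments(vpc_count: int) -> int:
    """Attachments needed for hub-and-spoke: one per spoke VPC."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(n, peering_connections(n), hub_attachments(n))
```

At 5 VPCs the mesh needs 10 connections. At 10 it needs 45, and each connection also requires route table entries on both sides.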
Pattern 5: Hybrid (Multi-Pattern) Architecture
Use When: Large enterprise with diverse requirements, balancing cost/performance/security
Characteristics:
- Hub-spoke for most VPCs + direct peering for latency-sensitive pairs
- Combination based on workload requirements
Tradeoffs:
- ✓ Optimized for specific needs
- ✗ More complex to design and manage
Pattern Selection Framework
Number of VPCs?
│
├─► 1 VPC → Flat (Single VPC)
├─► 2-4 VPCs + No inter-VPC communication → Multi-VPC (Isolated)
├─► 2-4 VPCs + Low latency critical → Full Mesh (VPC Peering)
├─► 5+ VPCs + Centralized inspection → Hub-and-Spoke (Transit Gateway)
└─► 10+ VPCs + Mixed requirements → Hybrid (Multi-Pattern)
Additional Considerations:
├─► Hybrid connectivity required? → Hub-and-Spoke preferred
├─► Centralized egress/inspection? → Hub-and-Spoke with Inspection VPC
├─► Multi-account environment? → Hub-and-Spoke with AWS RAM sharing
└─► Cost optimization priority? → Flat or Multi-VPC (avoid TGW fees)
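The selection framework can be sketched as a small function. This is one possible encoding, not a definitive rule set: the parameter names, the order in which branches are checked, and the handling of boundary cases (such as exactly 5 VPCs) are assumptions made for illustration.

```python
def select_pattern(
    vpc_count: int,
    needs_inter_vpc: bool = True,
    centralized_inspection: bool = False,
    hybrid_connectivity: bool = False,
    mixed_requirements: bool = False,
) -> str:
    """Suggest a network pattern from the decision tree (illustrative sketch)."""
    if vpc_count < 1:
        raise ValueError("vpc_count must be at least 1")
    if vpc_count == 1:
        return "Flat (Single VPC)"
    if not needs_inter_vpc:
        return "Multi-VPC (Isolated)"
    if vpc_count >= 10 and mixed_requirements:
        return "Hybrid (Multi-Pattern)"
    # Assumption: inspection or hybrid connectivity pushes even small
    # environments toward a hub, per the additional considerations.
    if vpc_count >= 5 or centralized_inspection or hybrid_connectivity:
        return "Hub-and-Spoke (Transit Gateway)"
    return "Full Mesh (VPC Peering)"


if __name__ == "__main__":
    print(select_pattern(3))
    print(select_pattern(8, centralized_inspection=True))
    print(select_pattern(12, mixed_requirements=True))
```

Treat the result as a starting point for discussion. Cost priorities and team size from the pattern descriptions are not modeled here.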
Subnet Strategy
Standard Three-Tier Design
Public Subnets:
- Route to Internet Gateway
- Use for load balancers, bastion hosts, NAT Gateways
- CIDR: /24 to /27 (256 to 32 IPs)
Private Subnets:
- Route to NAT Gateway for outbound
- Use for application servers, containers, compute workloads
- CIDR: /20 to /22 (4,096 to 1,024 IPs)
Database Subnets:
- No direct internet route
- Use for RDS, ElastiCache, managed databases
- CIDR: /24 to /26 (256 to 64 IPs)
Multi-AZ Distribution
Production: Distribute each tier across 3 Availability Zones minimum
Dev/Test: 1-2 AZs acceptable for cost savings
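As an illustration of the three-tier, multi-AZ layout, the sketch below carves a VPC CIDR into one public, one private, and one database subnet per AZ using Python's standard `ipaddress` module. The specific sizes (/20 private, /24 public and database), the AZ names, and the function names are assumptions chosen from within the ranges above. The "minus 5" in `usable_ips` reflects the addresses AWS reserves in every subnet.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_names: list[str]) -> dict[str, dict[str, ipaddress.IPv4Network]]:
    """Allocate public/private/database subnets per AZ (illustrative layout)."""
    vpc = ipaddress.ip_network(vpc_cidr)
    n = len(az_names)
    blocks = list(vpc.subnets(new_prefix=20))  # candidate /20 blocks
    if len(blocks) < n + 1:
        raise ValueError("VPC too small for this layout")
    private = blocks[:n]                                  # one /20 per AZ
    small = list(blocks[n].subnets(new_prefix=24))        # next /20 split into /24s
    if len(small) < 2 * n:
        raise ValueError("not enough /24 blocks for public and database tiers")
    public, database = small[:n], small[n:2 * n]
    return {
        az: {"public": public[i], "private": private[i], "database": database[i]}
        for i, az in enumerate(az_names)
    }


def usable_ips(subnet: ipaddress.IPv4Network) -> int:
    """Usable addresses in an AWS subnet (AWS reserves 5 per subnet)."""
    return subnet.num_addresses - 5


if __name__ == "__main__":
    plan = plan_subnets("10.0.0.0/16", ["az-a", "az-b", "az-c"])
    for az, tiers in plan.items():
        for tier, net in tiers.items():
            print(az, tier, net, usable_ips(net))
```

Allocating the large private blocks first and the small tiers from a separate block keeps the remaining /20 blocks contiguous for future expansion.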
CIDR Block Planning
VPC Sizing:
- /16 (65,536 IPs) - Large production environments
- /20 (4,096 IPs) - Medium environments
- /24 (256 IPs) - Small/dev environments
Critical Rules:
- Non-overlapping CIDR ranges across VPCs
- Coordinate with on-premises network team for hybrid connectivity
- Reserve address space for future expansion
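The non-overlap rule is easy to check mechanically before any VPC is created. The following sketch uses the standard `ipaddress` module. The network names and CIDR values are made-up examples, and `find_overlaps` is a hypothetical helper.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs: dict[str, str]) -> list[tuple[str, str]]:
    """Return every pair of named networks whose CIDR ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


if __name__ == "__main__":
    # Hypothetical address plan, including an on-premises range.
    address_plan = {
        "prod": "10.0.0.0/16",
        "dev": "10.1.0.0/20",
        "on-prem": "10.0.128.0/17",
    }
    print(find_overlaps(address_plan))
```

Including on-premises ranges in the same check catches conflicts that would otherwise surface only when hybrid connectivity is set up.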
For detailed subnet planning, see references/subnet-strategy.md
NAT Gateway Strategy
Decision Framework
Cost vs Resilience?
│
├─► Cost Priority (Dev/Test)
│   └─► Single NAT Gateway (~$32/month)
│       └─► Risk: Single point of failure
│
├─► Balanced (Most Production)
│   └─► One NAT Gateway per AZ (~$96/month for 3 AZs)
│       └─► Resilience: AZ failure doesn't break connectivity
│
└─► Maximum Resilience
    └─► Multiple NAT Gateways per AZ + monitoring
        └─► Critical workloads, SLA-dependent
Alternative: Centralized Egress Pattern
└─► Hub-and-Spoke: Single egress VPC with NAT
    └─► Reduces NAT Gateway count, centralized logging
No Outbound Internet Needed?
- Skip NAT Gateway entirely (cost savings)
- Use VPC Endpoints for AWS service access
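The monthly figures in the decision framework can be reproduced with simple arithmetic. The sketch below assumes an hourly rate of about $0.045 per gateway, a data processing charge of about $0.045/GB, and roughly 730 hours per month. These rates are assumptions that are consistent with the ~$32/month figure but are not stated in the text, and they vary by region.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def nat_monthly_cost(
    gateways: int,
    gb_processed: float = 0.0,
    hourly_rate: float = 0.045,   # assumed per-gateway hourly rate (USD)
    per_gb_rate: float = 0.045,   # assumed data processing rate (USD/GB)
) -> float:
    """Rough monthly NAT Gateway cost in USD (illustrative only)."""
    if gateways < 0 or gb_processed < 0:
        raise ValueError("gateways and gb_processed must be non-negative")
    return gateways * hourly_rate * HOURS_PER_MONTH + gb_processed * per_gb_rate


if __name__ == "__main__":
    print(f"single gateway, no traffic: ${nat_monthly_cost(1):.2f}")
    print(f"one per AZ (3 AZs), no traffic: ${nat_monthly_cost(3):.2f}")
    print(f"one per AZ (3 AZs), 2 TB processed: ${nat_monthly_cost(3, 2000):.2f}")
```

Once traffic is included, data processing can exceed the fixed hourly cost, which is one reason to route AWS service traffic through VPC Endpoints instead.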
Security Controls
Security Groups (Recommended)
Characteristics:
- Stateful (return traffic auto-allowed)
- Instance-level control
- Allow rules only (implicit deny)
- Can reference other security groups
Use For:
- Service-to-service communication
- Instance-level security
- Most common use case
Best Practices:
- Use descriptive names (app-alb-sg, app-backend-sg)
- Reference other security groups instead of CIDR blocks
- Keep rules minimal and specific
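The practice of referencing security groups instead of CIDR blocks can be illustrated with a small in-memory model. This is not an AWS API call: the `IngressRule` type, the `allowed` helper, the port numbers, and the `app-db-sg` group name are hypothetical and exist only to show how a chain of group references limits who can reach each tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    port: int
    source: str  # either a security group name or a CIDR block


# Hypothetical three-tier rule set: each tier admits only the tier in front of it.
GROUPS = {
    "app-alb-sg": [IngressRule(443, "0.0.0.0/0")],            # public entry point
    "app-backend-sg": [IngressRule(8080, "app-alb-sg")],       # only the load balancer
    "app-db-sg": [IngressRule(5432, "app-backend-sg")],        # only the backend
}


def allowed(groups: dict[str, list[IngressRule]], src_group: str, dst_group: str, port: int) -> bool:
    """True if dst_group has an ingress rule admitting src_group on this port."""
    return any(
        rule.port == port and rule.source == src_group
        for rule in groups.get(dst_group, [])
    )


if __name__ == "__main__":
    print(allowed(GROUPS, "app-backend-sg", "app-db-sg", 5432))
    print(allowed(GROUPS, "app-alb-sg", "app-db-sg", 5432))
```

Because the rules name groups rather than addresses, instances can be added or replaced without touching the rules.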
Network ACLs (Optional)
Characteristics:
- Stateless (must allow both request and response)
- Subnet-level control
- Allow and deny rules
- Processes rules in order (lowest number first)
Use For:
- Explicit deny rules (block specific IPs)
- Compliance requirements (defense in depth)
- Additional layer beyond security groups
Best Practices:
- Use sparingly (complex to manage)
- Remember to allow ephemeral ports (1024-65535)
- Test thoroughly (stateless nature causes issues)
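The two NACL behaviors that most often cause surprises, ordered rule evaluation and statelessness, can be shown with a simplified model. The sketch below matches on port only and ignores protocol and source CIDR, which a real NACL also evaluates. The `NaclRule` type and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NaclRule:
    number: int
    action: str      # "allow" or "deny"
    first_port: int
    last_port: int


def evaluate(rules: list[NaclRule], port: int) -> str:
    """Apply rules in ascending rule number; the first match wins."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.first_port <= port <= rule.last_port:
            return rule.action
    return "deny"  # implicit catch-all when nothing matches


def round_trip_allowed(inbound: list[NaclRule], outbound: list[NaclRule],
                       server_port: int, client_port: int) -> bool:
    """Stateless check: the request AND the response must each be allowed."""
    return (evaluate(inbound, server_port) == "allow"
            and evaluate(outbound, client_port) == "allow")


INBOUND = [NaclRule(100, "allow", 443, 443)]
OUTBOUND_EPHEMERAL = [NaclRule(100, "allow", 1024, 65535)]

if __name__ == "__main__":
    # Without an outbound ephemeral-port rule the response is dropped.
    print(round_trip_allowed(INBOUND, [], 443, 50000))
    print(round_trip_allowed(INBOUND, OUTBOUND_EPHEMERAL, 443, 50000))
```

The first call fails even though inbound 443 is allowed, which is the typical symptom of a missing ephemeral-port rule.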
For security group architecture patterns, see references/security-controls.md
Zero Trust Principles
Core Tenets
Never Trust, Always Verify:
- Authenticate every request regardless of source
- No implicit trust based on network location
Least Privilege Access:
- Grant minimum necessary permissions
- Time-bound access (just-in-time)
Assume Breach:
- Segment network aggressively
- Monitor all traffic
- Rapid detection and response
Implementation Patterns
Microsegmentation:
- Isolate every workload with granular security group rules
- Service-to-service communication only between specific services
- Reduce blast radius
Identity-Based Access:
- Use IAM roles instead of IP addresses for authorization
- VPC Endpoints with IAM policies
- Service-to-service identity verification
Continuous Verification:
- VPC Flow Logs for traffic analysis
- Monitor rejected connections
- Alert on anomalies
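As a sketch of what monitoring rejected connections might look like, the code below parses VPC Flow Log records in the default (version 2) space-separated format and counts REJECT entries per source address. The sample records, addresses, and the alert threshold are made up for illustration, and the helper names are hypothetical. A production pipeline would more likely query the logs in CloudWatch Logs or S3.

```python
from collections import Counter

# Field order of the default (version 2) flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> dict[str, str]:
    """Split one default-format flow log line into named fields."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def rejected_by_source(lines: list[str]) -> Counter:
    """Count REJECT records per source address."""
    records = (parse_record(line) for line in lines)
    return Counter(r["srcaddr"] for r in records if r["action"] == "REJECT")


def noisy_sources(lines: list[str], threshold: int) -> list[str]:
    """Sources with at least `threshold` rejected flows (threshold is arbitrary)."""
    return sorted(src for src, n in rejected_by_source(lines).items() if n >= threshold)


# Illustrative sample records, not real traffic.
SAMPLE = [
    "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-1235b8ca123456789 203.0.113.12 172.31.16.21 49761 3389 6 1 40 1418530010 1418530070 REJECT OK",
    "2 123456789010 eni-1235b8ca123456789 203.0.113.12 172.31.16.21 49762 22 6 1 40 1418530010 1418530070 REJECT OK",
]

if __name__ == "__main__":
    print(noisy_sources(SAMPLE, threshold=2))
```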
For zero trust architecture patterns, see references/zero-trust-networking.md
Hybrid Connectivity
VPN (Virtual Private Network)
Use When: Dev/test environments, backup connectivity, temporary connections
Characteristics:
- Encrypted tunnel over public internet
- Throughput: ~1.25 Gbps per tunnel
- Latency: Variable (internet-dependent)
- Cost: Low (~$0.05/hour + data transfer)
- Setup: Quick (no contracts)
Direct Connect / ExpressRoute / Cloud Interconnect
Use When: Production workloads, large data transfers, real-time applications
Characteristics:
- Dedicated network connection (bypasses public internet)
- Throughput: Up to 100 Gbps
- Latency: Low and consistent
- Cost: Higher (port fees + data transfer)
- Setup: Slower (requires physical provisioning through a provider, often weeks)
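The throughput figures translate directly into transfer time for bulk data. The sketch below estimates how long a transfer takes at a given link speed, assuming decimal terabytes (10^12 bytes) and an optional utilization factor. Real-world throughput is usually below the nominal rate, so treat the results as best-case numbers. `transfer_hours` is a hypothetical helper.

```python
def transfer_hours(data_tb: float, link_gbps: float, utilization: float = 1.0) -> float:
    """Best-case hours to move data_tb terabytes over a link_gbps link."""
    if data_tb < 0 or link_gbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8                      # decimal TB to bits
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


if __name__ == "__main__":
    # 10 TB over a single VPN tunnel versus a 100 Gbps dedicated connection.
    print(f"VPN tunnel (1.25 Gbps): {transfer_hours(10, 1.25):.1f} h")
    print(f"Dedicated (100 Gbps):   {transfer_hours(10, 100):.2f} h")
```

Under these assumptions, 10 TB takes roughly 17.8 hours over one VPN tunnel and about 13 minutes over a 100 Gbps dedicated link.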