Infrastructure as Code
Provision and manage cloud infrastructure using code-based automation tools. This skill covers tool selection, state management, module design, and operational patterns across Terraform/OpenTofu, Pulumi, and AWS CDK.
When to Use
Use this skill when:
- Provisioning cloud infrastructure (compute, networking, databases, storage)
- Migrating from manual infrastructure to code-based workflows
- Designing reusable infrastructure modules
- Implementing multi-cloud or hybrid-cloud deployments
- Establishing state management and drift detection patterns
- Integrating infrastructure provisioning into CI/CD pipelines
- Evaluating IaC tools (Terraform vs Pulumi vs CDK)
Common requests:
- "Create a Terraform module for VPC provisioning"
- "Set up remote state with locking for team collaboration"
- "Compare Pulumi vs Terraform for our use case"
- "Design composable infrastructure modules"
- "Implement drift detection for existing infrastructure"
Core Concepts
Infrastructure as Code Fundamentals
Key Principles:
- Declarative vs Imperative - Describe desired state in a configuration language (Terraform) or build the desired-state model with a general-purpose programming language (Pulumi, CDK)
- Idempotency - Same input produces same output, safe to re-run
- Version Control - Infrastructure changes tracked in Git
- State Management - Track actual infrastructure state
- Module Composition - Reusable, versioned infrastructure components
Benefits:
- Reproducibility (same code = same infrastructure)
- Auditability (Git history shows all changes)
- Collaboration (code reviews for infrastructure changes)
- Automation (CI/CD deploys infrastructure)
- Disaster recovery (rebuild from code)
Tool Selection Framework
Choose IaC tools based on team composition and cloud strategy:
Terraform/OpenTofu - Declarative, HCL-based
- Multi-cloud and hybrid-cloud deployments
- Operations/SRE teams that prefer a declarative approach
- Largest provider ecosystem (AWS, GCP, Azure, 3000+ providers)
- Mature module registry and community
Pulumi - General-purpose programming languages (imperative code that produces a declarative resource graph)
- Developer-centric teams familiar with TypeScript/Python/Go
- Complex logic that requires programming constructs (loops, conditionals, functions)
- Native unit testing using familiar test frameworks
- Strong typing and IDE support
AWS CDK - AWS-native, programming language-based
- AWS-only infrastructure
- Tight integration with AWS services
- L1/L2/L3 construct abstractions
- CloudFormation under the hood
Decision Tree:
Multi-cloud required?
├─ YES → Team composition?
│   ├─ Ops/SRE focused → Terraform/OpenTofu
│   └─ Developer focused → Pulumi
└─ NO → AWS only?
    ├─ YES → Language preference?
    │   ├─ HCL/declarative → Terraform
    │   ├─ TypeScript/Python → AWS CDK
    │   └─ YAML/simple → CloudFormation
    └─ NO → GCP/Azure only?
        └─ Terraform or Pulumi
State Management Architecture
Remote state with locking enables team collaboration:
Backend Selection:
| Cloud Provider | Recommended Backend | Locking Mechanism |
|---|---|---|
| AWS | S3 + DynamoDB | DynamoDB table |
| GCP | Google Cloud Storage | Native |
| Azure | Azure Blob Storage | Lease-based |
| Multi-cloud | HCP Terraform (formerly Terraform Cloud) / Terraform Enterprise | Built-in |
| Pulumi | Pulumi Cloud (formerly Pulumi Service) | Built-in |
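As an illustration of the AWS row in the table above, here is a minimal sketch of an S3 backend with DynamoDB locking. The bucket, key, and table names are hypothetical placeholders; the bucket and table are assumed to exist already, and the table is assumed to have a string partition key named LockID.

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"        # hypothetical bucket name
    key            = "prod/network/terraform.tfstate" # hypothetical state path
    region         = "us-east-1"
    encrypt        = true                             # server-side encryption at rest
    dynamodb_table = "example-terraform-locks"        # hypothetical lock table (partition key: LockID)
  }
}
```

Newer Terraform and OpenTofu releases also offer S3-native locking through the use_lockfile argument; check the backend documentation for the version you run before choosing between the two.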
State Isolation Strategies:
- Directory Separation (recommended for most teams)
  - Separate directories per environment (prod/, staging/, dev/)
  - Complete state file isolation
  - No risk of cross-environment contamination
- Workspaces
  - Single codebase, multiple environments
  - Shared state backend, environment namespacing
  - Risk: accidental cross-environment operations
- Layered Architecture
  - Separate state files for networking, compute, data layers
  - Blast radius reduction
  - Cross-layer references via remote state data sources
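For the layered approach, a cross-layer reference might look like the sketch below: a compute layer reads an output published by a separately managed networking layer. The bucket, key, the output name private_subnet_id, and the ami_id variable are all assumptions made for illustration.

```hcl
# Compute layer: read outputs from the networking layer's state (read-only).
data "terraform_remote_state" "network" {
  backend = "s3"
  config = {
    bucket = "example-terraform-state"        # hypothetical bucket name
    key    = "prod/network/terraform.tfstate" # hypothetical state path of the network layer
    region = "us-east-1"
  }
}

variable "ami_id" {
  description = "AMI to launch (hypothetical input for this sketch)"
  type        = string
}

resource "aws_instance" "app" {
  ami           = var.ami_id
  instance_type = "t3.micro"
  # Assumes the network layer declares an output named private_subnet_id.
  subnet_id     = data.terraform_remote_state.network.outputs.private_subnet_id
}
```

Only root-module outputs of the referenced state are visible this way, so the networking layer has to export every value other layers depend on.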
Critical State Management Rules:
- Always use remote state for team environments
- Enable state file encryption at rest
- Enable versioning on state storage
- Use state locking to prevent concurrent modifications
- Never commit state files to Git
- Mark sensitive outputs as sensitive = true
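A minimal sketch of the last rule, using a hypothetical db_password variable:

```hcl
variable "db_password" {
  description = "Database password (hypothetical input for this sketch)"
  type        = string
  sensitive   = true # redacted in plan and apply output
}

output "db_password" {
  description = "Password passed through to consumers of this configuration"
  value       = var.db_password
  sensitive   = true # required when the value derives from a sensitive input
}
```

Marking a value as sensitive only hides it from CLI output. It is still written to the state file in plain text, which is why the encryption and access-control rules above still apply.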
Module Design Patterns
Composable Module Structure:
modules/
├── vpc/ # Network foundation
├── security-group/ # Reusable security group patterns
├── rds/ # Database with backups, encryption
├── ecs-cluster/ # Container orchestration base
├── ecs-service/ # Individual microservice
└── alb/ # Application load balancer
Module Versioning:
- Pin module versions in production (version = "5.1.0")
- Use semantic versioning for internal modules
- Test module updates in non-prod first
- Maintain CHANGELOG for module releases
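Version pinning might look like the sketch below. The first block pins a public registry module to an exact release; the second pins an internal module to a Git tag. The Git URL, the tag, and the input values are hypothetical.

```hcl
# Registry module: the version argument pins an exact release.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.1.0"

  name = "example" # hypothetical values
  cidr = "10.0.0.0/16"
}

# Internal module from Git: pin with ?ref= (a tag following semantic versioning).
# The URL and tag are placeholders; other module inputs are omitted.
module "rds" {
  source = "git::https://example.com/infra/terraform-modules.git//rds?ref=v1.4.2"
}
```

The version argument applies only to modules installed from a registry; for Git sources the pin lives in the ref query parameter.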
Module Design Principles:
- Clear input contract (required vs optional variables)
- Documented outputs (what consumers can reference)
- Sane defaults where possible
- Validation rules for inputs
- Examples directory showing usage
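These principles could translate into a module interface like the following sketch. The variable and output names are hypothetical and not taken from any particular module.

```hcl
# Required input: no default, so callers must set it.
variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "The environment must be one of: dev, staging, prod."
  }
}

# Optional input: a sane default keeps simple callers short.
variable "instance_count" {
  description = "Number of instances to run"
  type        = number
  default     = 2
}

# Documented output: part of the contract consumers can rely on.
output "environment" {
  description = "Environment this module instance was deployed to"
  value       = var.environment
}
```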
When to Create a Module:
- Resource group is reused 3+ times
- Clear boundaries and responsibilities
- Stable interface contract
- Team has module maintenance capacity
When to Keep Monolithic:
- One-off infrastructure
- Rapid prototyping phase
- High coupling between resources
- Small team, simple infrastructure
Quick Reference
Terraform/OpenTofu Commands
# Initialize providers and backend
terraform init
# Plan changes (preview)
terraform plan
# Apply changes
terraform apply
# Destroy infrastructure
terraform destroy
# Format HCL files
terraform fmt
# Validate syntax
terraform validate
# Show state
terraform state list
terraform state show <resource>
# Import existing resources
terraform import <resource.name> <id>
# Workspace management
terraform workspace list
terraform workspace new staging
terraform workspace select prod
Pulumi Commands
# Initialize new project
pulumi new aws-typescript
# Preview changes
pulumi preview
# Apply changes
pulumi up
# Destroy infrastructure
pulumi destroy
# Show stack outputs
pulumi stack output
# Manage stacks
pulumi stack ls
pulumi stack select prod
# Import existing resources
pulumi import <type> <name> <id>
# Export/import state
pulumi stack export > state.json
pulumi stack import < state.json
AWS CDK Commands
# Initialize new app
cdk init app --language typescript
# Synthesize CloudFormation
cdk synth
# Preview changes
cdk diff
# Deploy stack
cdk deploy
# Destroy stack
cdk destroy
# Bootstrap account/region
cdk bootstrap
# List stacks
cdk list
Common Patterns Checklist
Infrastructure Provisioning:
- Remote state configured with locking
- State file encryption enabled
- Provider versions pinned
- Module versions pinned (production)
- Variables have descriptions and types
- Sensitive outputs marked as sensitive
- Tagging strategy implemented
- Cost allocation tags applied
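Several of the items above (pinned provider versions, tagging, cost allocation tags) can be covered in one place. A sketch for the AWS provider follows; the version constraints and tag values are assumptions to adapt to your setup.

```hcl
terraform {
  required_version = ">= 1.5.0" # assumed minimum CLI version

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # allow 5.x releases, block the next major version
    }
  }
}

provider "aws" {
  region = "us-east-1"

  # Applied to every taggable resource this provider creates.
  default_tags {
    tags = {
      Environment = "prod"     # hypothetical tag values
      CostCenter  = "platform"
      ManagedBy   = "terraform"
    }
  }
}
```

Committing the generated .terraform.lock.hcl file alongside this configuration records the exact provider versions that were selected, so every run resolves the same builds.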
Module Development:
- Clear README with usage examples
- Required vs optional variables documented
- Outputs documented with descriptions
- Validation rules for critical inputs
- Examples directory with working code
- Tests for module behavior (Terratest/CDK assertions)
- CHANGELOG for version tracking
- Semantic versioning followed
Operational Readiness:
- Drift detection scheduled
- CI/CD pipeline for plan/apply
- State backup strategy
- Disaster recovery documented
- Team access controls configured (IAM/RBAC)
- Cost estimation integrated (Infracost)
- Security scanning integrated (Checkov/tfsec)
- Documentation kept current
Detailed Documentation
For comprehensive patterns and implementation details:
Tool-Specific Patterns:
- references/terraform-patterns.md - Terraform/OpenTofu best practices, HCL patterns
- references/pulumi-patterns.md - Pulumi across TypeScript/Python/Go
**Architec