Terraform & OpenTofu: Production Infrastructure-as-Code
Write, review, and architect Terraform/OpenTofu infrastructure - from individual resources to multi-account, PCI-compliant platform architectures. The goal is reproducible, drift-free, auditable infrastructure that passes both peer review and QSA assessment.
Target versions (May 2026): Terraform 1.14.9 (IBM/HashiCorp, BSL; 1.15.0-rc2 in progress), OpenTofu 1.11.6 (Linux Foundation, MPL). Helm provider v3.1+, K8s provider v3.0+, AWS provider v6.x, Azure v4.x, GCP v7.x.
This skill covers HCL, modules, operations, state, CI/CD, policy-as-code, audit trails, PCI-DSS 4.0 controls, drift detection, and CDE isolation.
Terraform vs OpenTofu (2026)
IBM acquired HashiCorp for $6.4B (closed Feb 2025). Terraform stays BSL 1.1; OpenTofu is Linux Foundation/MPL.
- Choose Terraform: HCP/TFE, Stacks, or vendor support.
- Choose OpenTofu: client-side state encryption, BSL concerns, the enabled meta-argument, OCI registries, or Linux Foundation governance.
- Shared protocol: most providers still work on both, for now.
- CDKTF: deprecated Dec 2025 and archived; migrate to HCL or AWS CDK.
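To make the client-side state encryption point concrete, here is a minimal sketch of an OpenTofu encryption block. It assumes a passphrase-based key provider; the variable name state_passphrase and the labels "passphrase" and "default" are illustrative, not taken from this skill. A KMS-backed key provider is likely the better fit for production.

```hcl
# OpenTofu only: Terraform does not recognize the encryption block.
variable "state_passphrase" {
  type        = string
  sensitive   = true
  description = "Hypothetical variable: passphrase (16+ characters) supplied via TF_VAR_state_passphrase"
}

terraform {
  encryption {
    # Derive an encryption key from the passphrase.
    key_provider "pbkdf2" "passphrase" {
      passphrase = var.state_passphrase
    }

    # Encrypt with AES-GCM using the derived key.
    method "aes_gcm" "default" {
      keys = key_provider.pbkdf2.passphrase
    }

    # Apply the method to both state and saved plan files.
    state {
      method = method.aes_gcm.default
    }

    plan {
      method = method.aes_gcm.default
    }
  }
}
```

State is encrypted before it leaves the machine running tofu, so the backend only ever stores ciphertext.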
When to use
- Writing or reviewing Terraform/OpenTofu configurations
- Designing module architecture or registry patterns
- Planning state management, backend strategy, or migration
- Setting up CI/CD pipelines for IaC (plan/apply workflows)
- Implementing policy-as-code gates (Checkov, OPA, Sentinel)
- PCI-DSS 4.0 compliance for infrastructure provisioning
- Multi-account/multi-cloud architecture with blast radius controls
- Reviewing AI-generated Terraform for security and correctness
When NOT to use
- Kubernetes manifests or Helm charts (use kubernetes)
- Read-only Kubernetes cluster health checks after provisioning or maintenance (use cluster-health)
- Ansible playbooks or configuration management (use ansible)
- Docker/container optimization (use docker)
- CI/CD pipeline design (use ci-cd)
- Database engine configuration, schema design, or migrations (use databases)
- Security auditing application code (use security-audit)
AI Self-Check
AI tools consistently produce the same Terraform mistakes. Before returning any generated HCL, verify against this list:
- No hardcoded values - regions, AMI IDs, CIDR blocks, account IDs must be variables
- No overly permissive IAM - no "Action": "*" or "Resource": "*" unless explicitly requested
- No 0.0.0.0/0 ingress on security groups (except port 443 for public ALBs, justified)
- S3 buckets: aws_s3_bucket_public_access_block with all four settings true (unless public access is explicitly required and justified), plus SSE-KMS encryption (aws_s3_bucket_server_side_encryption_configuration), versioning enabled, access logging (aws_s3_bucket_logging), and no overly permissive bucket policy (review aws_s3_bucket_policy for broad Principal: "*" grants)
- Provider versions pinned in required_providers with ~> constraints
- Backend config present (not local) with encryption and locking
- lifecycle blocks where needed (create_before_destroy, prevent_destroy on stateful resources)
- sensitive = true on variables/outputs containing secrets
- Tags on every taggable resource (at minimum: Name, Environment, Owner, pci_scope if applicable)
- No deprecated resource arguments (check provider changelog - AI trains on old syntax)
- No provisioner blocks - use Ansible or user_data instead
- State file does NOT contain plaintext secrets (use ephemeral resources on TF 1.10+ or data sources for runtime secret lookup)
- terraform fmt and terraform validate pass
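The S3 item in the list above packs several resources into one line. A sketch of what a bucket passing that item might look like follows; the resource label "logs" and the variables bucket_name and access_log_bucket_id are hypothetical, and the bucket policy review is not shown.

```hcl
variable "bucket_name" { type = string }          # hypothetical
variable "kms_key_arn" { type = string }
variable "access_log_bucket_id" { type = string } # hypothetical: existing log target bucket
variable "common_tags" { type = map(string) }

resource "aws_s3_bucket" "logs" {
  bucket = var.bucket_name
  tags   = merge(var.common_tags, { Name = var.bucket_name })
}

# All four public access settings set to true.
resource "aws_s3_bucket_public_access_block" "logs" {
  bucket                  = aws_s3_bucket.logs.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# SSE-KMS with a customer-managed key.
resource "aws_s3_bucket_server_side_encryption_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = var.kms_key_arn
    }
    bucket_key_enabled = true
  }
}

resource "aws_s3_bucket_versioning" "logs" {
  bucket = aws_s3_bucket.logs.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Server access logs go to a separate, pre-existing bucket.
resource "aws_s3_bucket_logging" "logs" {
  bucket        = aws_s3_bucket.logs.id
  target_bucket = var.access_log_bucket_id
  target_prefix = "s3/${var.bucket_name}/"
}
```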
AI should never own terraform apply. In March 2026, an AI-assisted Terraform workflow deleted production infrastructure through escalating cleanup logic. Plan output is reviewed by a human. Always.
- Current source checked: dated versions, CLI flags, API names, and support windows are verified against primary docs before repeating them
- Hidden state identified: local config, credentials, caches, contexts, branches, cluster targets, or previous runs are made explicit before acting
- Verification is real: final checks exercise the actual runtime, parser, service, or integration point instead of only linting prose or happy paths
- Routing overlap checked: overlapping skills, trigger terms, and "When NOT to use" boundaries are checked before returning guidance
- Spec claims verified: claims about tool behavior, output contracts, or repo conventions are checked against current docs, scripts, or skill files
- Provider docs checked: resource arguments, defaults, and deprecations match pinned provider versions
- State impact reviewed: imports, moves, destroys, and replacements are visible in plan output before apply
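For the provider pinning and backend items in the checks above, a minimal sketch of the terraform block might look like this. The bucket, key, and region values are placeholders, and use_lockfile assumes a Terraform or OpenTofu release that supports S3-native locking (1.10 or later, to the best of my knowledge).

```hcl
terraform {
  required_version = ">= 1.10"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0" # allows 6.x upgrades, blocks 7.0
    }
  }

  # Terraform backend blocks cannot reference variables, so these are
  # literal placeholders; partial configuration via -backend-config
  # is an alternative.
  backend "s3" {
    bucket       = "example-tfstate"          # placeholder
    key          = "platform/network.tfstate" # placeholder
    region       = "eu-west-1"                # placeholder
    encrypt      = true                       # server-side encryption of the state object
    use_lockfile = true                       # S3-native state locking
  }
}
```

Committing .terraform.lock.hcl alongside this block keeps the exact provider builds reproducible across machines and CI.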
Performance
- Scope plans to changed stacks/modules during iteration, then run full plans before merge.
- Use remote state and data sources sparingly; excessive cross-stack reads slow plans and create hidden coupling.
- Cache providers in CI and pin versions to avoid repeated downloads and surprise upgrades.
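One way to implement the provider caching point is a CLI configuration file on the CI runner. This is a sketch; the cache path is a common convention rather than a requirement, and the directory must already exist before init runs.

```hcl
# ~/.terraformrc on the CI runner (OpenTofu also reads ~/.tofurc)
# Providers are downloaded once into this directory and reused across
# working directories instead of being fetched on every init.
plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"
```

The same setting can be supplied through the TF_PLUGIN_CACHE_DIR environment variable if writing a file on the runner is inconvenient.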
Best Practices
- Never let automation apply production plans without a reviewed plan artifact and human approval.
- Use moved blocks for refactors instead of delete/recreate churn.
- Protect stateful resources with backups, prevent_destroy, and explicit migration steps.
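A short sketch of the last two practices together: a moved block for a rename plus prevent_destroy on the renamed resource. The resource labels and the variable audit_log_bucket_name are hypothetical.

```hcl
variable "audit_log_bucket_name" {
  type        = string
  description = "Hypothetical variable: name of the existing bucket"
}

# The resource was previously declared as aws_s3_bucket.logs.
# With this block, the plan reports a move instead of destroy + create.
moved {
  from = aws_s3_bucket.logs
  to   = aws_s3_bucket.audit_logs
}

resource "aws_s3_bucket" "audit_logs" {
  bucket = var.audit_log_bucket_name

  lifecycle {
    # Any plan that would destroy this bucket fails with an error.
    prevent_destroy = true
  }
}
```

The moved block can stay in the configuration until every state that still uses the old address has been applied at least once.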
Workflow
Step 1: Determine the domain
Based on the request:
- "Create a VPC/RDS/EC2/resource" -> HCL
- "Create a reusable module" -> Modules
- "Set up state backend" / "migrate state" -> Operations
- "Make this PCI compliant" / "policy gates" -> Compliance
- "Review this Terraform" -> Apply production checklist + critical rules + AI self-check
- "Review S3 buckets" -> S3 hardening review (see below) + AI self-check
Step 2: Gather requirements
Before writing HCL, determine:
- Cloud provider(s) and account/project structure
- Resource type and its dependencies
- Environment (dev/staging/prod) and promotion strategy
- State backend and locking mechanism
- Compliance scope: PCI CDE? Regulated? What tags/policies apply?
- Existing modules: reuse before creating new ones
- Secrets: how are they injected? (Vault, SSM, Secrets Manager - never tfvars)
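For the secrets question, one possible pattern is an ephemeral resource feeding a write-only argument, so the value is never written to state or plan. This sketch assumes Terraform 1.11+ (or an OpenTofu release with ephemeral and write-only support) and an AWS provider version that exposes password_wo on aws_db_instance. Verify both against your pinned versions. All variable names are hypothetical.

```hcl
variable "db_secret_id" { type = string } # hypothetical: Secrets Manager secret holding the password
variable "db_identifier" { type = string }
variable "db_instance_class" { type = string }
variable "db_username" { type = string }
variable "kms_key_arn" { type = string }

# Read at plan/apply time only; never persisted to state.
ephemeral "aws_secretsmanager_secret_version" "db" {
  secret_id = var.db_secret_id
}

resource "aws_db_instance" "app" {
  identifier        = var.db_identifier
  engine            = "postgres"
  instance_class    = var.db_instance_class
  allocated_storage = 20
  storage_encrypted = true
  kms_key_id        = var.kms_key_arn
  username          = var.db_username

  # Write-only argument: sent to the API, not stored in state.
  password_wo = ephemeral.aws_secretsmanager_secret_version.db.secret_string
  # Bump this number to push a rotated password.
  password_wo_version = 1

  lifecycle {
    prevent_destroy = true
  }
}
```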
Step 3: Build
Follow the domain-specific section below. Always run terraform fmt, terraform validate, and Checkov before finishing.
Step 4: Validate
terraform fmt -check -recursive # Format check
terraform validate # Syntax + provider validation
tflint --recursive # Provider-specific linting
checkov -d . --framework terraform # Security/compliance scan
terraform plan -out=plan.tfplan # Review the plan
terraform show -json plan.tfplan | conftest test - # Policy-as-code gate (OPA)
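The tflint step above needs a configuration file to enable provider-specific rules. A minimal sketch follows; the AWS ruleset version is a placeholder to replace with a release you have verified, and the three enabled rules are one reasonable selection, not a requirement of this skill.

```hcl
# .tflint.hcl at the repository root; run `tflint --init` once to install plugins.
plugin "terraform" {
  enabled = true
  preset  = "recommended"
}

plugin "aws" {
  enabled = true
  version = "0.40.0" # placeholder: pin to a verified release
  source  = "github.com/terraform-linters/tflint-ruleset-aws"
}

# Rules from the terraform ruleset that match the self-check list.
rule "terraform_typed_variables" {
  enabled = true
}

rule "terraform_documented_variables" {
  enabled = true
}

rule "terraform_required_providers" {
  enabled = true
}
```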
HCL Patterns
Resource structure
resource "aws_instance" "web" {
  ami           = var.ami_id
  instance_type = var.instance_type
  subnet_id     = var.private_subnet_id

  root_block_device {
    encrypted   = true
    kms_key_id  = var.kms_key_arn
    volume_size = 20
  }

  metadata_options {
    http_endpoint = "enabled"
    http_tokens   = "required" # IMDSv2 - enforce this always
  }

  tags = merge(var.common_tags, {
    Name = "${var.project}-web-${var.environment}"
  })

  lifecycle {
    create_before_destroy = true
  }
}
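The merge(var.common_tags, ...) call above tags one resource at a time. As a possible complement, the AWS provider's default_tags block applies baseline tags to every taggable resource it manages, which can help with the "tags on every taggable resource" check. The variables aws_region, owner, and pci_scope are hypothetical names for this sketch.

```hcl
variable "aws_region" { type = string }
variable "environment" { type = string }
variable "owner" { type = string }

variable "pci_scope" {
  type        = string
  description = "Hypothetical tag value, for example in-scope or out-of-scope"
}

provider "aws" {
  region = var.aws_region

  # Applied to every taggable resource created through this provider;
  # resource-level tags with the same key take precedence.
  default_tags {
    tags = {
      Environment = var.environment
      Owner       = var.owner
      pci_scope   = var.pci_scope
    }
  }
}
```

With this in place, per-resource tags can be limited to values that differ per resource, such as Name.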
Key patterns
Variables: type them. Default non-sensitive ones. Mark secrets sensitive. Use validation blocks for constraints.
variable "environment" {
  type        = string
  description = "Deployment environment"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Must be dev, staging, or prod."
  }
}
variable "db_password" {
type