Cost Optimization

Purpose

Cloud cost optimization transforms uncontrolled spending into strategic resource allocation through the FinOps lifecycle: Inform, Optimize, and Operate. This skill provides decision frameworks for commitment-based discounts (Reserved Instances, Savings Plans), right-sizing strategies, Kubernetes cost management, and automated cost governance across multi-cloud environments.

When to Use This Skill

Invoke cost-optimization when:

Reducing cloud spend by 15-40% through systematic optimization
Implementing cost visibility dashboards and allocation tracking
Establishing budget alerts and anomaly detection
Optimizing Kubernetes resource requests and cluster efficiency
Managing Reserved Instances, Savings Plans, or Committed Use Discounts
Automating idle resource cleanup and right-sizing recommendations
Setting up showback/chargeback models for internal teams
Preventing cost overruns through CI/CD cost estimation (Infracost)
Responding to finance team requests for cloud cost reduction

FinOps Principles

The FinOps Lifecycle

┌─────────────────────────────────────────────────────┐
│  INFORM → OPTIMIZE → OPERATE (continuous loop)      │
│    ↓         ↓           ↓                          │
│ Visibility  Action   Automation                     │
└─────────────────────────────────────────────────────┘

Inform Phase: Establish cost visibility

Enable cost allocation tags (Owner, Project, Environment)
Deploy real-time cost dashboards for engineering teams
Integrate cloud billing data (AWS CUR, Azure Consumption API, GCP billing export to BigQuery)
Set up Kubernetes cost monitoring (Kubecost, OpenCost)

Optimize Phase: Take action on cost drivers

Purchase commitment-based discounts (40-72% savings)
Right-size over-provisioned resources (target 60-80% utilization)
Implement spot/preemptible instances for fault-tolerant workloads
Clean up idle resources (unattached volumes, old snapshots)

Operate Phase: Automate and govern

Budget alerts with cascading notifications (50%, 75%, 90%, 100%)
Automated cleanup scripts for idle resources
CI/CD cost estimation to prevent surprise increases
Continuous monitoring with anomaly detection
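
The cascading thresholds above can be sketched as a small check that reports which alert levels have been newly crossed. This is a minimal illustration only: the function name, the `already_notified` bookkeeping, and the idea of calling it from a scheduled job are assumptions, not part of any cloud provider's budget API.

```python
from typing import FrozenSet, List

# Alert levels from the Operate phase list, expressed as fractions of the budget.
THRESHOLDS = (0.50, 0.75, 0.90, 1.00)


def crossed_thresholds(
    spend: float,
    budget: float,
    already_notified: FrozenSet[float] = frozenset(),
) -> List[float]:
    """Return the thresholds reached by current spend that have not been notified yet."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spend / budget
    # Lowest threshold first, so notifications can escalate in order.
    return [t for t in THRESHOLDS if ratio >= t and t not in already_notified]
```

A caller would typically persist the returned thresholds after sending notifications, so that each level fires once per budget period.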

Core FinOps Principles

Collaboration: Cross-functional teams (finance, engineering, operations, product)
Accountability: Teams own the cost of their services
Transparency: All costs visible and understandable to stakeholders
Optimization: Continuous improvement of cost efficiency

For detailed FinOps maturity models and organizational structures, see references/finops-foundations.md.

Cost Optimization Strategies

1. Commitment-Based Discounts

Reserved Instances (RIs): 40-72% discount for 1-3 year commitments

Standard RI: Instance type locked, highest discount (up to 72% for 3-year terms)
Convertible RI: Flexible instance types, moderate discount (up to 66% for 3-year terms)
Use for: Databases (RDS, ElastiCache), stable production EC2 workloads

Savings Plans: Flexible compute commitments

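Compute Savings Plans: Apply to EC2, Fargate, and Lambda (up to 66% discount for 3-year terms)
EC2 Instance Savings Plans: Tied to one instance family in one region (up to 72% discount for 3-year terms)
Use for: Workloads that change instance types or regions (Compute Savings Plans), or stable workloads that stay within one instance family (EC2 Instance Savings Plans)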

GCP Committed Use Discounts (CUDs): 25-70% discount

Resource-based CUDs: Commit to vCPU, memory, GPUs
Spend-based CUDs: Commit to dollar amount (flexible)
Sustained Use Discounts: Automatic 20-30% discount for sustained usage (no commitment)
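
As a rough illustration of why utilization matters for the discounts listed above, the sketch below assumes a simplified model in which a commitment is billed for every hour of the term at the discounted rate, whether or not the capacity is used. Under that assumption, a commitment with discount d only beats on-demand when utilization exceeds 1 - d. The function names and the flat hourly model are illustrative and not taken from any provider's billing API.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of hours a resource must run for a commitment to match on-demand cost."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def commitment_savings(
    on_demand_hourly: float,
    discount: float,
    utilization: float,
    hours: int = 8760,  # one year of hours; an assumption for illustration
) -> float:
    """Savings (positive) or loss (negative) of committing versus paying on-demand."""
    # On-demand is only paid for the hours actually used.
    on_demand_cost = on_demand_hourly * hours * utilization
    # The commitment is paid for every hour of the term at the discounted rate.
    committed_cost = on_demand_hourly * (1.0 - discount) * hours
    return on_demand_cost - committed_cost
```

For example, a 40% discount breaks even at 60% utilization under this model, so a resource that runs only half the time would cost more under the commitment than on-demand.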

Decision Framework:

Reserve when:
├─ Workload is production-critical (24/7 uptime required)
├─ Usage is predictable (stable baseline over 6+ months)
├─ Architecture is stable (unlikely to change instance types)
└─ Financial commitment acceptable (1-3 year lock-in)

Use On-Demand when:
├─ Development/testing environments
├─ Unpredictable spiky workloads
├─ Short-term projects (<6 months)
└─ Evaluating new instance types
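
The decision framework above could be encoded as a simple rule check. This is a sketch under stated assumptions: the `Workload` fields and the `purchase_recommendation` name are hypothetical, and the 6-month cutoffs are taken directly from the framework rather than from any provider guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    production_critical: bool       # 24/7 uptime required
    stable_baseline_months: int     # months of predictable baseline usage observed
    architecture_stable: bool       # unlikely to change instance types
    commitment_acceptable: bool     # finance accepts a 1-3 year lock-in
    expected_lifetime_months: int   # how long the workload is expected to run


def purchase_recommendation(w: Workload) -> str:
    """Return "reserve" only when every reserve criterion holds, else "on-demand"."""
    # Short-term projects stay on-demand regardless of other criteria.
    if w.expected_lifetime_months < 6:
        return "on-demand"
    meets_all_reserve_criteria = (
        w.production_critical
        and w.stable_baseline_months >= 6
        and w.architecture_stable
        and w.commitment_acceptable
    )
    return "reserve" if meets_all_reserve_criteria else "on-demand"
```

Requiring all four criteria is a deliberately conservative reading of the framework; a team could relax it, for example by reserving only the stable baseline portion of a partly spiky workload.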

For detailed commitment strategies and RI coverage analysis, see references/commitment-strategies.md.

2. Spot and Preemptible Instances

Discount: 70-90% off on-demand pricing. Instances can be reclaimed on short notice (2-minute warning on AWS, about 30 seconds on GCP and Azure).

Use Spot For: CI/CD workers, batch jobs, ML training (with checkpointing), Kubernetes workers, data analytics

Avoid Spot For: Stateful databases, real-time services, long-running jobs without checkpointing

Best Practices:

Diversify instance types and spread across Availability Zones
Implement graceful shutdown handlers
Auto-fallback to on-demand when capacity unavailable
Kubernetes: Mix 70% spot + 30% on-demand nodes with taints/tolerations
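
A graceful shutdown handler, as mentioned in the best practices above, might look like the following sketch. It assumes the interruption reaches the process as SIGTERM, which is what happens when a Kubernetes pod is evicted from a draining node; other environments deliver the notice differently. The `GracefulShutdown` class, `run_batch`, and the `process` and `checkpoint` callbacks are hypothetical names introduced for illustration.

```python
import signal
import threading
from typing import Any, Callable, Sequence


class GracefulShutdown:
    """Records a stop request when SIGTERM arrives so work loops can checkpoint and exit."""

    def __init__(self) -> None:
        self.stop_requested = threading.Event()

    def install(self) -> None:
        # Must be called from the main thread, as required by signal.signal.
        signal.signal(signal.SIGTERM, self._handle)

    def _handle(self, signum: int, frame: Any) -> None:
        self.stop_requested.set()


def run_batch(
    items: Sequence[Any],
    shutdown: GracefulShutdown,
    process: Callable[[Any], None],
    checkpoint: Callable[[int], None],
) -> int:
    """Process items in order; on a stop request, save progress and return the resume index."""
    for index, item in enumerate(items):
        if shutdown.stop_requested.is_set():
            checkpoint(index)  # index of the first unprocessed item
            return index
        process(item)
    checkpoint(len(items))
    return len(items)
```

Checking the flag between items keeps each unit of work atomic, so individual items should be short enough to finish within the interruption warning.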

3. Right-Sizing Strategies

Target Utilization: 60-80% average (leave headroom for spikes)

Compute Right-Sizing:

Analyze actual CPU/memory utilization over 30+ days
Downsize instances with <40% average utilization
Consolidate underutilized workloads
Switch instance families (compute-optimized vs. memory-optimized)
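
The compute thresholds above can be expressed as a small classifier over daily utilization averages. This is a minimal sketch: the function name, the action labels, and the choice to treat anything above 80% as a candidate for review are assumptions layered on top of the 30-day window, the <40% downsize rule, and the 60-80% target stated in this section.

```python
from statistics import mean
from typing import Sequence


def rightsizing_action(
    daily_cpu_percent: Sequence[float],
    min_days: int = 30,
    downsize_below: float = 40.0,
    target_high: float = 80.0,
) -> str:
    """Classify an instance from its daily average CPU utilization (0-100)."""
    # Refuse to recommend anything without at least 30 days of data.
    if len(daily_cpu_percent) < min_days:
        return "insufficient-data"
    average = mean(daily_cpu_percent)
    if average < downsize_below:
        return "downsize"
    if average > target_high:
        return "review-upsize"  # above the target band, little headroom for spikes
    return "keep"
```

The same check applies to memory utilization; in practice both metrics should agree before an instance is downsized.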

Database Right-Sizing:

Analyze connection pool usage (max connections vs. allocated)
Downgrade storage IOPS if utilization <50%
Evaluate read replica necessity (can caching replace it?)
Consider serverless options (Aurora Serverless, Azure SQL Serverless)

Kubernetes Right-Sizing:

Set requests = average usage (not peak)
Set limits = 2-3x requests (allow bursting)
Use Vertical Pod Autoscaler (VPA) for automated recommendations
Identify pods with 0% CPU usage (candidates for consolidation)
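
The request and limit rules above can be turned into a helper that derives values from observed usage samples. This sketch assumes usage has already been collected as CPU millicores and memory MiB (for example, from a metrics system); the function name and the sample format are hypothetical, and it is not how VPA computes its recommendations.

```python
from statistics import mean
from typing import Dict, Sequence


def suggest_resources(
    cpu_millicores: Sequence[float],
    memory_mib: Sequence[float],
    limit_multiplier: float = 3.0,  # the section suggests 2-3x requests
) -> Dict[str, Dict[str, str]]:
    """Suggest requests equal to average usage and limits as a multiple of requests."""
    if not cpu_millicores or not memory_mib:
        raise ValueError("usage samples must not be empty")
    cpu_request = round(mean(cpu_millicores))
    memory_request = round(mean(memory_mib))
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{memory_request}Mi"},
        "limits": {
            "cpu": f"{round(cpu_request * limit_multiplier)}m",
            "memory": f"{round(memory_request * limit_multiplier)}Mi",
        },
    }
```

The returned dictionary mirrors the shape of a container `resources` block, so it can be serialized into a manifest after review.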

Storage Right-Sizing:

Delete unattached volumes (EBS, Azure Disks, GCP Persistent Disks)
Delete old snapshots (older than 90 days and not required by a retention policy)
Implement lifecycle policies (S3 Intelligent-Tiering, Azure Blob Lifecycle)
Compress/deduplicate data
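
The snapshot rule above can be sketched as a filter over snapshot records. The record shape used here (`id`, `created`, and an optional `retain` flag) is a hypothetical, provider-neutral format; a real script would first map the cloud provider's API response into it and would normally run in a dry-run mode before deleting anything.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List


def snapshots_to_delete(
    snapshots: Iterable[Dict[str, Any]],
    now: datetime,
    max_age_days: int = 90,
) -> List[str]:
    """Return IDs of snapshots older than max_age_days that no retention policy protects."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        snap["id"]
        for snap in snapshots
        # "retain" marks snapshots a retention policy requires keeping.
        if snap["created"] < cutoff and not snap.get("retain", False)
    ]
```

Passing `now` explicitly, for example `datetime.now(timezone.utc)`, keeps the function deterministic and easy to test.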

Right-Sizing Tools:

AWS Compute Optimizer: ML-based EC2, Lambda, EBS recommendations
Azure Advisor: VM rightsizing, reserved instance advice
GCP Recommender: VM, disk, commitment recommendations
VPA (Vertical Pod Autoscaler): Automated container resource requests

4. Kubernetes Cost Management

Resource Requests and Limits:

# Set requests = average usage (enables efficient bin-packing)
resources:
  requests:
    cpu: 500m        # 0.5 CPU cores (average usage)
    memory: 1Gi      # 1 GiB memory (average usage)
  limits:
    cpu: 1500m       # 1.5 CPU cores (3x requests, allows bursting)
    memory: 3Gi      # 3 GiB memory (3x requests)

Namespace Quotas: Prevent runaway resource consumption

ResourceQuota: Limit total CPU/memory per namespace
LimitRange: Default/max requests per pod
PriorityClass: Ensure critical pods get resources

Cluster Autoscaling:

Scale down idle nodes to reduce costs
Scale-to-zero for dev clusters during off-hours
Use multiple node pools (spot + on-demand mix)
Set max node limits to prevent overspend

Cost Visibility:

Deploy Kubecost or OpenCost for namespace-level cost tracking
Allocate costs by labels (team, project, environment)
Track idle cost (cluster capacity not allocated to workloads)
Generate showback/chargeback reports
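
Label-based allocation and idle cost tracking, as described above, can be illustrated with a simple proportional model. This sketch assumes each workload record carries its labels and a `capacity_share` (the fraction of cluster capacity it requests); both field names are hypothetical, and the model does not reflect how Kubecost or OpenCost compute costs internally.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def allocate_costs(
    cluster_cost: float,
    workloads: Iterable[Dict[str, Any]],
    label: str = "team",
) -> Dict[str, float]:
    """Split cluster cost by label value and report unallocated capacity as idle cost."""
    totals: Dict[str, float] = defaultdict(float)
    allocated = 0.0
    for workload in workloads:
        # Workloads missing the label are grouped so they stay visible in reports.
        owner = workload["labels"].get(label, "unlabeled")
        cost = cluster_cost * workload["capacity_share"]
        totals[owner] += cost
        allocated += cost
    # Idle cost: cluster capacity that is paid for but not requested by any workload.
    totals["__idle__"] = cluster_cost - allocated
    return dict(totals)
```

In a showback report the idle line is usually shown separately; in a chargeback model it is often redistributed across teams in proportion to their allocated cost.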

For detailed Kubernetes cost optimization patterns, see references/kubernetes-cost-optimization.md.

Cost Visibility and Monitoring

Tagging for Cost Allocation

Required Tags:

Owner or Team - Responsible team/department
Project or Application - Business unit or application name
Environment - prod, staging, dev, test
CostCenter - Finance cost center code
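
The required tags above can be checked with a small validator, for example in a CI step or a periodic audit. This is a minimal sketch: the function name and the violation messages are assumptions, and the tag keys and environment values are taken from the list above.

```python
from typing import Dict, List

# Each group lists interchangeable keys; at least one must be present and non-empty.
REQUIRED_TAG_GROUPS = (
    ("Owner", "Team"),
    ("Project", "Application"),
    ("Environment",),
    ("CostCenter",),
)
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev", "test"}


def tag_violations(tags: Dict[str, str]) -> List[str]:
    """Return a list of problems with a resource's tags; an empty list means compliant."""
    problems = []
    for group in REQUIRED_TAG_GROUPS:
        if not any(tags.get(key) for key in group):
            problems.append("missing " + " or ".join(group))
    environment = tags.get("Environment")
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"invalid Environment: {environment}")
    return problems
```

Returning all violations at once, rather than failing on the first, lets a team fix a resource in a single pass.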

Enable Cost Allocation Tags:

AWS: Activate tags in Cost Allocation Tags console
Azure: Apply tags via Azure Policy enforcement
GCP: Use labels on all resources, export to BigQuery

For comprehensive tagging strategies, see references/tagging-for-cost-allocation.md.

Monitoring and Dashboards

Native Cloud Tools:

AWS Cost Explorer: Analyze spending patterns, forecast costs
