AWS CloudFormation CloudWatch Monitoring
Overview
Creates CloudWatch monitoring infrastructure using CloudFormation templates: metrics, alarms, dashboards, log groups, anomaly detection, synthesized canaries, and Application Signals.
When to Use
- Creating CloudWatch metrics and alarms for production infrastructure
- Building CloudWatch dashboards for multi-region visualization
- Implementing log groups with retention, encryption, and metric filters
- Configuring anomaly detection and composite alarms
- Setting up cross-stack references with Parameters and Outputs
- Validating and deploying monitoring stacks with CloudFormation
Instructions
Follow these steps to create CloudWatch monitoring infrastructure with CloudFormation:
1. Define Alarm Parameters
Specify metric namespaces, dimensions, and threshold values:
Parameters:
ErrorRateThreshold:
Type: Number
Default: 5
Description: Error rate threshold for alarms (percentage)
LatencyThreshold:
Type: Number
Default: 1000
Description: Latency threshold in milliseconds
CpuUtilizationThreshold:
Type: Number
Default: 80
Description: CPU utilization threshold (percentage)
LogRetentionDays:
Type: Number
Default: 30
AllowedValues:
- 1
- 3
- 7
- 14
- 30
- 60
- 90
- 120
- 365
Description: Number of days to retain log events
2. Create CloudWatch Alarms
Set up alarms for CPU, memory, disk, and custom metrics:
Resources:
HighCpuAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: !Sub "${AWS::StackName}-high-cpu"
AlarmDescription: Trigger when CPU utilization exceeds threshold
MetricName: CPUUtilization
Namespace: AWS/EC2
Dimensions:
- Name: InstanceId
Value: !Ref InstanceId
Statistic: Average
Period: 60
EvaluationPeriods: 3
Threshold: !Ref CpuUtilizationThreshold
ComparisonOperator: GreaterThanThreshold
AlarmActions:
- !Ref AlarmTopic
ErrorRateAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: !Sub "${AWS::StackName}-error-rate"
MetricName: ErrorRate
Namespace: !Ref CustomNamespace
Dimensions:
- Name: Service
Value: !Ref ServiceName
Statistic: Average
Period: 60
EvaluationPeriods: 5
Threshold: !Ref ErrorRateThreshold
ComparisonOperator: GreaterThanThreshold
3. Configure Alarm Actions
Define SNS topics for notification delivery:
Resources:
AlarmNotificationTopic:
Type: AWS::SNS::Topic
Properties:
DisplayName: !Sub "${AWS::StackName}-alarms"
TopicName: !Sub "${AWS::StackName}-alarms"
AlarmTopicPolicy:
Type: AWS::SNS::TopicPolicy
Properties:
PolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: cloudwatch.amazonaws.com
Action: sns:Publish
Resource: !Ref AlarmNotificationTopic
Topics:
- !Ref AlarmNotificationTopic
4. Create Dashboards
Build visualization widgets for metrics across resources:
Resources:
MonitoringDashboard:
Type: AWS::CloudWatch::Dashboard
Properties:
DashboardName: !Sub "${AWS::StackName}-dashboard"
DashboardBody: !Sub |
{
"widgets": [
{
"type": "metric",
"x": 0,
"y": 0,
"width": 12,
"height": 6,
"properties": {
"title": "CPU Utilization",
"metrics": [["AWS/EC2", "CPUUtilization", "InstanceId", "${InstanceId}"]],
"period": 300,
"stat": "Average",
"region": "${AWS::Region}"
}
}
]
}
5. Set Up Log Groups
Configure retention policies and encryption settings:
Resources:
ApplicationLogGroup:
Type: AWS::Logs::LogGroup
Properties:
LogGroupName: !Sub "/aws/applications/${Environment}/${ApplicationName}"
RetentionInDays: !Ref LogRetentionDays
KmsKeyId: !Ref LogEncryptionKey
6. Implement Metric Filters
Create metrics from log data:
Resources:
ErrorMetricFilter:
Type: AWS::Logs::MetricFilter
Properties:
LogGroupName: !Ref ApplicationLogGroup
FilterPattern: '[level="ERROR", msg]'
MetricTransformations:
- MetricValue: "1"
MetricNamespace: !Sub "${AWS::StackName}/Application"
MetricName: ErrorCount
7. Add Composite Alarms
Build multi-condition alarm logic:
Resources:
SystemHealthComposite:
Type: AWS::CloudWatch::CompositeAlarm
Properties:
AlarmName: !Sub "${AWS::StackName}-system-health"
AlarmRule: !Or
- !Ref HighCpuAlarm
- !Ref ErrorRateAlarm
AlarmActions:
- !Ref AlarmTopic
8. Configure Log Insights Queries
Create saved queries for log analysis:
Resources:
ErrorAnalysisQuery:
Type: AWS::Logs::QueryDefinition
Properties:
Name: !Sub "${AWS::StackName}-errors"
LogGroupNames:
- !Ref ApplicationLogGroup
QueryString: |
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 100
9. Validate Template
Before deploying, validate the CloudFormation template:
aws cloudformation validate-template --template-body file://template.yaml
For parameterized templates, test with sample values:
aws cloudformation validate-template \
--template-body file://monitoring.yaml \
--capabilities CAPABILITY_IAM
10. Deploy and Verify
Deploy the stack and verify resources:
# Deploy stack
aws cloudformation create-stack \
--stack-name my-monitoring-stack \
--template-body file://monitoring.yaml \
--parameters file://parameters.json \
--capabilities CAPABILITY_IAM
# Wait for completion
aws cloudformation wait stack-create-complete \
--stack-name my-monitoring-stack
# Verify alarms are in OK state
aws cloudwatch describe-alarms --stack-name my-monitoring-stack
# Check dashboard accessibility
aws cloudwatch get-dashboard --dashboard-name my-monitoring-stack-dashboard
Test alarm actions before production:
# Set alarm to testing state
aws cloudwatch set-alarm-state \
--alarm-name my-alarm \
--state-value ALARM \
--state-reason "Testing alarm action"
Best Practices
- Use composite alarms to reduce noise and avoid false positives
- Set meaningful thresholds based on baseline metrics
- Configure appropriate evaluation periods (3-5 datapoints)
- Enable anomaly detection for metrics with variable patterns
- Use metric math for derived metrics (error rates, latency percentiles)
- Set appropriate log retention based on compliance needs
- Encrypt log groups with sensitive data using KMS
- Test alarm actions with
set-alarm-statebefore production
Examples
Example 1: Complete Monitoring Stack
AWSTemplateFormatVersion: '2010-09-09'
Description: Complete CloudWatch monitoring setup
Parameters:
Environment:
Type: String
Default: prod
AllowedValues: [dev, staging, prod]
Resources:
# SNS Topic for notifications
AlarmTopic:
Type: AWS::SNS::Topic
Properties:
DisplayName: !Sub "${Environment}-alarms"
# Alarm for high CPU
CpuAlarm:
Type: AWS::CloudWatch::Alarm
Properties:
AlarmName: !Sub "${AWS::StackName}-cpu-high"
MetricName: CPUUtilization
Namespace: AWS/EC2
Statistic: Average
Period: 300
EvaluationPeriods: 2
Threshold: 80
ComparisonOperator: GreaterThanThreshold
AlarmActions:
- !Ref AlarmTopic
# Dashboard
Dashboard:
Type: AWS::CloudWatch::Dashboard
Properties:
DashboardName: !Ref AWS::StackName
DashboardBody: !Sub |
{