aws-cloudformation-cloudwatch

AWS CloudFormation CloudWatch Monitoring

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "aws-cloudformation-cloudwatch" with this command: npx skills add giuseppe-trisciuoglio/developer-kit/giuseppe-trisciuoglio-developer-kit-aws-cloudformation-cloudwatch

AWS CloudFormation CloudWatch Monitoring

Overview

Create production-ready monitoring and observability infrastructure using AWS CloudFormation templates. This skill covers CloudWatch metrics, alarms, dashboards, log groups, log insights, anomaly detection, synthesized canaries, Application Signals, and best practices for parameters, outputs, and cross-stack references.

When to Use

Use this skill when:

  • Creating custom CloudWatch metrics

  • Configuring CloudWatch alarms for thresholds and anomaly detection

  • Creating CloudWatch dashboards for multi-region visualization

  • Implementing log groups with retention and encryption

  • Configuring log subscriptions and cross-account log aggregation

  • Implementing synthesized canaries for synthetic monitoring

  • Enabling Application Signals for application performance monitoring

  • Organizing templates with Parameters, Outputs, Mappings, Conditions

  • Implementing cross-stack references with export/import

  • Using Transform for macros and reuse

CloudFormation Template Structure

Base Template with Standard Format

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch monitoring and observability stack

Metadata: AWS::CloudFormation::Interface: ParameterGroups: - Label: default: Monitoring Configuration Parameters: - Environment - LogRetentionDays - EnableAnomalyDetection - Label: default: Alarm Thresholds Parameters: - ErrorRateThreshold - LatencyThreshold - CpuUtilizationThreshold

Parameters: Environment: Type: String Default: dev AllowedValues: - dev - staging - production Description: Deployment environment

LogRetentionDays: Type: Number Default: 30 AllowedValues: - 1 - 3 - 5 - 7 - 14 - 30 - 60 - 90 - 120 - 150 - 180 - 365 - 400 - 545 - 731 - 1095 - 1827 - 2190 - 2555 - 2922 - 3285 - 3650 Description: Number of days to retain log events

EnableAnomalyDetection: Type: String Default: false AllowedValues: - true - false Description: Enable CloudWatch anomaly detection

ErrorRateThreshold: Type: Number Default: 5 Description: Error rate threshold for alarms (percentage)

LatencyThreshold: Type: Number Default: 1000 Description: Latency threshold in milliseconds

CpuUtilizationThreshold: Type: Number Default: 80 Description: CPU utilization threshold (percentage)

Mappings: EnvironmentConfig: dev: LogRetentionDays: 7 ErrorRateThreshold: 10 LatencyThreshold: 2000 CpuUtilizationThreshold: 90 staging: LogRetentionDays: 14 ErrorRateThreshold: 5 LatencyThreshold: 1500 CpuUtilizationThreshold: 85 production: LogRetentionDays: 30 ErrorRateThreshold: 1 LatencyThreshold: 500 CpuUtilizationThreshold: 80

Conditions: IsProduction: !Equals [!Ref Environment, production] IsStaging: !Equals [!Ref Environment, staging] EnableAnomaly: !Equals [!Ref EnableAnomalyDetection, true]

Transform:

  • AWS::Serverless-2016-10-31

Resources:

Log Group per applicazione

ApplicationLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Sub "/aws/applications/${Environment}/application" RetentionInDays: !Ref LogRetentionDays KmsKeyId: !Ref LogKmsKey Tags: - Key: Environment Value: !Ref Environment - Key: Application Value: !Ref ApplicationName

Outputs: LogGroupName: Description: Name of the application log group Value: !Ref ApplicationLogGroup Export: Name: !Sub "${AWS::StackName}-LogGroupName"

Parameters Best Practices

AWS-Specific Parameter Types

Parameters:

AWS-specific types for validation

CloudWatchNamespace: Type: AWS::CloudWatch::Namespace Description: CloudWatch metric namespace

AlarmActionArn: Type: AWS::SNS::Topic::Arn Description: SNS topic ARN for alarm actions

LogKmsKeyArn: Type: AWS::KMS::Key::Arn Description: KMS key ARN for log encryption

DashboardArn: Type: AWS::CloudWatch::Dashboard::Arn Description: Existing dashboard ARN to import

AnomalyDetectorArn: Type: AWS::CloudWatch::AnomalyDetector::Arn Description: Existing anomaly detector ARN

Parameter Constraints

Parameters: MetricName: Type: String Description: CloudWatch metric name ConstraintDescription: Must be 1-256 characters, alphanumeric, underscore, period, dash MinLength: 1 MaxLength: 256 AllowedPattern: "[a-zA-Z0-9._-]+"

ThresholdValue: Type: Number Description: Alarm threshold value MinValue: 0 MaxValue: 1000000000

EvaluationPeriods: Type: Number Description: Number of evaluation periods Default: 5 MinValue: 1 MaxValue: 100 ConstraintDescription: Must be between 1 and 100

DatapointsToAlarm: Type: Number Description: Datapoints that must breach to trigger alarm Default: 5 MinValue: 1 MaxValue: 10

Period: Type: Number Description: Metric period in seconds Default: 300 AllowedValues: - 10 - 30 - 60 - 300 - 900 - 3600 - 21600 - 86400

ComparisonOperator: Type: String Description: Alarm comparison operator Default: GreaterThanThreshold AllowedValues: - GreaterThanThreshold - GreaterThanOrEqualToThreshold - LessThanThreshold - LessThanOrEqualToThreshold - GreaterThanUpperBound - LessThanLowerBound

SSM Parameter References

Parameters: AlarmTopicArn: Type: AWS::SSM::Parameter::Value<AWS::SNS::Topic::Arn> Default: /monitoring/alarms/topic-arn Description: SNS topic ARN from SSM Parameter Store

DashboardConfig: Type: AWS::SSM::Parameter::Value<String> Default: /monitoring/dashboards/config Description: Dashboard configuration from SSM

Outputs and Cross-Stack References

Export/Import Patterns

Stack A - Monitoring Stack

AWSTemplateFormatVersion: 2010-09-09 Description: Central monitoring infrastructure stack

Resources: AlarmTopic: Type: AWS::SNS::Topic Properties: TopicName: !Sub "${AWS::StackName}-alarms" DisplayName: !Sub "${AWS::StackName} Alarm Notifications"

LogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Sub "/aws/monitoring/${AWS::StackName}" RetentionInDays: 30

Outputs: AlarmTopicArn: Description: ARN of the alarm SNS topic Value: !Ref AlarmTopic Export: Name: !Sub "${AWS::StackName}-AlarmTopicArn"

LogGroupName: Description: Name of the log group Value: !Ref LogGroup Export: Name: !Sub "${AWS::StackName}-LogGroupName"

LogGroupArn: Description: ARN of the log group Value: !GetAtt LogGroup.Arn Export: Name: !Sub "${AWS::StackName}-LogGroupArn"

Stack B - Application Stack (imports from Monitoring Stack)

AWSTemplateFormatVersion: 2010-09-09 Description: Application stack with monitoring integration

Parameters: MonitoringStackName: Type: String Description: Name of the monitoring stack

Resources: LambdaFunction: Type: AWS::Lambda::Function Properties: FunctionName: !Sub "${AWS::StackName}-processor" Runtime: python3.11 Handler: app.handler Code: S3Bucket: !Ref CodeBucket S3Key: lambda/function.zip Role: !GetAtt LambdaExecutionRole.Arn

ErrorAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-errors" AlarmDescription: Alert on Lambda errors MetricName: Errors Namespace: AWS/Lambda Dimensions: - Name: FunctionName Value: !Ref LambdaFunction Statistic: Sum Period: 60 EvaluationPeriods: 5 Threshold: 1 ComparisonOperator: GreaterThanThreshold AlarmActions: - !ImportValue !Sub "${MonitoringStackName}-AlarmTopicArn"

HighLatencyAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-latency" AlarmDescription: Alert on high latency MetricName: Duration Namespace: AWS/Lambda Dimensions: - Name: FunctionName Value: !Ref LambdaFunction Statistic: P99 Period: 60 EvaluationPeriods: 3 Threshold: 5000 ComparisonOperator: GreaterThanThreshold AlarmActions: - !ImportValue !Sub "${MonitoringStackName}-AlarmTopicArn"

Nested Stacks for Modularity

AWSTemplateFormatVersion: 2010-09-09 Description: Main stack with nested monitoring stacks

Resources:

Nested stack for alarms

AlarmsStack: Type: AWS::CloudFormation::Stack Properties: TemplateURL: https://s3.amazonaws.com/bucket/monitoring/alarms.yaml TimeoutInMinutes: 15 Parameters: Environment: !Ref Environment AlarmTopicArn: !Ref AlarmTopicArn

Nested stack for dashboards

DashboardsStack: Type: AWS::CloudFormation::Stack Properties: TemplateURL: https://s3.amazonaws.com/bucket/monitoring/dashboards.yaml TimeoutInMinutes: 15 Parameters: Environment: !Ref Environment LogGroupNames: !Join [",", [!GetAtt AlarmsStack.Outputs.LogGroupName]]

Nested stack for log insights

LogInsightsStack: Type: AWS::CloudFormation::Stack Properties: TemplateURL: https://s3.amazonaws.com/bucket/monitoring/log-insights.yaml TimeoutInMinutes: 15 Parameters: Environment: !Ref Environment

CloudWatch Metrics and Alarms

Base Metric Alarm

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch metric alarms

Resources:

Error rate alarm

ErrorRateAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-error-rate" AlarmDescription: Alert when error rate exceeds threshold MetricName: ErrorRate Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName - Name: Environment Value: !Ref Environment Statistic: Average Period: 60 EvaluationPeriods: 5 DatapointsToAlarm: 3 Threshold: !Ref ErrorRateThreshold ComparisonOperator: GreaterThanThreshold AlarmActions: - !Ref AlarmTopic InsufficientDataActions: - !Ref AlarmTopic OKActions: - !Ref AlarmTopic Tags: - Key: Environment Value: !Ref Environment - Key: Severity Value: high

P99 latency alarm

LatencyAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-p99-latency" AlarmDescription: Alert when P99 latency exceeds threshold MetricName: Latency Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName Statistic: p99 ExtendedStatistic: "p99" Period: 60 EvaluationPeriods: 3 Threshold: !Ref LatencyThreshold ComparisonOperator: GreaterThanThreshold TreatMissingData: notBreaching AlarmActions: - !Ref AlarmTopic

4xx errors alarm

ClientErrorAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-4xx-errors" AlarmDescription: Alert on high 4xx error rate MetricName: 4XXError Namespace: AWS/ApiGateway Dimensions: - Name: ApiName Value: !Ref ApiName - Name: Stage Value: !Ref StageName Statistic: Sum Period: 300 EvaluationPeriods: 2 Threshold: 100 ComparisonOperator: GreaterThanThreshold AlarmActions: - !Ref AlarmTopic

5xx errors alarm

ServerErrorAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-5xx-errors" AlarmDescription: Alert on high 5xx error rate MetricName: 5XXError Namespace: AWS/ApiGateway Dimensions: - Name: ApiName Value: !Ref ApiName - Name: Stage Value: !Ref StageName Statistic: Sum Period: 60 EvaluationPeriods: 2 Threshold: 10 ComparisonOperator: GreaterThanThreshold ComparisonOperator: GreaterThanOrEqualToThreshold AlarmActions: - !Ref AlarmTopic

Composite Alarm

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch composite alarms

Resources:

Base alarm for Lambda errors

LambdaErrorAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-lambda-errors" MetricName: Errors Namespace: AWS/Lambda Dimensions: - Name: FunctionName Value: !Ref LambdaFunction Statistic: Sum Period: 60 EvaluationPeriods: 5 Threshold: 5 ComparisonOperator: GreaterThanThreshold

Base alarm for Lambda throttles

LambdaThrottleAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-lambda-throttles" MetricName: Throttles Namespace: AWS/Lambda Dimensions: - Name: FunctionName Value: !Ref LambdaFunction Statistic: Sum Period: 60 EvaluationPeriods: 5 Threshold: 3 ComparisonOperator: GreaterThanThreshold

Composite alarm combining both

LambdaHealthCompositeAlarm: Type: AWS::CloudWatch::CompositeAlarm Properties: AlarmName: !Sub "${AWS::StackName}-lambda-health" AlarmDescription: Composite alarm for Lambda function health AlarmRule: !Or - !Ref LambdaErrorAlarm - !Ref LambdaThrottleAlarm ActionsEnabled: true AlarmActions: - !Ref AlarmTopic Tags: - Key: Service Value: lambda - Key: Tier Value: application

Anomaly Detection Alarm

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch anomaly detection

Resources:

Anomaly detector for metric

RequestRateAnomalyDetector: Type: AWS::CloudWatch::AnomalyDetector Properties: MetricName: RequestCount Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName - Name: Environment Value: !Ref Environment Statistic: Sum Configuration: ExcludedTimeRanges: - StartTime: "2023-12-25T00:00:00" EndTime: "2023-12-26T00:00:00" MetricTimeZone: UTC

Alarm based on anomaly detection

AnomalyAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-anomaly-detection" AlarmDescription: Alert on anomalous metric behavior MetricName: RequestCount Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName AnomalyDetectorConfiguration: ExcludeTimeRange: StartTime: "2023-12-25T00:00:00" EndTime: "2023-12-26T00:00:00" Statistic: Sum Period: 300 EvaluationPeriods: 2 Threshold: 2 ComparisonOperator: GreaterThanUpperThreshold AlarmActions: - !Ref AlarmTopic

Alarm for low anomalous value

LowTrafficAnomalyAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-low-traffic" AlarmDescription: Alert on unusually low traffic MetricName: RequestCount Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName AnomalyDetectorConfiguration: Bound: Lower Statistic: Sum Period: 300 EvaluationPeriods: 3 Threshold: 0.5 ComparisonOperator: LessThanLowerThreshold AlarmActions: - !Ref AlarmTopic

CloudWatch Dashboards

Dashboard Base

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch dashboard

Resources:

Main dashboard

MainDashboard: Type: AWS::CloudWatch::Dashboard Properties: DashboardName: !Sub "${AWS::StackName}-main" DashboardBody: !Sub | { "widgets": [ { "type": "metric", "x": 0, "y": 0, "width": 12, "height": 6, "properties": { "title": "API Gateway Requests", "view": "timeSeries", "stacked": false, "region": "${AWS::Region}", "metrics": [ ["AWS/ApiGateway", "Count", "ApiName", "${ApiName}", "Stage", "${StageName}"], [".", "4XXError", ".", ".", ".", "."], [".", "5XXError", ".", ".", ".", "."] ], "period": 300, "stat": "Sum" } }, { "type": "metric", "x": 12, "y": 0, "width": 12, "height": 6, "properties": { "title": "API Gateway Latency", "view": "timeSeries", "region": "${AWS::Region}", "metrics": [ ["AWS/ApiGateway", "Latency", "ApiName", "${ApiName}", "Stage", "${StageName}", {"stat": "p99"}], [".", ".", ".", ".", ".", ".", {"stat": "Average"}] ], "period": 300 } }, { "type": "metric", "x": 0, "y": 6, "width": 12, "height": 6, "properties": { "title": "Lambda Invocations", "view": "timeSeries", "region": "${AWS::Region}", "metrics": [ ["AWS/Lambda", "Invocations", "FunctionName", "${LambdaFunction}"], [".", "Errors", ".", "."], [".", "Throttles", ".", "."] ], "period": 60, "stat": "Sum" } }, { "type": "metric", "x": 12, "y": 6, "width": 12, "height": 6, "properties": { "title": "Lambda Duration", "view": "timeSeries", "region": "${AWS::Region}", "metrics": [ ["AWS/Lambda", "Duration", "FunctionName", "${LambdaFunction}", {"stat": "p99"}], [".", ".", ".", ".", {"stat": "Average"}], [".", ".", ".", ".", {"stat": "Maximum"}] ], "period": 60 } }, { "type": "log", "x": 0, "y": 12, "width": 24, "height": 6, "properties": { "title": "Application Logs", "view": "table", "region": "${AWS::Region}", "logGroupName": "${ApplicationLogGroup}", "timeRange": { "type": "relative", "from": 3600 }, "filterPattern": "ERROR | WARN" } } ] }

Dashboard for specific service

ServiceDashboard: Type: AWS::CloudWatch::Dashboard Properties: DashboardName: !Sub "${AWS::StackName}-${ServiceName}" DashboardBody: !Sub | { "start": "-PT6H", "widgets": [ { "type": "text", "x": 0, "y": 0, "width": 24, "height": 1, "properties": { "markdown": "# ${ServiceName} - ${Environment} Dashboard" } }, { "type": "metric", "x": 0, "y": 1, "width": 8, "height": 6, "properties": { "title": "Request Rate", "view": "timeSeries", "stacked": false, "region": "${AWS::Region}", "metrics": [ ["${CustomNamespace}", "RequestCount", "Service", "${ServiceName}", "Environment", "${Environment}"] ], "period": 60, "stat": "Sum" } }, { "type": "metric", "x": 8, "y": 1, "width": 8, "height": 6, "properties": { "title": "Error Rate %", "view": "timeSeries", "region": "${AWS::Region}", "metrics": [ ["${CustomNamespace}", "ErrorCount", "Service", "${ServiceName}"], [".", "RequestCount", ".", "."], [".", "SuccessCount", ".", "."] ], "period": 60, "stat": "Average" } }, { "type": "metric", "x": 16, "y": 1, "width": 8, "height": 6, "properties": { "title": "P99 Latency", "view": "timeSeries", "region": "${AWS::Region}", "metrics": [ ["${CustomNamespace}", "Latency", "Service", "${ServiceName}"] ], "period": 60, "stat": "p99" } } ] }

CloudWatch Logs

Log Group Configurations

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch log groups configuration

Parameters: LogRetentionDays: Type: Number Default: 30 AllowedValues: - 1 - 3 - 5 - 7 - 14 - 30 - 60 - 90 - 120 - 150 - 180 - 365 - 400 - 545 - 731 - 1095 - 1827 - 2190 - 2555 - 2922 - 3285 - 3650

Resources:

Application log group

ApplicationLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Sub "/aws/applications/${Environment}/${ApplicationName}" RetentionInDays: !Ref LogRetentionDays KmsKeyId: !Ref LogKmsKeyArn Tags: - Key: Environment Value: !Ref Environment - Key: Application Value: !Ref ApplicationName - Key: Service Value: !Ref ServiceName

Lambda log group

LambdaLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Sub "/aws/lambda/${LambdaFunctionName}" RetentionInDays: !Ref LogRetentionDays KmsKeyId: !Ref LogKmsKeyArn

Subscription filter for Log Insights

LogSubscriptionFilter: Type: AWS::Logs::SubscriptionFilter Properties: DestinationArn: !GetAtt LogDestination.Arn FilterPattern: '[timestamp=*Z, request_id, level, message]' LogGroupName: !Ref ApplicationLogGroup RoleArn: !GetAtt LogSubscriptionRole.Arn

Metric filter for errors

ErrorMetricFilter: Type: AWS::Logs::MetricFilter Properties: FilterPattern: '[level="ERROR", msg]' LogGroupName: !Ref ApplicationLogGroup MetricTransformations: - MetricValue: "1" MetricNamespace: !Sub "${AWS::StackName}/Application" MetricName: ErrorCount - MetricValue: "$level" MetricNamespace: !Sub "${AWS::StackName}/Application" MetricName: LogLevel

Metric filter for warnings

WarningMetricFilter: Type: AWS::Logs::MetricFilter Properties: FilterPattern: '[level="WARN", msg]' LogGroupName: !Ref ApplicationLogGroup MetricTransformations: - MetricValue: "1" MetricNamespace: !Sub "${AWS::StackName}/Application" MetricName: WarningCount

Log group with custom retention

AuditLogGroup: Type: AWS::Logs::LogGroup Properties: LogGroupName: !Sub "/aws/audit/${Environment}/${ApplicationName}" RetentionInDays: 365 KmsKeyId: !Ref LogKmsKeyArn

Log Insights Query

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch Logs Insights queries

Resources:

Query definition for recent errors

RecentErrorsQuery: Type: AWS::Logs::QueryDefinition Properties: Name: !Sub "${AWS::StackName}-recent-errors" QueryString: | fields @timestamp, @message | sort @timestamp desc | limit 100 | filter @message like /ERROR/ | display @timestamp, @message, @logStream

Query for performance analysis

PerformanceQuery: Type: AWS::Logs::QueryDefinition Properties: Name: !Sub "${AWS::StackName}-performance" QueryString: | fields @timestamp, @message, @duration | filter @duration > 1000 | sort @duration desc | limit 50 | display @timestamp, @duration, @message

Synthesized Canaries

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch Synthesized Canaries

Parameters: CanarySchedule: Type: String Default: rate(5 minutes) Description: Schedule expression for canary

Resources:

Canary for API endpoint

ApiCanary: Type: AWS::Synthetics::Canary Properties: Name: !Sub "${AWS::StackName}-api-check" ArtifactS3Location: !Sub "s3://${ArtifactBucket}/canary/${AWS::StackName}" Code: S3Bucket: !Ref CanariesCodeBucket S3Key: canary/api-check.zip Handler: apiCheck.handler ExecutionRoleArn: !GetAtt CanaryRole.Arn RuntimeVersion: syn-python-selenium-1.1 Schedule: Expression: !Ref CanarySchedule DurationInSeconds: 120 SuccessRetentionPeriodInDays: 31 FailureRetentionPeriodInDays: 31 Tags: - Key: Environment Value: !Ref Environment - Key: Service Value: api

Alarm for canary failure

CanaryFailureAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-canary-failed" AlarmDescription: Alert when synthesized canary fails MetricName: Failed Namespace: AWS/Synthetics Dimensions: - Name: CanaryName Value: !Ref ApiCanary Statistic: Sum Period: 60 EvaluationPeriods: 2 Threshold: 1 ComparisonOperator: GreaterThanOrEqualToThreshold AlarmActions: - !Ref AlarmTopic

Alarm for canary latency

CanaryLatencyAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-canary-slow" AlarmDescription: Alert when canary latency is high MetricName: Duration Namespace: AWS/Synthetics Dimensions: - Name: CanaryName Value: !Ref ApiCanary Statistic: p99 Period: 300 EvaluationPeriods: 3 Threshold: 5000 ComparisonOperator: GreaterThanThreshold

CanaryRole: Type: AWS::IAM::Role Properties: RoleName: !Sub "${AWS::StackName}-canary-role" AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Principal: Service: synthetics.amazonaws.com Action: sts:AssumeRole Policies: - PolicyName: SyntheticsLeastPrivilege PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: - synthetics:DescribeCanaries - synthetics:DescribeCanaryRuns - synthetics:GetCanary - synthetics:ListTagsForResource Resource: "" - Effect: Allow Action: - synthetics:StartCanary - synthetics:StopCanary Resource: !Ref ApiCanary - Effect: Allow Action: - logs:CreateLogGroup - logs:CreateLogStream - logs:PutLogEvents - logs:DescribeLogStreams Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/cw-syn-canary-" - Effect: Allow Action: - s3:PutObject - s3:GetObject Resource: !Sub "s3://${ArtifactBucket}/canary/${AWS::StackName}/*" - Effect: Allow Action: - kms:Decrypt Resource: !Ref KmsKeyArn Condition: StringEquals: kms:ViaService: !Sub "s3.${AWS::Region}.amazonaws.com"

CloudWatch Application Signals

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch Application Signals for APM

Resources:

Service level indicator for availability

AvailabilitySLI: Type: AWS::CloudWatch::ServiceLevelObjective Properties: Name: !Sub "${AWS::StackName}-availability" Description: Service level objective for availability Monitor: MonitorName: !Sub "${AWS::StackName}-monitor" MonitorType: AWS_SERVICE_LEVEL_INDICATOR ResourceGroup: !Ref ResourceGroup SliMetric: MetricName: Availability Namespace: !Sub "${AWS::StackName}/Application" Dimensions: - Name: Service Value: !Ref ServiceName Target: ComparisonOperator: GREATER_THAN_OR_EQUAL Threshold: 99.9 Period: RollingInterval: Count: 1 TimeUnit: HOUR Goal: TargetLevel: 99.9

Service level indicator for latency

LatencySLI: Type: AWS::CloudWatch::ServiceLevelIndicator Properties: Name: !Sub "${AWS::StackName}-latency-sli" Monitor: MonitorName: !Sub "${AWS::StackName}-monitor" Metric: MetricName: Latency Namespace: !Sub "${AWS::StackName}/Application" Dimensions: - Name: Service Value: !Ref ServiceName OperationName: GetItem AccountId: !Ref AWS::AccountId

Monitor for application performance

ApplicationMonitor: Type: AWS::CloudWatch::ApplicationMonitor Properties: MonitorName: !Sub "${AWS::StackName}-app-monitor" MonitorType: CW_MONITOR Telemetry: - Type: APM Config: Eps: 100

Conditions and Transform

Conditions for Environment-Specific Resources

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch with conditional resources

Parameters: Environment: Type: String Default: dev AllowedValues: - dev - staging - production Description: Deployment environment

Conditions: IsProduction: !Equals [!Ref Environment, production] IsStaging: !Equals [!Ref Environment, staging] CreateAnomalyDetection: !Or [!Equals [!Ref Environment, staging], !Equals [!Ref Environment, production]] CreateSLI: !Equals [!Ref Environment, production]

Resources:

Base alarm for all environments

BaseAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-errors" MetricName: Errors Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName Statistic: Sum Period: 60 EvaluationPeriods: 5 Threshold: 10 ComparisonOperator: GreaterThanThreshold

Alarm with different thresholds for production

ProductionAlarm: Type: AWS::CloudWatch::Alarm Condition: IsProduction Properties: AlarmName: !Sub "${AWS::StackName}-errors-production" MetricName: Errors Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName Statistic: Sum Period: 60 EvaluationPeriods: 3 Threshold: 1 ComparisonOperator: GreaterThanThreshold AlarmActions: - !Ref ProductionAlarmTopic

Anomaly detector only for staging and production

AnomalyDetector: Type: AWS::CloudWatch::AnomalyDetector Condition: CreateAnomalyDetection Properties: MetricName: RequestCount Namespace: !Ref CustomNamespace Dimensions: - Name: Service Value: !Ref ServiceName Statistic: Sum

SLI only for production

ServiceLevelIndicator: Type: AWS::CloudWatch::ServiceLevelIndicator Condition: CreateSLI Properties: Name: !Sub "${AWS::StackName}-sli" Monitor: MonitorName: !Sub "${AWS::StackName}-monitor" Metric: MetricName: Availability Namespace: !Sub "${AWS::StackName}/Application"

Transform for Code Reuse

AWSTemplateFormatVersion: 2010-09-09 Transform: AWS::Serverless-2016-10-31

Description: Using SAM Transform for CloudWatch resources

Globals: Function: Timeout: 30 Runtime: python3.11 Environment: Variables: LOG_LEVEL: INFO LoggingConfiguration: LogGroup: Name: !Sub "/aws/lambda/${FunctionName}" RetentionInDays: 30

Resources:

Lambda function with automatic logging

MonitoredFunction: Type: AWS::Serverless::Function Properties: FunctionName: !Sub "${AWS::StackName}-monitored" Handler: app.handler CodeUri: functions/monitored/ Policies: - PolicyName: LogsLeastPrivilege PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: - logs:CreateLogGroup - logs:CreateLogStream - logs:PutLogEvents - logs:DescribeLogStreams - logs:GetLogEvents - logs:FilterLogEvents Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-" - Effect: Allow Action: - logs:DescribeLogGroups Resource: !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:" Events: Api: Type: Api Properties: Path: /health Method: get

Best Practices

Security

  • Encrypt log groups with KMS keys

  • Use resource-based policies for log access

  • Implement cross-account log aggregation with proper IAM

  • Configure log retention appropriate for compliance

  • Use VPC endpoints for CloudWatch to isolate traffic

  • Implement least privilege for IAM roles

Performance

  • Use appropriate metric periods (60s for alarms, 300s for dashboards)

  • Implement composite alarms to reduce alarm fatigue

  • Use anomaly detection for non-linear patterns

  • Configure dashboards with efficient widgets

  • Limit retention period for log groups

Monitoring

  • Implement SLI/SLO for service health

  • Use multi-region dashboards for global applications

  • Configure alarms with proper evaluation periods

  • Implement canaries for synthetic monitoring

  • Use Application Signals for APM

Deployment

  • Use change sets before deployment

  • Test templates with cfn-lint

  • Organize stacks by ownership (network, app, data)

  • Use nested stacks for modularity

  • Implement stack policies for protection

CloudFormation Best Practices

Stack Policies

Stack policies protect critical resources from unintentional updates. Use them to prevent modifications to production resources.

AWSTemplateFormatVersion: 2010-09-09 Description: CloudWatch stack with protection policies

Resources: CriticalAlarm: Type: AWS::CloudWatch::Alarm Properties: AlarmName: !Sub "${AWS::StackName}-critical" MetricName: Errors Namespace: AWS/Lambda Statistic: Sum Period: 60 EvaluationPeriods: 5 Threshold: 1 ComparisonOperator: GreaterThanThreshold

Metadata: AWS::CloudFormation::StackPolicy: Statement: - Effect: Deny Principal: "" Action: - Update:Delete - Update:Modify Resource: "" - Effect: Allow Principal: "" Action: - Update:Modify Resource: "" Condition: StringEquals: aws:RequestedOperation: - Describe* - List*

Termination Protection

Enable termination protection to prevent accidental stack deletion, especially for production monitoring stacks.

Via Console:

  • Select the stack

  • Go to Stack actions > Change termination protection

  • Enable termination protection

Via CLI:

aws cloudformation update-termination-protection
--stack-name my-monitoring-stack
--enable-termination-protection

Via CloudFormation (Stack Set):

Resources: MonitoringStack: Type: AWS::CloudFormation::Stack Properties: TemplateURL: !Sub "https://${BucketName}.s3.amazonaws.com/monitoring.yaml" TerminationProtection: true

Drift Detection

Detect when actual infrastructure differs from the CloudFormation template.

Detect drift on a single stack:

aws cloudformation detect-drift
--stack-name my-monitoring-stack

Get drift detection status:

aws cloudFormation describe-stack-drift-detection-process-status
--stack-drift-detection-id <detection-id>

Get resources that have drifted:

aws cloudformation list-stack-resources
--stack-name my-monitoring-stack
--query "StackResourceSummaries[?StackResourceDriftStatus!='IN_SYNC']"

Automation with Lambda:

AWSTemplateFormatVersion: 2010-09-09 Description: Automated drift detection scheduler

Resources: DriftDetectionRole: Type: AWS::IAM::Role Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Principal: Service: lambda.amazonaws.com Action: sts:AssumeRole ManagedPolicyArns: - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole Policies: - PolicyName: CloudWatchDrift PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: - cloudformation:DetectStackDrift - cloudformation:DescribeStacks - cloudformation:ListStackResources Resource: "*" - Effect: Allow Action: - sns:Publish Resource: !Ref AlertTopic

DriftDetectionFunction: Type: AWS::Lambda::Function Properties: Runtime: python3.11 Handler: drift_detector.handler Code: S3Bucket: !Ref CodeBucket S3Key: functions/drift-detector.zip Role: !GetAtt DriftDetectionRole.Arn Environment: Variables: SNS_TOPIC_ARN: !Ref AlertTopic

DriftDetectionRule: Type: AWS::Events::Rule Properties: ScheduleExpression: "rate(1 day)" Targets: - Id: DriftDetection Arn: !GetAtt DriftDetectionFunction.Arn

AlertTopic: Type: AWS::SNS::Topic Properties: TopicName: !Sub "${AWS::StackName}-drift-alerts"

Change Sets

Use change sets to preview and review changes before applying them.

Create change set:

aws cloudformation create-change-set
--stack-name my-monitoring-stack
--template-body file://updated-template.yaml
--change-set-name my-changeset
--capabilities CAPABILITY_IAM

List change sets:

aws cloudformation list-change-sets
--stack-name my-monitoring-stack

Describe change set:

aws cloudformation describe-change-set
--stack-name my-monitoring-stack
--change-set-name my-changeset

Execute change set:

aws cloudformation execute-change-set
--stack-name my-monitoring-stack
--change-set-name my-changeset

Pipeline integration:

AWSTemplateFormatVersion: 2010-09-09 Description: CI/CD pipeline for CloudWatch stacks

Resources: Pipeline: Type: AWS::CodePipeline::Pipeline Properties: Name: !Sub "${AWS::StackName}-pipeline" RoleArn: !GetAtt PipelineRole.Arn Stages: - Name: Source Actions: - Name: SourceAction ActionTypeId: Category: Source Owner: AWS Provider: CodeCommit Version: "1" Configuration: RepositoryName: !Ref RepositoryName BranchName: main OutputArtifacts: - Name: SourceOutput - Name: Validate Actions: - Name: ValidateTemplate ActionTypeId: Category: Test Owner: AWS Provider: CloudFormation Version: "1" Configuration: ActionMode: VALIDATE_ONLY TemplatePath: SourceOutput::template.yaml InputArtifacts: - Name: SourceOutput - Name: Review Actions: - Name: CreateChangeSet ActionTypeId: Category: Deploy Owner: AWS Provider: CloudFormation Version: "1" Configuration: ActionMode: CHANGE_SET_REPLACE StackName: !Ref StackName ChangeSetName: !Sub "${StackName}-changeset" TemplatePath: SourceOutput::template.yaml Capabilities: CAPABILITY_IAM,CAPABILITY_NAMED_IAM InputArtifacts: - Name: SourceOutput - Name: Approval ActionTypeId: Category: Approval Owner: AWS Provider: Manual Version: "1" Configuration: CustomData: Review changes before deployment - Name: Deploy Actions: - Name: ExecuteChangeSet ActionTypeId: Category: Deploy Owner: AWS Provider: CloudFormation Version: "1" Configuration: ActionMode: CHANGE_SET_EXECUTE StackName: !Ref StackName ChangeSetName: !Sub "${StackName}-changeset"

PipelineRole: Type: AWS::IAM::Role Properties: AssumeRolePolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Principal: Service: codepipeline.amazonaws.com Action: sts:AssumeRole Policies: - PolicyName: PipelinePolicy PolicyDocument: Version: "2012-10-17" Statement: - Effect: Allow Action: - codecommit:Get* - codecommit:List* - codecommit:BatchGet* Resource: "" - Effect: Allow Action: - s3:GetObject - s3:PutObject Resource: !Sub "arn:aws:s3:::${ArtifactBucket}/" - Effect: Allow Action: - cloudformation:* - iam:PassRole Resource: "*" - Effect: Allow Action: - sns:Publish Resource: !Ref ApprovalTopic

ApprovalTopic: Type: AWS::SNS::Topic Properties: TopicName: !Sub "${AWS::StackName}-approval"

Related Resources

  • CloudWatch Documentation

  • AWS CloudFormation User Guide

  • CloudWatch Best Practices

  • Service Level Indicators

  • CloudFormation Drift Detection

  • CloudFormation Change Sets

Additional Files

For complete details on resources and their properties, consult:

  • REFERENCE.md - Detailed reference guide for all CloudFormation resources

  • EXAMPLES.md - Complete production-ready examples

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

shadcn-ui

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

tailwind-css-patterns

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

unit-test-bean-validation

No summary provided by upstream source.

Repository SourceNeeds Review