Monitoring Config Auditor
This skill proactively audits your observability setup before it goes to production.
Capabilities
- Alarm Integrity Check
  - Scans for missing basic alarms (CPU, Error Rate, Disk Space).
  - Verifies that thresholds match the project's Non-Functional Requirements.
- Notification Audit
  - Ensures that every alarm has a defined and valid notification destination (SNS, Slack, PagerDuty).
  - Validates that high-severity alerts follow PagerDuty Best Practices (e.g., automated escalation, actionable context).
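As a rough illustration of the checks listed above, the sketch below runs them against a simplified, tool-agnostic alarm model. It is not this skill's implementation: the `Alarm` class, its field names, the function names, and the lowercase metric and destination keys are all hypothetical, and the threshold check assumes the Non-Functional Requirements are available as a plain metric-to-value mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Baseline alarms named in the capability list (the keys are an assumed spelling).
REQUIRED_METRICS = {"cpu", "error_rate", "disk_space"}
# Destination kinds named in the capability list (the keys are an assumed spelling).
VALID_DESTINATIONS = {"sns", "slack", "pagerduty"}


@dataclass
class Alarm:
    """Hypothetical, simplified view of one alarm definition."""
    name: str
    metric: str
    threshold: float
    destinations: List[str] = field(default_factory=list)


def missing_metrics(alarms: List[Alarm]) -> Set[str]:
    """Return the required metrics that no alarm covers."""
    return REQUIRED_METRICS - {a.metric for a in alarms}


def threshold_mismatches(alarms: List[Alarm], nfr: Dict[str, float]) -> List[str]:
    """Return names of alarms whose threshold differs from the NFR value.

    Exact equality is a simplification; a real audit might instead accept
    any threshold at least as strict as the requirement.
    """
    return [
        a.name
        for a in alarms
        if a.metric in nfr and a.threshold != nfr[a.metric]
    ]


def invalid_notifications(alarms: List[Alarm]) -> List[str]:
    """Return names of alarms with no destination or an unrecognized one."""
    return [
        a.name
        for a in alarms
        if not a.destinations
        or any(d not in VALID_DESTINATIONS for d in a.destinations)
    ]


if __name__ == "__main__":
    sample = [
        Alarm("high-cpu", "cpu", 80.0, ["pagerduty"]),
        Alarm("error-spike", "error_rate", 5.0, []),
    ]
    nfr = {"cpu": 80.0, "error_rate": 1.0}
    print("Missing alarms:", sorted(missing_metrics(sample)))
    print("Threshold mismatches:", threshold_mismatches(sample, nfr))
    print("Invalid notifications:", invalid_notifications(sample))
```

In practice the alarm list would be extracted from Terraform or a monitoring provider's configuration; that parsing step is deliberately left out here because it depends on the tooling in use.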
Usage
- "Audit our current Terraform files for monitoring compliance."
- "Are we missing any critical alerts for this new microservice deployment?"
Knowledge Protocol
- This skill adheres to knowledge/orchestration/knowledge-protocol.md.
- References Monitoring Best Practices for New Relic, Datadog, and general observability standards.
- References SLO & Dashboard Best Practices for service level management and visualization standards.
- References Modern SRE Best Practices for IaC monitoring and synthetic testing standards.
- References PagerDuty Best Practices for alerting standards.