AWS FinOps Skill for Opsy
Step 1: Cost Explorer First
Start with Cost Explorer — one call covers all regions and services:
- Spend by service: identifies top cost drivers
- Spend by region: shows where resources live
- Daily trend: spots anomalies
Focus on services representing >5% of spend.
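As a rough sketch of this step, the snippet below builds parameters for a Cost Explorer `GetCostAndUsage` request grouped by one dimension, then keeps only the services above the 5% threshold. The helper names (`build_spend_request`, `significant_services`) are illustrative and not part of any AWS SDK. The parameters are meant to be passed to a Cost Explorer client (for example boto3's `get_cost_and_usage`), and the function that reads the response assumes the standard `ResultsByTime` / `Groups` shape.

```python
from datetime import date, timedelta
from typing import Optional


def build_spend_request(group_key: str, days: int = 30,
                        today: Optional[date] = None) -> dict:
    """Build GetCostAndUsage parameters grouped by SERVICE or REGION."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return {
        "TimePeriod": {"Start": start.isoformat(), "End": end.isoformat()},
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "DIMENSION", "Key": group_key}],
    }


def significant_services(response: dict, min_share: float = 0.05) -> dict:
    """Return {service: share_of_total} for services above min_share."""
    totals: dict = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            name = group["Keys"][0]
            # Cost Explorer returns amounts as strings.
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[name] = totals.get(name, 0.0) + amount
    grand_total = sum(totals.values())
    if grand_total <= 0:
        # Credits can net spend to $0; fall back to resource discovery.
        return {}
    return {
        name: cost / grand_total
        for name, cost in totals.items()
        if cost / grand_total > min_share
    }
```

An empty result from `significant_services` is the signal to move on to the credits fallback rather than to conclude the account is idle.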
If Credits Mask Costs ($0 spend)
Check if Resource Explorer is enabled:
aws resource-explorer-2 list-indexes --region us-east-1
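If enabled, use it. A single search from the Region that holds the aggregator index covers every indexed Region (results are capped at 1,000 per query, so narrow the query string for large accounts):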
aws resource-explorer-2 search --query-string "*" --region us-east-1
If NOT enabled, use resourcegroupstaggingapi to find tagged resources. The call is Regional and misses resources that were never tagged:
aws resourcegroupstaggingapi get-resources --region us-east-1
Then query each active region for core services: EC2, RDS, EBS, Lambda, S3, ECS, EKS, NAT Gateways, Load Balancers.
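The branch between the two discovery paths can be sketched as a small parser over the JSON printed by `aws resource-explorer-2 list-indexes`. This assumes the output contains an `Indexes` list whose entries carry `Region` and `Type` (`AGGREGATOR` or `LOCAL`). The function name `pick_search_region` is hypothetical.

```python
import json
from typing import Optional


def pick_search_region(list_indexes_output: str) -> Optional[str]:
    """Return the Region to search from, or None if no index exists.

    Prefers the aggregator index because it holds results for every
    indexed Region; a local index only covers its own Region.
    """
    indexes = json.loads(list_indexes_output).get("Indexes", [])
    if not indexes:
        return None  # Resource Explorer not enabled: use the tagging API path
    for index in indexes:
        if index.get("Type") == "AGGREGATOR":
            return index["Region"]
    return indexes[0]["Region"]
```

A `None` result maps to the `resourcegroupstaggingapi` fallback plus per-Region service queries.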
Step 2: Deep Dive Each Resource
For every resource found, gather full details:
- EC2: Instance type, state, launch time, CloudWatch CPU (memory only if the CloudWatch agent publishes it)
- RDS: Instance class, connections (14d), storage, Multi-AZ, engine
- EBS: Attachment status, volume type, size, snapshots
- S3: Lifecycle policies, storage class, versioning
- Lambda: Invocations (30d), memory, runtime, provisioned concurrency
- ECS/EKS: Task definitions, service counts, cluster utilization
- ECR: Repositories, image count, lifecycle policies
- Load Balancers: Request count (14d), target groups
- NAT Gateway: Data processed
- Elastic IPs: Association status
- CloudWatch Logs: Retention settings
- Secrets Manager: Secret count
Check EVERY resource for optimization opportunities. Don't skip services.
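For the utilization part of the deep dive, one possible approach is to request daily average CPU from CloudWatch and reduce the datapoints to a single figure. The sketch below only builds the parameters for `GetMetricStatistics` (suitable for boto3's `get_metric_statistics`) and averages the returned `Datapoints`. The function names are illustrative, and the 14-day window is an assumption taken from the thresholds used elsewhere in this skill.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def cpu_request(instance_id: str, days: int = 14,
                now: Optional[datetime] = None) -> dict:
    """Parameters for daily average CPUUtilization of one EC2 instance."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
        "Period": 86400,  # one datapoint per day
        "Statistics": ["Average"],
    }


def average_of(datapoints: list) -> Optional[float]:
    """Mean of the daily averages, or None when CloudWatch has no data."""
    values = [point["Average"] for point in datapoints]
    return sum(values) / len(values) if values else None
```

Returning `None` for missing data keeps "no metrics" distinct from "0% utilization", which matters when deciding whether a resource is idle.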
Step 3: Check Commitment Coverage
- Savings Plans utilization
- Reserved Instance coverage gaps
- Expiring commitments (next 30 days)
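The expiry check reduces to a date-window filter. A minimal sketch, assuming commitments have already been collected into dictionaries with an `id` and an `end` date (a hypothetical shape, not an AWS response format):

```python
from datetime import date, timedelta


def expiring_soon(commitments: list, today: date, window_days: int = 30) -> list:
    """Return commitments whose end date falls within the next window_days."""
    cutoff = today + timedelta(days=window_days)
    return [c for c in commitments if today <= c["end"] <= cutoff]
```

Commitments that have already ended are excluded, since they show up as coverage gaps instead.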
Safety Guardrails
Report findings with evidence, suggest investigation — not direct actions:
- "Instance i-xxx averaged 3% CPU over 30 days: rightsizing candidate"
- "Volume vol-xxx unattached since [date]: verify before removing"
- "RDS db-xxx had 0 connections for 14 days: confirm if still needed"
Thresholds:
- Idle: ~0% utilization for 14+ days
- Underutilized: <10% average for 14+ days
- Rightsizing candidate: <30% average
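These thresholds can be expressed as a small classifier. The sketch below is one reading of them. It treats "~0%" as below 1% (the `idle_epsilon` parameter is an assumption) and applies the 14-day minimum only to the Idle and Underutilized labels, since no window is stated for rightsizing.

```python
from typing import Optional


def classify_utilization(avg_pct: float, days_observed: int,
                         idle_epsilon: float = 1.0) -> Optional[str]:
    """Map an average utilization percentage to a threshold label."""
    long_enough = days_observed >= 14
    if long_enough and avg_pct < idle_epsilon:
        return "Idle"
    if long_enough and avg_pct < 10:
        return "Underutilized"
    if avg_pct < 30:
        return "Rightsizing candidate"
    return None  # utilization is healthy; nothing to report
```

For example, `classify_utilization(8, 30)` yields `"Underutilized"`, while the same 8% observed over only 7 days falls through to `"Rightsizing candidate"`.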
Smart Recommendation Rules
Only flag when action is possible:
| Situation | Action |
| --- | --- |
| Minimum size + in use (db.t3.micro with connections) | Skip: already right-sized |
| Minimum size + idle (db.t3.micro, 0 connections) | Flag as idle |
| Larger size + low utilization | Flag for rightsizing with specific target |
| Tagged FinOps:Skip=true | Skip |
| Dev/staging with Environment=dev | Skip low utilization (expected) |
Before flagging, verify:
- Is this the minimum size?
- Is it actually in use? (connections/invocations/requests)
- Is there a smaller option?
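The rules and verification questions above can be combined into one decision function. This is a sketch under stated assumptions. The resource dictionary and its keys (`tags`, `activity_count`, `is_minimum_size`, `avg_utilization_pct`) are hypothetical, the 30% cutoff is borrowed from the rightsizing threshold, and idle dev resources are still flagged because the rule only excuses low utilization.

```python
def recommend(resource: dict) -> str:
    """Return 'skip', 'flag-idle', or 'flag-rightsizing' for one resource."""
    tags = resource.get("tags", {})
    if tags.get("FinOps:Skip") == "true":
        return "skip"  # explicit opt-out always wins

    # Is it actually in use? (connections / invocations / requests)
    if resource["activity_count"] == 0:
        return "flag-idle"

    # Is this the minimum size? Then there is no smaller option.
    if resource["is_minimum_size"]:
        return "skip"

    # Low utilization is expected in dev (assumption: idle dev is handled above).
    if tags.get("Environment") == "dev":
        return "skip"

    if resource["avg_utilization_pct"] < 30:
        return "flag-rightsizing"
    return "skip"
```

Checking the opt-out tag first means a tagged resource is never reported, even when it is idle.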
Service Checklists
EC2: Utilization, stopped instances (EBS cost), previous-gen types, On-Demand 24/7 → SP/RI
Lambda: Zero invocations (30d), memory vs duration tradeoff, provisioned concurrency
ECS/EKS: Fargate vs EC2, resource requests vs usage, Spot for fault-tolerant
ECR: Lifecycle policies, image count, total size — old images accumulate
RDS: Connection count, Multi-AZ in dev, instance class utilization, storage, previous-gen
DynamoDB: Provisioned vs On-Demand fit, auto-scaling, TTL
ElastiCache/OpenSearch: Node utilization, reserved coverage
S3: Lifecycle policies, storage class, Intelligent-Tiering, incomplete multipart uploads
EBS: Unattached volumes, gp2→gp3, snapshot retention, IOPS necessity
Networking: Cross-AZ transfer, NAT Gateway → VPC endpoints, CloudFront caching
Load Balancers: Zero requests = orphaned, Classic→ALB/NLB
Elastic IPs: Unassociated = $3.60/month each
CloudWatch: Log retention (default infinite), high-res metrics necessity
Secrets Manager: $0.40 per secret per month vs free standard-tier Parameter Store parameters
API Gateway: HTTP APIs cost roughly 70% less per request than REST APIs
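The fixed-price items in the checklist lend themselves to a quick savings estimate. The sketch below uses only the two per-unit figures quoted above ($3.60 per unassociated Elastic IP, $0.40 per secret). The function name is illustrative, and current prices should be confirmed before reporting.

```python
EIP_MONTHLY_USD = 3.60     # per unassociated Elastic IP, as quoted in the checklist
SECRET_MONTHLY_USD = 0.40  # per Secrets Manager secret, as quoted in the checklist


def fixed_monthly_savings(unassociated_eips: int, unused_secrets: int) -> float:
    """Estimated USD/month recovered by releasing idle fixed-price resources."""
    total = (unassociated_eips * EIP_MONTHLY_USD
             + unused_secrets * SECRET_MONTHLY_USD)
    return round(total, 2)
```

Three unassociated Elastic IPs and five unused secrets come to 3 × 3.60 + 5 × 0.40 = 12.80 USD/month.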
Output Requirements
CSV (Required)
account_id,resource_name,status,recommendation_type,potential_savings_monthly,resource_id,region,resource_type,tags,description
123456789012,web-server-prod,Underutilized,Rightsizing to t3.small,45.00,i-0abc123def456,us-east-1,EC2 Instance,"Environment=prod,Team=platform","Avg CPU 8% over 30 days. Current: t3.large"
123456789012,,Unattached,Verify before removing,12.50,vol-0xyz789,us-east-1,EBS Volume,,"100GB gp2 volume unattached since 2024-12-01"
123456789012,raspberry,No-Lifecycle,Add ECR lifecycle policy,2.00,raspberry,us-east-1,ECR Repository,,"47 images totaling 12GB. No lifecycle policy configured"
| Column | Description |
| --- | --- |
| account_id | AWS account ID |
| resource_name | Name tag value (empty if untagged) |
| status | Idle, Underutilized, Oversized, Unattached, Previous-Gen, No-Lifecycle, Uncovered |
| recommendation_type | Specific action: Rightsizing to X, Verify before removing, Add ECR lifecycle policy, Switch to gp3, Consider SP/RI |
| potential_savings_monthly | USD/month (conservative estimate) |
| resource_id | AWS resource ID |
| region | AWS region |
| resource_type | EC2 Instance, EBS Volume, RDS Instance, ECR Repository, S3 Bucket, etc. |
| tags | Resource tags as key=value pairs (from AWS tags, not Name) |
| description | Details: current state, metrics, why flagged, specific numbers |
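A minimal sketch of producing this CSV with Python's standard `csv` module, which handles quoting of the comma-separated `tags` field. The `findings_to_csv` helper and the shape of each finding (a dictionary keyed by column name, with `tags` as a dict) are assumptions for illustration.

```python
import csv
import io

COLUMNS = [
    "account_id", "resource_name", "status", "recommendation_type",
    "potential_savings_monthly", "resource_id", "region", "resource_type",
    "tags", "description",
]


def findings_to_csv(findings: list) -> str:
    """Serialize findings to CSV text with the required column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for finding in findings:
        row = dict(finding)
        # Flatten tags to key=value pairs; the writer quotes the embedded commas.
        row["tags"] = ",".join(
            f"{key}={value}" for key, value in finding.get("tags", {}).items()
        )
        row["potential_savings_monthly"] = (
            f"{finding['potential_savings_monthly']:.2f}"
        )
        writer.writerow(row)
    return buffer.getvalue()
```

Missing keys such as an absent `resource_name` are written as empty fields, which matches the "empty if untagged" rule.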
Summary (Markdown)
AWS FinOps Summary
Account: [id] | Date: [date]
Spend Overview
- Monthly spend: $X,XXX
- Top spenders: [service1] (X%), [service2] (X%), [service3] (X%)
Findings
- Total potential savings: $X,XXX/month
- Resources flagged: X
Top 5 Opportunities
- [resource] - [recommendation] - $X/month
Next Steps
- [investigation recommendations]
Save as finops-report-[account-id]-[date].csv
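The summary figures and the report filename can be derived from the same findings list. A sketch, assuming each finding is a dictionary with a numeric `potential_savings_monthly` and that `[date]` is written in ISO format (YYYY-MM-DD). Both are assumptions, and the function names are illustrative.

```python
from datetime import date


def summarize(findings: list, top_n: int = 5) -> dict:
    """Compute total savings, flagged count, and the top opportunities."""
    ranked = sorted(
        findings, key=lambda f: f["potential_savings_monthly"], reverse=True
    )
    return {
        "total_savings": round(
            sum(f["potential_savings_monthly"] for f in findings), 2
        ),
        "resources_flagged": len(findings),
        "top": ranked[:top_n],
    }


def report_filename(account_id: str, report_date: date) -> str:
    """Build the CSV filename in the finops-report-[account-id]-[date] form."""
    return f"finops-report-{account_id}-{report_date.isoformat()}.csv"
```

For instance, `report_filename("123456789012", date(2025, 1, 15))` gives `finops-report-123456789012-2025-01-15.csv`.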