Capacity Planner
Forecast when your infrastructure will hit limits. Analyze historical metrics (CPU, memory, disk, network, database connections), project growth curves, identify approaching bottlenecks, and recommend right-sizing — so you scale proactively instead of reactively.
Use when: "when will we run out of space", "capacity forecast", "right-size our instances", "are we over-provisioned", "plan for traffic growth", "infrastructure scaling plan", "when do we need to upgrade", or before budget planning.
Commands
1. forecast — Project Resource Exhaustion
Step 1: Collect Historical Metrics
# Prometheus: per-instance CPU utilization over the last 30 days (GNU date syntax)
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))' \
  --data-urlencode "start=$(date -d '30 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1h' | python3 -c "
import json, sys
data = json.load(sys.stdin)
for result in data['data']['result']:
    instance = result['metric']['instance']
    values = [float(v[1]) for v in result['values']]
    if len(values) < 2:
        continue  # not enough samples to compute a trend
    avg = sum(values) / len(values)
    peak = max(values)
    # Crude endpoint slope per 1h step; noisy series need a proper fit
    trend = (values[-1] - values[0]) / (len(values) - 1)
    print(f'{instance}: avg={avg:.1%} peak={peak:.1%} trend={trend:+.4%}/hr')
"
# Disk usage over time
df -h / /data /var 2>/dev/null
# Historical disk growth (if monitoring available)
curl -s "$PROMETHEUS_URL/api/v1/query_range" \
  --data-urlencode 'query=node_filesystem_avail_bytes{mountpoint="/"}' \
  --data-urlencode "start=$(date -d '30 days ago' +%s)" \
  --data-urlencode "end=$(date +%s)" \
  --data-urlencode 'step=1d'
# Memory usage
free -h
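# Database connection saturation (postgres_exporter metric names; sums across databases and states)
curl -s "$PROMETHEUS_URL/api/v1/query" \
  --data-urlencode 'query=sum by (instance) (pg_stat_activity_count) / on (instance) pg_settings_max_connections'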
If Prometheus is unavailable, use CloudWatch, Datadog, or system tools:
# Last 30 days of CloudWatch CPU
aws cloudwatch get-metric-statistics --namespace AWS/EC2 \
  --metric-name CPUUtilization --statistics Average Maximum \
  --dimensions Name=InstanceId,Value=i-0abc123 \
  --start-time "$(date -d '30 days ago' -u +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" --period 86400
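Whichever source the samples come from, a least-squares fit is usually a steadier trend estimate than the difference between the first and last sample. The following is a minimal sketch, assuming samples arrive in the Prometheus `values` shape (`[timestamp, "value"]` pairs); the function names and the sample data are illustrative, not part of any tool mentioned above.

```python
from typing import List, Sequence, Tuple


def parse_prometheus_values(values: Sequence[Sequence]) -> List[Tuple[float, float]]:
    """Convert [[unix_ts, "value"], ...] pairs into (float, float) tuples."""
    return [(float(ts), float(val)) for ts, val in values]


def least_squares_slope(samples: Sequence[Tuple[float, float]]) -> float:
    """Return the best-fit slope in value units per unit of time."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("all samples share one timestamp")
    return num / den


# Made-up data: one sample per day, usage rising by 2 units per day.
raw = [[0, "10.0"], [86400, "12.0"], [172800, "14.0"], [259200, "16.0"]]
slope_per_day = least_squares_slope(parse_prometheus_values(raw)) * 86400
print(f"growth: {slope_per_day:.2f} units/day")
```

The slope comes out per second because Prometheus timestamps are Unix seconds, so it is multiplied by 86400 to get a daily growth rate for Step 2.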
Step 2: Fit Growth Model
For each resource, determine the growth pattern:
- Linear: constant rate of increase (disk filling at 2GB/day)
- Exponential: accelerating growth (user base doubling quarterly)
- Seasonal: cyclical patterns (weekend dips, end-of-month spikes)
- Flat: no growth (stable, well-bounded workload)
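One rough way to tell these patterns apart is to compare how well a straight line fits the raw series versus its logarithm. The sketch below is an assumption-laden heuristic, not a prescribed method: `classify_growth` and its `flat_threshold` parameter are hypothetical names, the flat check only looks at the endpoints, and seasonal detection is deliberately left out because it needs a longer history and a decomposition step.

```python
import math
from typing import Sequence


def _r_squared(ys: Sequence[float]) -> float:
    """R^2 of a least-squares line fitted to ys against index 0..n-1."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if syy == 0:
        return 1.0  # constant series: a flat line fits perfectly
    return (sxy * sxy) / (sxx * syy)


def classify_growth(daily_values: Sequence[float], flat_threshold: float = 0.01) -> str:
    """Return 'flat', 'linear', or 'exponential' for a daily usage series."""
    if len(daily_values) < 3:
        raise ValueError("need at least three samples")
    if min(daily_values) <= 0:
        raise ValueError("values must be positive to take logarithms")
    first, last = daily_values[0], daily_values[-1]
    # Treat less than 1% change between endpoints as no growth.
    if abs(last - first) / first < flat_threshold:
        return "flat"
    linear_fit = _r_squared(daily_values)
    exponential_fit = _r_squared([math.log(v) for v in daily_values])
    return "exponential" if exponential_fit > linear_fit else "linear"


print(classify_growth([100 + 2 * i for i in range(30)]))
print(classify_growth([100 * 1.05 ** i for i in range(30)]))
```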
Calculate days until exhaustion:
days_remaining = (capacity - current_usage) / daily_growth_rate
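As a worked instance of the linear formula, here is a small sketch. The function name is illustrative, and returning infinity for flat or shrinking usage is an assumption about how a caller would want "never exhausts" represented.

```python
import math


def days_remaining_linear(capacity: float, current_usage: float,
                          daily_growth_rate: float) -> float:
    """Days until usage reaches capacity, assuming constant daily growth."""
    if current_usage >= capacity:
        return 0.0  # already exhausted
    if daily_growth_rate <= 0:
        return math.inf  # flat or shrinking usage never hits the limit
    return (capacity - current_usage) / daily_growth_rate


# A 50 GB disk holding 45 GB and growing by 0.18 GB (180 MB) per day.
print(f"{days_remaining_linear(50, 45, 0.18):.1f} days")
```

All three arguments must share a unit (GB and GB/day here); mixing GB with MB/day is the most common way to get this wrong.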
For exponential growth, project from the doubling time instead: days_remaining = doubling_time_days * log2(capacity / current_usage).
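The exponential case can be sketched the same way. This assumes usage compounds at a steady rate, so usage(t) = current_usage * 2^(t / doubling_time); the function names and the example figures are hypothetical.

```python
import math


def doubling_time_days(daily_growth_rate: float) -> float:
    """Days for usage to double at a compounding daily growth rate (0.01 = 1%)."""
    if daily_growth_rate <= 0:
        return math.inf
    return math.log(2) / math.log(1 + daily_growth_rate)


def days_remaining_exponential(capacity: float, current_usage: float,
                               doubling_days: float) -> float:
    """Days until usage reaches capacity when usage doubles every doubling_days."""
    if current_usage <= 0:
        raise ValueError("current_usage must be positive")
    if current_usage >= capacity:
        return 0.0
    if math.isinf(doubling_days):
        return math.inf
    return doubling_days * math.log2(capacity / current_usage)


# Usage at 40% of capacity, doubling every quarter (about 90 days).
print(f"{days_remaining_exponential(100, 40, 90):.0f} days")
```

Note how much sooner this is than a linear reading of the same data would suggest: at 40% used, one doubling already reaches 80%.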
Step 3: Generate Forecast Report
# Capacity Forecast Report — [date]
## Critical (exhaustion < 30 days)
| Resource | Current | Capacity | Growth rate | Exhaustion | Action |
|----------|---------|----------|-------------|------------|--------|
| Disk (/) | 45 GB | 50 GB | 180 MB/day | ~28 days | Expand volume or add cleanup cron |
## Warning (exhaustion 30-90 days)
| Resource | Current | Capacity | Growth rate | Exhaustion | Action |
|----------|---------|----------|-------------|------------|--------|
| DB connections | 85 | 100 | 2/week | ~52 days | Increase max_connections or add pgbouncer |
| Memory | 12 GB | 16 GB | 50 MB/day | ~82 days | Monitor trend; plan upgrade |
## Healthy (>90 days or no growth)
- CPU: avg 35%, peak 72%, flat trend — no action needed
- Network: avg 200 Mbps of 1 Gbps — no concern
## Over-Provisioned (wasting money)
| Resource | Used | Provisioned | Utilization | Savings |
|----------|------|-------------|-------------|---------|
| worker-pool-3 | 2 vCPU avg | 8 vCPU | 25% | Downsize to 4 vCPU, save ~$150/mo |
| Redis cluster | 512 MB | 8 GB | 6% | Downsize to 2 GB, save ~$80/mo |
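The report sections map directly onto thresholds on days until exhaustion. A minimal sketch of that bucketing follows, using the 30 and 90 day cut-offs from the section headings; the function name and the sample figures are illustrative.

```python
import math


def forecast_tier(days_until_exhaustion: float) -> str:
    """Map days until exhaustion to the report section it belongs in."""
    if days_until_exhaustion < 30:
        return "critical"
    if days_until_exhaustion <= 90:
        return "warning"
    return "healthy"  # includes math.inf for flat or shrinking usage


forecasts = {"disk": 27.8, "db_connections": 52.5, "memory": 82.0, "cpu": math.inf}
for resource, days in forecasts.items():
    print(f"{resource}: {forecast_tier(days)}")
```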
2. rightsize — Recommend Instance Sizes
Given current utilization and growth projections:
- Map workload to optimal instance family (compute, memory, storage-optimized)
- Factor in reserved instance / savings plan pricing
- Account for headroom (recommend 60-70% target utilization, not 95%)
- Compare across cloud providers if multi-cloud
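The headroom rule can be turned into a simple sizing calculation: divide expected usage by the target utilization and pick the smallest size that covers it. This sketch assumes a hypothetical list of available sizes and a 65% target (the midpoint of the 60-70% range above); it ignores instance family and pricing, which still need a human or a pricing API.

```python
from typing import Optional, Sequence


def recommend_size(usage: float, available_sizes: Sequence[float],
                   target_utilization: float = 0.65) -> Optional[float]:
    """Return the smallest size that keeps usage at or below the target utilization."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    required = usage / target_utilization
    candidates = [size for size in sorted(available_sizes) if size >= required]
    return candidates[0] if candidates else None


# A pool averaging 2 vCPU, choosing from hypothetical 1/2/4/8/16 vCPU sizes.
print(recommend_size(2.0, [1, 2, 4, 8, 16]))
```

Pass projected usage at the end of the planning horizon rather than today's figure if the resource is still growing, and prefer peak over average for latency-sensitive workloads.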
3. cost-model — Project Infrastructure Costs
Given the capacity forecast:
- Calculate current monthly spend
- Project spend at 3, 6, 12 months based on growth
- Identify the biggest cost drivers
- Suggest cost optimization levers (spot instances, reserved pricing, auto-scaling, compression, archival)
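For the projection step, a compounding model is a reasonable first approximation. The sketch below assumes spend grows at the same steady monthly rate as the underlying capacity, which will not hold across pricing tier changes or reserved-instance purchases; the function name and the figures are made up for illustration.

```python
from typing import Dict, Iterable


def project_monthly_spend(current_spend: float, monthly_growth_rate: float,
                          horizons: Iterable[int] = (3, 6, 12)) -> Dict[int, float]:
    """Project monthly spend at each horizon, compounding monthly (0.05 = 5%)."""
    return {
        months: current_spend * (1 + monthly_growth_rate) ** months
        for months in horizons
    }


# $1,000/month today, growing 5% per month.
for months, spend in project_monthly_spend(1000, 0.05).items():
    print(f"{months:>2} months: ${spend:,.2f}/mo")
```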
4. bottleneck — Identify Scaling Bottlenecks
Analyze the system for the component that will fail first under load:
- Database (connections, IOPS, lock contention)
- Application (CPU-bound, memory-bound, thread pool exhaustion)
- Network (bandwidth, DNS resolution, TLS handshake overhead)
- External dependencies (rate limits, API quotas, third-party SLAs)
Rank bottlenecks by "time to impact" and recommend mitigation order.
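Ranking by time to impact is a sort on the forecast output. A minimal sketch follows, with a hypothetical `Bottleneck` record and sample figures; real input would come from the forecast command.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bottleneck:
    component: str
    days_to_impact: float
    mitigation: str


def rank_bottlenecks(items: List[Bottleneck]) -> List[Bottleneck]:
    """Order bottlenecks so the one that bites first is mitigated first."""
    return sorted(items, key=lambda b: b.days_to_impact)


found = [
    Bottleneck("memory", 82.0, "resize instance"),
    Bottleneck("disk", 27.8, "expand volume"),
    Bottleneck("db connections", 52.5, "add pgbouncer"),
]
for position, b in enumerate(rank_bottlenecks(found), start=1):
    print(f"{position}. {b.component}: ~{b.days_to_impact:.0f} days, {b.mitigation}")
```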