scalability-playbook

Systematic approach to identifying and resolving scalability bottlenecks.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "scalability-playbook" with this command: npx skills add monkey1sai/openai-cli/monkey1sai-openai-cli-scalability-playbook

Bottleneck Analysis

Current System Profile

Traffic: 1,000 req/min
Users: 10,000 active
Data: 100GB database
Response time: p95 = 500ms

Identified Bottlenecks

  1. Database Queries

Symptom: Slow page loads (2-3s)
Measurement: Query time p95 = 800ms
Impact: HIGH - affects all reads
Trigger: When p95 >500ms

  2. Single Server

Symptom: High CPU (>80%)
Measurement: Load average >4
Impact: MEDIUM - intermittent slowdowns
Trigger: When CPU >70%

  3. No Caching

Symptom: Repeated DB queries
Measurement: Cache hit rate = 0%
Impact: MEDIUM - unnecessary load
Trigger: When query volume >10k/min

Scaling Strategies (Ordered)

Level 1: Quick Wins (Days)

1.1 Add Database Indexes

Problem: Slow queries
Solution:

    CREATE INDEX idx_users_email ON users(email);
    CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

Expected Impact: 80% faster queries
Cost: $0
Effort: 1 day

1.2 Enable Query Caching

Problem: Repeated queries
Solution: Redis cache layer

    // Cache-aside read: try Redis first, fall back to the database on a miss.
    const key = `user:${userId}`;
    const cached = await redis.get(key);
    if (cached) return JSON.parse(cached);

    const user = await db.users.findById(userId);
    // Store the result for 1 hour (3600 seconds).
    await redis.setex(key, 3600, JSON.stringify(user));
    return user;

Expected Impact: 60% reduction in DB load
Cost: $50/month
Effort: 2 days
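The cache-aside read above can be tried without a Redis server. The following is a minimal sketch under stated assumptions: `TtlCache` is a hypothetical in-memory stand-in that models only the `get`/`setex` shape used above, `makeCachedLoader` is an illustrative helper name, and `loadUser` stands in for `db.users.findById`. It also counts hits and misses, so the cache hit rate named in the bottleneck analysis becomes something you can measure.

```javascript
// Hypothetical in-memory stand-in for Redis; only get/setex are modelled.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.now = now; // injectable clock, useful for testing expiry
    this.entries = new Map(); // key -> { value, expiresAt }
  }
  async get(key) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.entries.delete(key); // drop expired entries lazily
      return null;
    }
    return entry.value;
  }
  async setex(key, ttlSeconds, value) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Cache-aside read that also tracks hits and misses.
function makeCachedLoader(cache, loadUser, ttlSeconds = 3600) {
  const stats = { hits: 0, misses: 0 };
  async function getUser(userId) {
    const key = `user:${userId}`;
    const cached = await cache.get(key);
    if (cached !== null) {
      stats.hits += 1;
      return JSON.parse(cached);
    }
    stats.misses += 1;
    const user = await loadUser(userId); // assumed to wrap the real DB call
    await cache.setex(key, ttlSeconds, JSON.stringify(user));
    return user;
  }
  const hitRate = () => {
    const total = stats.hits + stats.misses;
    return total === 0 ? 0 : stats.hits / total;
  };
  return { getUser, hitRate, stats };
}
```

Reading the same user twice should reach the loader once and report a hit rate of 0.5. A real deployment would also need to invalidate or overwrite the key when the user record changes, which this sketch does not cover.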

Level 2: Horizontal Scaling (Weeks)

2.1 Add Read Replicas

Problem: Read-heavy workload
Solution: Route reads to replicas

Write Load: Primary DB
Read Load: 3x Read Replicas

Expected Impact: 3x read capacity
Cost: $300/month
Effort: 1 week
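Routing reads to replicas usually lives in a thin layer in front of the database clients. The sketch below assumes a hypothetical `ReplicaRouter` class and treats connections as opaque values (plain strings in the usage example). It is one possible shape, not a specific driver's API. The `requireFresh` option is also an assumption, added because replicas can lag behind the primary.

```javascript
// Hypothetical router: writes go to the primary, reads rotate across replicas.
class ReplicaRouter {
  constructor(primary, replicas) {
    if (replicas.length === 0) throw new Error("at least one replica is required");
    this.primary = primary;
    this.replicas = replicas;
    this.next = 0; // index of the replica that serves the next read
  }
  // Round-robin over replicas so read load is spread evenly.
  forRead() {
    const target = this.replicas[this.next];
    this.next = (this.next + 1) % this.replicas.length;
    return target;
  }
  forWrite() {
    return this.primary;
  }
  // Plain SELECTs go to a replica. Everything else, and reads that must see
  // the caller's latest write, goes to the primary.
  forQuery(sql, { requireFresh = false } = {}) {
    const isRead = /^\s*select\b/i.test(sql);
    return isRead && !requireFresh ? this.forRead() : this.forWrite();
  }
}

// Usage with placeholder connection names.
const router = new ReplicaRouter("primary", ["replica-1", "replica-2", "replica-3"]);
const readTarget = router.forQuery("SELECT * FROM users WHERE id = 1");
const writeTarget = router.forQuery("UPDATE users SET email = 'a@example.com' WHERE id = 1");
```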

2.2 Load Balancer + Multiple Servers

Problem: Single point of failure
Solution:

    ALB
    ├── Server 1
    ├── Server 2
    └── Server 3

Expected Impact: 3x throughput
Cost: $400/month
Effort: 1 week

Level 3: Architecture Changes (Months)

3.1 CDN for Static Assets

Problem: Slow asset delivery
Solution: CloudFront CDN
Expected Impact: 90% faster asset loads
Cost: $100/month
Effort: 1 week

3.2 Async Processing

Problem: Slow sync operations
Solution: Background job queues

    // Before: Sync
    await sendEmail(user);
    await processPayment(order);
    await updateAnalytics(event);
    return response; // Waits 5+ seconds

    // After: Async
    await queue.add("send-email", { userId });
    await queue.add("process-payment", { orderId });
    await queue.add("update-analytics", { event });
    return response; // Returns immediately

Expected Impact: 80% faster responses
Cost: $50/month (SQS)
Effort: 2 weeks
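To make the before/after difference concrete, here is a small sketch of the queue side. `InMemoryQueue` is a hypothetical stand-in that mirrors the `queue.add(name, payload)` call shape used above. The `process` and `drain` methods and the retry limit are assumptions for illustration. A production setup would use a durable queue such as the SQS option costed above.

```javascript
// Hypothetical in-memory queue; a real deployment would use a durable broker.
class InMemoryQueue {
  constructor() {
    this.jobs = [];
    this.handlers = new Map(); // job name -> async handler
  }
  // Register the worker function for a job name.
  process(name, handler) {
    this.handlers.set(name, handler);
  }
  // Enqueue only records the job, so the request handler returns right away.
  async add(name, payload) {
    this.jobs.push({ name, payload, attempts: 0 });
  }
  // Worker loop: run queued jobs, retrying failures up to maxAttempts.
  async drain(maxAttempts = 3) {
    const failed = [];
    while (this.jobs.length > 0) {
      const job = this.jobs.shift();
      job.attempts += 1;
      try {
        const handler = this.handlers.get(job.name);
        if (!handler) throw new Error(`no handler for ${job.name}`);
        await handler(job.payload);
      } catch (err) {
        if (job.attempts < maxAttempts) this.jobs.push(job); // retry later
        else failed.push({ job, error: err.message }); // give up and report
      }
    }
    return failed;
  }
}
```

One design consequence worth noting: once `process-payment` runs in the background, the response can no longer report the payment result, so the client needs another way to learn the outcome, such as a status endpoint or a notification.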

Level 4: Data Layer Optimization (Months)

4.1 Database Sharding

Problem: Single DB too large
Solution: Shard by user_id

Shard 1: user_id 0-24999
Shard 2: user_id 25000-49999
Shard 3: user_id 50000-74999
Shard 4: user_id 75000-99999

Expected Impact: 4x capacity
Cost: $1,200/month
Effort: 2 months
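The range-based layout above maps directly onto a lookup function. This is a minimal sketch: the ranges are copied from the list above, while the shard names and the `shardFor` helper are hypothetical.

```javascript
// Ranges copied from the shard list above; the shard names are hypothetical.
const SHARDS = [
  { name: "shard-1", min: 0, max: 24999 },
  { name: "shard-2", min: 25000, max: 49999 },
  { name: "shard-3", min: 50000, max: 74999 },
  { name: "shard-4", min: 75000, max: 99999 },
];

// Return the name of the shard whose range contains the given user_id.
function shardFor(userId) {
  if (!Number.isInteger(userId)) throw new TypeError("userId must be an integer");
  const shard = SHARDS.find((s) => userId >= s.min && userId <= s.max);
  if (!shard) throw new RangeError(`no shard covers user_id ${userId}`);
  return shard.name;
}
```

With these ranges, any user_id of 100000 or higher has no shard, so the sketch throws instead of guessing. Growing past that point means adding a new range or moving to a different partitioning scheme.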

4.2 Event-Driven Architecture

Problem: Tight coupling, cascading failures
Solution: Message broker (Kafka)

    Service A → Kafka → Service B
             ↘       ↗
              Service C

Expected Impact: Better isolation, resilience
Cost: $500/month
Effort: 3 months
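The isolation benefit can be illustrated with a toy broker. The `Broker` class below is a hypothetical in-memory stand-in, not the Kafka client API: it models only topic fan-out and the idea that one failing consumer does not break the publisher or the other consumers. Durability, partitions, and consumer offsets are out of scope here.

```javascript
// Hypothetical in-memory broker standing in for Kafka. It models topic
// fan-out and per-subscriber failure isolation only, not durability.
class Broker {
  constructor() {
    this.subscribers = new Map(); // topic -> [{ name, handler }]
    this.deadLetters = []; // events a subscriber failed to handle
  }
  subscribe(topic, name, handler) {
    const list = this.subscribers.get(topic) ?? [];
    list.push({ name, handler });
    this.subscribers.set(topic, list);
  }
  // The publisher never calls consumers directly, so a failing consumer
  // cannot make the publish call fail or block the other consumers.
  publish(topic, event) {
    for (const { name, handler } of this.subscribers.get(topic) ?? []) {
      try {
        handler(event);
      } catch (err) {
        this.deadLetters.push({ topic, subscriber: name, event, error: err.message });
      }
    }
  }
}

// Usage: Service B is failing, Service C still receives the event.
const broker = new Broker();
const seenByC = [];
broker.subscribe("order.created", "service-b", () => {
  throw new Error("service B is down");
});
broker.subscribe("order.created", "service-c", (event) => seenByC.push(event.orderId));
broker.publish("order.created", { orderId: 42 });
```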

Scaling Triggers

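| Metric | Current | Warning | Critical | Action |
|---|---|---|---|---|
| CPU | 40% | 70% | 85% | Add servers |
| Memory | 50% | 75% | 90% | Upgrade instances |
| DB Connections | 20 | 40 | 50 | Add read replicas |
| Query Time (p95) | 200ms | 500ms | 1000ms | Add indexes |
| Queue Depth | 100 | 1000 | 5000 | Add workers |
| Error Rate | 0.1% | 1% | 5% | Investigate immediately |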
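The trigger table can be encoded as data and checked against live metrics. In this sketch the thresholds and actions are copied from the Scaling Triggers table. The metric key names and the `evaluate` function are hypothetical, and where the metric values come from (a monitoring system, for example) is left out.

```javascript
// Thresholds and actions copied from the Scaling Triggers table.
// The metric key names are hypothetical.
const TRIGGERS = {
  cpuPercent: { warning: 70, critical: 85, action: "Add servers" },
  memoryPercent: { warning: 75, critical: 90, action: "Upgrade instances" },
  dbConnections: { warning: 40, critical: 50, action: "Add read replicas" },
  queryTimeP95Ms: { warning: 500, critical: 1000, action: "Add indexes" },
  queueDepth: { warning: 1000, critical: 5000, action: "Add workers" },
  errorRatePercent: { warning: 1, critical: 5, action: "Investigate immediately" },
};

// Return one alert per metric that has reached its warning or critical level.
function evaluate(metrics) {
  const alerts = [];
  for (const [name, value] of Object.entries(metrics)) {
    const rule = TRIGGERS[name];
    if (!rule) continue; // ignore metrics without a defined trigger
    if (value >= rule.critical) {
      alerts.push({ name, level: "critical", action: rule.action });
    } else if (value >= rule.warning) {
      alerts.push({ name, level: "warning", action: rule.action });
    }
  }
  return alerts;
}

// The "Current" column from the table should raise no alerts.
const currentAlerts = evaluate({
  cpuPercent: 40,
  memoryPercent: 50,
  dbConnections: 20,
  queryTimeP95Ms: 200,
  queueDepth: 100,
  errorRatePercent: 0.1,
});
```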

Phased Scaling Plan

Phase 1: Current → 10x (0-3 months)

Target: 10,000 req/min, 100K users

Actions:

  • Add database indexes (Week 1)

  • Implement Redis caching (Week 2)

  • Add 3x read replicas (Week 4)

  • Horizontal scale app servers (Week 6)

  • CDN for static assets (Week 8)

Cost: $500 → $1,000/month

Phase 2: 10x → 100x (3-12 months)

Target: 100,000 req/min, 1M users

Actions:

  • Database sharding (Month 4-6)

  • Multi-region deployment (Month 6-8)

  • Microservices extraction (Month 8-12)

  • Event-driven architecture (Month 10-12)

Cost: $1,000 → $10,000/month

Phase 3: 100x → 1000x (12-24 months)

Target: 1M req/min, 10M users

Actions:

  • Global CDN (Month 13)

  • Advanced caching (L1/L2) (Month 14-15)

  • Custom DB solutions (Month 16-18)

  • Edge computing (Month 18-20)

Cost: $10,000 → $100,000/month

Load Testing Plan

Current baseline:

    hey -n 10000 -c 100 https://api.example.com/users

Target 10x:

    hey -n 100000 -c 1000 https://api.example.com/users

Measure:

  • Requests/sec

  • p50, p95, p99 latency

  • Error rate

  • Resource utilization
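If raw per-request results are available, the first three measurements can be computed in one pass. This sketch assumes a hypothetical result shape of `{ latencyMs, ok }` per request plus the wall-clock duration of the run, and uses the nearest-rank definition of a percentile. Resource utilization has to come from server-side monitoring and is not covered.

```javascript
// Nearest-rank percentile over an ascending array: the smallest sample such
// that at least p percent of all samples are less than or equal to it.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return NaN;
  const rank = Math.ceil((p * sortedValues.length) / 100);
  return sortedValues[Math.max(rank, 1) - 1];
}

// results: [{ latencyMs, ok }]; durationSeconds: wall-clock length of the run.
function summarize(results, durationSeconds) {
  const latencies = results.map((r) => r.latencyMs).sort((a, b) => a - b);
  const errors = results.filter((r) => !r.ok).length;
  return {
    requestsPerSec: results.length / durationSeconds,
    p50: percentile(latencies, 50),
    p95: percentile(latencies, 95),
    p99: percentile(latencies, 99),
    errorRatePercent: results.length === 0 ? 0 : (errors * 100) / results.length,
  };
}
```

Running `summarize` on both the baseline and the 10x run gives two comparable summaries, which makes it easier to see whether p95 or the error rate degrades as concurrency rises.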

Cost-Benefit Analysis

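| Strategy | Cost/Month | Expected Impact | ROI | Priority |
|---|---|---|---|---|
| DB Indexes | $0 | 80% faster queries | | HIGH |
| Redis Cache | $50 | 60% less DB load | 12x | HIGH |
| Read Replicas | $300 | 3x capacity | 10x | MEDIUM |
| Load Balancer | $400 | 3x throughput | 7x | MEDIUM |
| DB Sharding | $1,200 | 4x capacity | 3x | LOW |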

Best Practices

  • Measure first: Don't optimize blindly

  • Low-hanging fruit: Start with easy wins

  • Load test: Validate before production

  • Monitor continuously: Set up alerts

  • Plan ahead: Scale before hitting limits

  • Cost-conscious: ROI-driven decisions

  • Incremental: Small, safe changes

Output Checklist

  • Current system profile

  • Bottlenecks identified and measured

  • Scaling strategies ordered by effort

  • Triggers defined for each action

  • Phased plan (1x → 10x → 100x)

  • Cost estimates per phase

  • Load testing plan

  • Monitoring dashboard

  • Rollback procedures

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
