System Design Generator

Create comprehensive system architecture plans from requirements.

System Design Document Template

System Design: [Feature/Product Name]

Overview

Brief description of what we're building and why.

Requirements

Functional

User can upload videos (max 1GB)
System processes video within 5 minutes
User receives notification when complete

Non-Functional

Handle 1000 uploads/day
99.9% uptime
Process videos in <5 minutes (p95)
Cost: <$0.50 per video

High-Level Architecture

┌─────────┐ ┌──────────┐ ┌─────────────┐ │ Client │─────▶│ API │─────▶│ Upload │ │ │ │ Gateway │ │ Service │ └─────────┘ └──────────┘ └─────────────┘ │ ▼ ┌─────────────┐ │ Storage │ │ (S3) │ └─────────────┘ │ ▼ ┌─────────────┐ │ Processing │◀─┐ │ Queue │ │ └─────────────┘ │ │ │ ▼ │ ┌─────────────┐ │ │ Processor │─┘ │ Workers │ └─────────────┘ │ ▼ ┌─────────────┐ │Notification │ │ Service │ └─────────────┘

Components

1. API Gateway

Responsibilities:

Authentication
Rate limiting
Request routing

Technology: Kong/AWS API Gateway Scaling: Auto-scale based on requests/sec

2. Upload Service

Responsibilities:

Generate pre-signed S3 URLs
Validate file metadata
Enqueue processing jobs

API:

POST /uploads Request: { filename, size, content_type } Response: { upload_url, upload_id }

Technology: Node.js + Express Scaling: Horizontal (stateless)

3. Storage (S3)

Responsibilities:

Store raw videos
Store processed outputs
Serve content via CDN

Structure:

/uploads/{user_id}/{upload_id}/original.mp4 /processed/{user_id}/{upload_id}/output.mp4

4. Processing Queue

Responsibilities:

Buffer processing jobs
Ensure at-least-once delivery
DLQ for failed jobs

Technology: AWS SQS Configuration:

Visibility timeout: 15 minutes
DLQ after 3 retries

5. Processor Workers

Responsibilities:

Transcode videos
Generate thumbnails
Update database

Technology: Python + FFmpeg Scaling: Auto-scale on queue depth

Data Flow

Upload Flow

Client requests upload URL from Upload Service
Upload Service generates pre-signed S3 URL
Client uploads directly to S3
Client notifies Upload Service of completion
Upload Service enqueues processing job
Returns upload_id to client

Processing Flow

Worker polls queue for jobs
Downloads video from S3
Processes video (transcode, thumbnail)
Uploads results to S3
Updates database status
Sends notification
Deletes message from queue

Data Model

interface Upload {
  id: string;
  user_id: string;
  filename: string;
  size: number;
  status: 'pending' | 'processing' | 'complete' | 'failed';
  original_url: string;
  processed_url?: string;
  created_at: Date;
  processed_at?: Date;
}

interface ProcessingJob {
  upload_id: string;
  attempts: number;
  error?: string;
}

API Contract

Upload Endpoints

POST   /uploads           - Request upload URL
GET    /uploads/:id       - Get upload status
DELETE /uploads/:id       - Cancel upload
GET    /uploads           - List user uploads

Webhooks

POST {webhook_url}
{
  "event": "upload.completed",
  "upload_id": "...",
  "status": "complete",
  "processed_url": "..."
}

Scaling Considerations

Current Capacity

- 1000 uploads/day = ~1 per minute

- Single worker can process 1 video every 5 minutes

- Need 5 workers for current load

10x Scale (10,000/day)

- ~10 uploads per minute

- Need 50 workers

- Use spot instances for cost savings

- Add Redis cache for status checks

100x Scale (100,000/day)

- ~100 uploads per minute

- Partition by region

- Use Kafka instead of SQS

- Database sharding by user_id

Failure Modes

S3 Unavailable

- Impact: Uploads fail

- Mitigation: Multi-region S3 replication

Queue Backed Up

- Impact: Processing delays

- Mitigation: Auto-scale workers faster

Worker Crash During Processing

- Impact: Job retried

- Mitigation: Idempotent processing

Cost Estimate

Monthly (1000 uploads/day):

- S3 Storage: $50

- S3 Transfer: $100

- SQS: $10

- Workers (EC2): $300

- Database: $100
Total: ~$560/month

Security

- Pre-signed URLs expire in 1 hour

- Videos in private S3 buckets

- CloudFront signed URLs for delivery

- Rate limiting per user

Monitoring

Metrics:

- Upload success rate

- Processing time (p50, p95, p99)

- Queue depth

- Worker CPU/memory

- Error rate by type

Alerts:

- Queue depth >1000

- Processing time p95 >10 minutes

- Error rate >5%

Open Questions

-  Video retention policy? (30 days? 1 year?)

-  Maximum video duration? (affects processing time)

-  Regional data residency requirements?

## Component Template

```markdown
### Component Name

**Responsibilities:**
- Primary responsibility
- Secondary responsibility

**Technology Stack:**
- Language: [Python/Node/Go]
- Framework: [Express/FastAPI/Gin]
- Database: [PostgreSQL/MongoDB]

**API/Interface:**
```typescript
interface ComponentAPI {
  method(params): ReturnType;
}

Scaling Strategy:

- Horizontal: Stateless, load balanced

- Vertical: Cache layer, connection pooling

Dependencies:

- Service A (for X)

- Database B (for persistence)

Failure Handling:

- Retry with exponential backoff

- Circuit breaker for downstream services

- Fallback to cached data

## Best Practices

1. **Start with requirements**: Functional + non-functional
2. **Draw diagrams first**: Visual clarity
3. **Define boundaries**: What's in scope vs out
4. **Document tradeoffs**: Every choice has costs
5. **Plan for failure**: What breaks and how to handle
6. **Consider scale**: Current, 10x, 100x
7. **Estimate costs**: Build vs buy decisions
8. **Leave open questions**: Don't pretend to know everything

## Output Checklist

- [ ] Requirements documented (functional + non-functional)
- [ ] High-level architecture diagram
- [ ] Component breakdown (3-7 components)
- [ ] Data flow documented
- [ ] Data model defined
- [ ] API contracts specified
- [ ] Scaling considerations (1x, 10x, 100x)
- [ ] Failure modes identified
- [ ] Cost estimate provided
- [ ] Security considerations
- [ ] Monitoring plan

system-design-generator

Safety Notice

Copy this and send it to your AI assistant to learn

System Design: [Feature/Product Name]

Overview

Requirements

Functional

Non-Functional

High-Level Architecture

Components

1. API Gateway

2. Upload Service

3. Storage (S3)

4. Processing Queue

5. Processor Workers

Data Flow

Upload Flow

Processing Flow

Data Model

Source Transparency

Related Skills

responsive-design-system

rate-limiting-abuse-protection

bruno-collection-generator