system-design-generator

System Design Generator

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Copy this and send it to your AI assistant to learn

Install skill "system-design-generator" with this command: npx skills add monkey1sai/openai-cli/monkey1sai-openai-cli-system-design-generator

System Design Generator

Create comprehensive system architecture plans from requirements.

System Design Document Template

System Design: [Feature/Product Name]

Overview

Brief description of what we're building and why.

Requirements

Functional

  • User can upload videos (max 1GB)
  • System processes video within 5 minutes
  • User receives notification when complete

Non-Functional

  • Handle 1000 uploads/day
  • 99.9% uptime
  • Process videos in <5 minutes (p95)
  • Cost: <$0.50 per video

High-Level Architecture

┌─────────┐ ┌──────────┐ ┌─────────────┐ │ Client │─────▶│ API │─────▶│ Upload │ │ │ │ Gateway │ │ Service │ └─────────┘ └──────────┘ └─────────────┘ │ ▼ ┌─────────────┐ │ Storage │ │ (S3) │ └─────────────┘ │ ▼ ┌─────────────┐ │ Processing │◀─┐ │ Queue │ │ └─────────────┘ │ │ │ ▼ │ ┌─────────────┐ │ │ Processor │─┘ │ Workers │ └─────────────┘ │ ▼ ┌─────────────┐ │Notification │ │ Service │ └─────────────┘

Components

1. API Gateway

Responsibilities:

  • Authentication
  • Rate limiting
  • Request routing

Technology: Kong/AWS API Gateway Scaling: Auto-scale based on requests/sec

2. Upload Service

Responsibilities:

  • Generate pre-signed S3 URLs
  • Validate file metadata
  • Enqueue processing jobs

API:

POST /uploads Request: { filename, size, content_type } Response: { upload_url, upload_id }

Technology: Node.js + Express Scaling: Horizontal (stateless)

3. Storage (S3)

Responsibilities:

  • Store raw videos
  • Store processed outputs
  • Serve content via CDN

Structure:

/uploads/{user_id}/{upload_id}/original.mp4 /processed/{user_id}/{upload_id}/output.mp4

4. Processing Queue

Responsibilities:

  • Buffer processing jobs
  • Ensure at-least-once delivery
  • DLQ for failed jobs

Technology: AWS SQS Configuration:

  • Visibility timeout: 15 minutes
  • DLQ after 3 retries

5. Processor Workers

Responsibilities:

  • Transcode videos
  • Generate thumbnails
  • Update database

Technology: Python + FFmpeg Scaling: Auto-scale on queue depth

Data Flow

Upload Flow

  1. Client requests upload URL from Upload Service
  2. Upload Service generates pre-signed S3 URL
  3. Client uploads directly to S3
  4. Client notifies Upload Service of completion
  5. Upload Service enqueues processing job
  6. Returns upload_id to client

Processing Flow

  1. Worker polls queue for jobs
  2. Downloads video from S3
  3. Processes video (transcode, thumbnail)
  4. Uploads results to S3
  5. Updates database status
  6. Sends notification
  7. Deletes message from queue

Data Model

interface Upload {
  id: string;
  user_id: string;
  filename: string;
  size: number;
  status: 'pending' | 'processing' | 'complete' | 'failed';
  original_url: string;
  processed_url?: string;
  created_at: Date;
  processed_at?: Date;
}

interface ProcessingJob {
  upload_id: string;
  attempts: number;
  error?: string;
}

API Contract

Upload Endpoints

POST   /uploads           - Request upload URL
GET    /uploads/:id       - Get upload status
DELETE /uploads/:id       - Cancel upload
GET    /uploads           - List user uploads

Webhooks

POST {webhook_url}
{
  "event": "upload.completed",
  "upload_id": "...",
  "status": "complete",
  "processed_url": "..."
}

Scaling Considerations

Current Capacity

- 1000 uploads/day = ~1 per minute

- Single worker can process 1 video every 5 minutes

- Need 5 workers for current load

10x Scale (10,000/day)

- ~10 uploads per minute

- Need 50 workers

- Use spot instances for cost savings

- Add Redis cache for status checks

100x Scale (100,000/day)

- ~100 uploads per minute

- Partition by region

- Use Kafka instead of SQS

- Database sharding by user_id

Failure Modes

S3 Unavailable

- Impact: Uploads fail

- Mitigation: Multi-region S3 replication

Queue Backed Up

- Impact: Processing delays

- Mitigation: Auto-scale workers faster

Worker Crash During Processing

- Impact: Job retried

- Mitigation: Idempotent processing

Cost Estimate

Monthly (1000 uploads/day):

- S3 Storage: $50

- S3 Transfer: $100

- SQS: $10

- Workers (EC2): $300

- Database: $100
Total: ~$560/month

Security

- Pre-signed URLs expire in 1 hour

- Videos in private S3 buckets

- CloudFront signed URLs for delivery

- Rate limiting per user

Monitoring

Metrics:

- Upload success rate

- Processing time (p50, p95, p99)

- Queue depth

- Worker CPU/memory

- Error rate by type

Alerts:

- Queue depth >1000

- Processing time p95 >10 minutes

- Error rate >5%

Open Questions

-  Video retention policy? (30 days? 1 year?)

-  Maximum video duration? (affects processing time)

-  Regional data residency requirements?

## Component Template

```markdown
### Component Name

**Responsibilities:**
- Primary responsibility
- Secondary responsibility

**Technology Stack:**
- Language: [Python/Node/Go]
- Framework: [Express/FastAPI/Gin]
- Database: [PostgreSQL/MongoDB]

**API/Interface:**
```typescript
interface ComponentAPI {
  method(params): ReturnType;
}

Scaling Strategy:

- Horizontal: Stateless, load balanced

- Vertical: Cache layer, connection pooling

Dependencies:

- Service A (for X)

- Database B (for persistence)

Failure Handling:

- Retry with exponential backoff

- Circuit breaker for downstream services

- Fallback to cached data

## Best Practices

1. **Start with requirements**: Functional + non-functional
2. **Draw diagrams first**: Visual clarity
3. **Define boundaries**: What's in scope vs out
4. **Document tradeoffs**: Every choice has costs
5. **Plan for failure**: What breaks and how to handle
6. **Consider scale**: Current, 10x, 100x
7. **Estimate costs**: Build vs buy decisions
8. **Leave open questions**: Don't pretend to know everything

## Output Checklist

- [ ] Requirements documented (functional + non-functional)
- [ ] High-level architecture diagram
- [ ] Component breakdown (3-7 components)
- [ ] Data flow documented
- [ ] Data model defined
- [ ] API contracts specified
- [ ] Scaling considerations (1x, 10x, 100x)
- [ ] Failure modes identified
- [ ] Cost estimate provided
- [ ] Security considerations
- [ ] Monitoring plan

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.

Related Skills

Related by shared tags or category signals.

Coding

responsive-design-system

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

rate-limiting-abuse-protection

No summary provided by upstream source.

Repository SourceNeeds Review
Coding

bruno-collection-generator

No summary provided by upstream source.

Repository SourceNeeds Review