system-design-patterns

System Design Patterns

Design scalable, reliable, and performant systems with proven patterns.

When to Use

Designing new systems or features
Evaluating architecture trade-offs
Planning for scale
Improving system reliability
Making infrastructure decisions

Core Principles

CAP Theorem

Property Meaning Trade-off

Consistency All nodes see the same data Higher latency

Availability System responds to every request May return stale data

Partition Tolerance System works despite network failures Must sacrifice C or A

Choose 2:

CP: Banking, inventory (consistency critical)
AP: Social media, caching (availability critical)
CA: Single-node systems only (no network partitions)

ACID vs BASE

ACID (Traditional RDBMS) BASE (Distributed)

Atomicity Basically Available

Consistency Soft state

Isolation Eventually consistent

Durability

Scalability Patterns

Horizontal vs Vertical Scaling

Vertical Scaling (Scale Up) Horizontal Scaling (Scale Out) ┌─────────────────────────┐ ┌──────┐ ┌──────┐ ┌──────┐ │ │ │ │ │ │ │ │ │ Bigger Server │ vs │Server│ │Server│ │Server│ │ │ │ │ │ │ │ │ │ More CPU, RAM, Storage │ │ │ │ │ │ │ └─────────────────────────┘ └──────┘ └──────┘ └──────┘

Pros: Pros:

Simple to implement - Near-infinite scale
No code changes - Fault tolerant
Lower operational complexity - Cost effective at scale

Cons: Cons:

Hardware limits - Distributed complexity
Single point of failure - Data consistency challenges
Expensive at scale - More operational overhead

Load Balancing Strategies

// Strategy selection based on use case

public enum LoadBalancingStrategy { // Simple, stateless services RoundRobin,

// Varying server capacities
WeightedRoundRobin,

// Session affinity needed
IpHash,

// Optimal resource utilization
LeastConnections,

// Latency-sensitive applications
LeastResponseTime,

// Geographic distribution
GeographicBased

}

Strategy Use Case Trade-off

Round Robin Stateless, homogeneous No health awareness

Weighted Different server sizes Manual configuration

IP Hash Session stickiness Uneven distribution

Least Connections Long-lived connections Overhead tracking

Geographic Global users Complexity

Database Scaling

Read Replicas

┌─────────────────────────────────────────────────────┐ │ Application │ └──────────────────────┬──────────────────────────────┘ │ ┌──────────────┴──────────────┐ │ │ ▼ ▼ ┌───────────────┐ ┌─────────────────┐ │ Primary DB │──────────►│ Read Replica 1 │ │ (Writes) │ Async ├─────────────────┤ │ │──────────►│ Read Replica 2 │ └───────────────┘ Repl └─────────────────┘ ▲ │ Read Queries

// Read/Write splitting in ABP public class PatientAppService : ApplicationService { private readonly IReadOnlyRepository<Patient, Guid> _readRepository; private readonly IRepository<Patient, Guid> _writeRepository;

// Reads go to replicas
public async Task&#x3C;PatientDto> GetAsync(Guid id)
{
    var patient = await _readRepository.GetAsync(id);
    return ObjectMapper.Map&#x3C;Patient, PatientDto>(patient);
}

// Writes go to primary
public async Task&#x3C;PatientDto> CreateAsync(CreatePatientDto input)
{
    var patient = new Patient(GuidGenerator.Create(), input.Name);
    await _writeRepository.InsertAsync(patient);
    return ObjectMapper.Map&#x3C;Patient, PatientDto>(patient);
}

}

Database Sharding

┌─────────────────────────────────────────────────────────────┐ │ Shard Router │ │ (Routes queries based on shard key) │ └────────────┬──────────────┬──────────────┬─────────────────┘ │ │ │ ▼ ▼ ▼ ┌───────────┐ ┌───────────┐ ┌───────────┐ │ Shard 1 │ │ Shard 2 │ │ Shard 3 │ │ A - H │ │ I - P │ │ Q - Z │ │ (Users) │ │ (Users) │ │ (Users) │ └───────────┘ └───────────┘ └───────────┘

Sharding Strategy Pros Cons

Range-based Simple, range queries work Hotspots possible

Hash-based Even distribution Range queries need scatter-gather

Directory-based Flexible Lookup overhead, SPOF

Geographic Data locality Cross-region queries slow

Caching Patterns

Cache-Aside (Lazy Loading)

public class PatientService { private readonly IDistributedCache _cache; private readonly IPatientRepository _repository;

public async Task&#x3C;PatientDto> GetAsync(Guid id)
{
    var cacheKey = $"patient:{id}";

    // 1. Check cache
    var cached = await _cache.GetStringAsync(cacheKey);
    if (cached != null)
    {
        return JsonSerializer.Deserialize&#x3C;PatientDto>(cached);
    }

    // 2. Cache miss - load from DB
    var patient = await _repository.GetAsync(id);
    var dto = ObjectMapper.Map&#x3C;Patient, PatientDto>(patient);

    // 3. Populate cache
    await _cache.SetStringAsync(
        cacheKey,
        JsonSerializer.Serialize(dto),
        new DistributedCacheEntryOptions
        {
            AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
        });

    return dto;
}

public async Task UpdateAsync(Guid id, UpdatePatientDto input)
{
    // Update database
    var patient = await _repository.GetAsync(id);
    patient.Update(input.Name, input.Email);
    await _repository.UpdateAsync(patient);

    // Invalidate cache
    await _cache.RemoveAsync($"patient:{id}");
}

}

Write-Through Cache

public async Task<PatientDto> CreateAsync(CreatePatientDto input) { // 1. Write to database var patient = new Patient(GuidGenerator.Create(), input.Name); await _repository.InsertAsync(patient);

// 2. Write to cache synchronously
var dto = ObjectMapper.Map&#x3C;Patient, PatientDto>(patient);
await _cache.SetStringAsync(
    $"patient:{patient.Id}",
    JsonSerializer.Serialize(dto),
    new DistributedCacheEntryOptions
    {
        AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(10)
    });

return dto;

}

Cache Strategies Comparison

Pattern Consistency Performance Use Case

Cache-Aside Eventual Read-heavy User profiles

Write-Through Strong Write + Read Financial data

Write-Behind Eventual Write-heavy Analytics, logs

Read-Through Eventual Read-heavy Reference data

Reliability Patterns

Circuit Breaker

// Using Polly public class ExternalServiceClient { private readonly HttpClient _client; private readonly AsyncCircuitBreakerPolicy _circuitBreaker;

public ExternalServiceClient(HttpClient client)
{
    _client = client;
    _circuitBreaker = Policy
        .Handle&#x3C;HttpRequestException>()
        .CircuitBreakerAsync(
            exceptionsAllowedBeforeBreaking: 5,
            durationOfBreak: TimeSpan.FromSeconds(30),
            onBreak: (ex, duration) =>
                Log.Warning("Circuit opened for {Duration}s", duration.TotalSeconds),
            onReset: () =>
                Log.Information("Circuit closed"),
            onHalfOpen: () =>
                Log.Information("Circuit half-open, testing...")
        );
}

public async Task&#x3C;T> GetAsync&#x3C;T>(string endpoint)
{
    return await _circuitBreaker.ExecuteAsync(async () =>
    {
        var response = await _client.GetAsync(endpoint);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadFromJsonAsync&#x3C;T>();
    });
}

}

Retry with Exponential Backoff

var retryPolicy = Policy .Handle<HttpRequestException>() .WaitAndRetryAsync( retryCount: 3, sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)), // 2, 4, 8 seconds onRetry: (ex, delay, attempt, context) => Log.Warning("Retry {Attempt} after {Delay}s: {Error}", attempt, delay.TotalSeconds, ex.Message) );

Bulkhead Pattern

// Isolate failures to prevent cascade var bulkhead = Policy.BulkheadAsync( maxParallelization: 10, // Max concurrent executions maxQueuingActions: 20, // Max queued requests onBulkheadRejectedAsync: context => { Log.Warning("Bulkhead rejected request"); return Task.CompletedTask; } );

Event-Driven Architecture

Message Queue Pattern

┌─────────┐ ┌─────────────┐ ┌─────────────┐ │ Service │───►│ Message │───►│ Consumer │ │ A │ │ Queue │ │ Service B │ └─────────┘ │ │ └─────────────┘ │ (RabbitMQ, │ │ Kafka, │ ┌─────────────┐ │ Azure SB) │───►│ Consumer │ └─────────────┘ │ Service C │ └─────────────┘

Event Sourcing

// Store events, not state public class PatientAggregate { private readonly List<IDomainEvent> _events = new();

public Guid Id { get; private set; }
public string Name { get; private set; }
public PatientStatus Status { get; private set; }

public void Apply(PatientCreated @event)
{
    Id = @event.PatientId;
    Name = @event.Name;
    Status = PatientStatus.Active;
    _events.Add(@event);
}

public void Apply(PatientNameChanged @event)
{
    Name = @event.NewName;
    _events.Add(@event);
}

// Rebuild state from events
public static PatientAggregate FromEvents(IEnumerable&#x3C;IDomainEvent> events)
{
    var patient = new PatientAggregate();
    foreach (var @event in events)
    {
        patient.Apply((dynamic)@event);
    }
    return patient;
}

}

Quick Reference: Design Trade-offs

Decision Option A Option B Consider

Storage SQL NoSQL Data structure, consistency needs

Caching Redis In-memory Distributed needs, size

Communication Sync (HTTP) Async (Queue) Coupling, latency tolerance

Consistency Strong Eventual Business requirements

Scaling Vertical Horizontal Cost, complexity, limits

System Design Checklist

Requirements: Functional + Non-functional defined
Scale: Expected users, requests/sec, data volume
Availability: Uptime target (99.9% = 8.76h downtime/year)
Latency: P50, P95, P99 targets
Data: Storage type, retention, backup strategy
Caching: What to cache, invalidation strategy
Security: Auth, encryption, compliance
Monitoring: Metrics, logging, alerting
Failure modes: What happens when X fails?
Cost: Infrastructure, operational overhead

Related Skills

technical-design-patterns
Document designs
api-design-principles
API architecture
distributed-events-advanced
Event patterns

system-design-patterns

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

abp-infrastructure-patterns

abp-entity-patterns

abp-api-implementation

abp-service-patterns