websocket-engineer

WebSocket & Real-Time Engineer

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "websocket-engineer" with this command: npx skills add neversight/skills_feed/neversight-skills-feed-websocket-engineer

Purpose

Provides real-time communication expertise specializing in WebSocket architecture, Socket.IO, and event-driven systems. Builds low-latency, bidirectional communication systems scaling to millions of concurrent connections.

When to Use

  • Building chat apps, live dashboards, or multiplayer games

  • Scaling WebSocket servers horizontally (Redis Adapter)

  • Implementing "Server-Sent Events" (SSE) for one-way updates

  • Troubleshooting connection drops, heartbeat failures, or CORS issues

  • Designing stateful connection architectures

  • Migrating from polling to push technology

Examples

Example 1: Real-Time Chat Application

Scenario: Building a scalable chat platform for enterprise use.

Implementation:

  • Designed WebSocket architecture with Socket.IO

  • Implemented Redis Adapter for horizontal scaling

  • Created room-based message routing

  • Added message persistence and history

  • Implemented presence system (online/offline)

Results:

  • Supports 100,000+ concurrent connections

  • 50ms average message delivery

  • 99.99% connection stability

  • Seamless horizontal scaling

Example 2: Live Dashboard System

Scenario: Real-time analytics dashboard with sub-second updates.

Implementation:

  • Implemented WebSocket server with low latency

  • Created efficient message batching strategy

  • Added Redis pub/sub for multi-server support

  • Implemented client-side update coalescing

  • Added compression for large payloads

Results:

  • Dashboard updates in under 100ms

  • Handles 10,000 concurrent dashboard views

  • 80% reduction in server load vs polling

  • Zero data loss during reconnections

Example 3: Multiplayer Game Backend

Scenario: Low-latency multiplayer game server.

Implementation:

  • Implemented WebSocket server with binary protocols

  • Created authoritative server architecture

  • Added client-side prediction and reconciliation

  • Implemented lag compensation algorithms

  • Set up server-side physics and collision detection

Results:

  • 30ms end-to-end latency

  • Supports 1000 concurrent players per server

  • Smooth gameplay despite network variations

  • Cheat-resistant server authority

Best Practices

Connection Management

  • Heartbeats: Implement ping/pong for connection health

  • Reconnection: Automatic reconnection with backoff

  • State Cleanup: Proper cleanup on disconnect

  • Connection Limits: Prevent resource exhaustion
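The heartbeat and cleanup points above can be sketched as a small liveness tracker. This is a minimal sketch, not a definitive implementation: `HeartbeatMonitor` and its method names are hypothetical, and it assumes each connection exposes `ping()` and `terminate()`, which is the shape of sockets in the `ws` package.

```javascript
// Hypothetical helper: tracks liveness for connections that expose
// ping() and terminate() (assumed; this matches sockets in the `ws` package).
class HeartbeatMonitor {
  constructor() {
    this.alive = new Map(); // connection -> has it answered the last ping?
  }

  // Register a new connection; it starts out alive.
  track(conn) {
    this.alive.set(conn, true);
  }

  // Call from the connection's "pong" handler.
  markAlive(conn) {
    if (this.alive.has(conn)) this.alive.set(conn, true);
  }

  // Call from the "close" handler so per-connection state does not leak.
  untrack(conn) {
    this.alive.delete(conn);
  }

  // Run on an interval: terminate connections that missed the previous
  // ping, then ping the rest. Returns how many connections were terminated.
  sweep() {
    let terminated = 0;
    for (const [conn, isAlive] of this.alive) {
      if (!isAlive) {
        conn.terminate();
        this.alive.delete(conn);
        terminated += 1;
        continue;
      }
      this.alive.set(conn, false);
      conn.ping();
    }
    return terminated;
  }
}
```

A server would typically call `sweep()` from `setInterval`, with an interval shorter than the load balancer's idle timeout.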

Scaling

  • Horizontal Scaling: Use Redis Adapter for multi-server

  • Sticky Sessions: Proper load balancer configuration

  • Message Routing: Efficient routing for broadcast/unicast

  • Rate Limiting: Prevent abuse and overload

Performance

  • Message Batching: Batch messages where appropriate

  • Compression: Compress messages (permessage-deflate)

  • Binary Protocols: Use binary for performance-critical data

  • Connection Pooling: Efficient client connection reuse
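As an illustration of the message batching point, the sketch below buffers outgoing messages and flushes them as one array. `MessageBatcher`, `maxSize`, and `maxDelayMs` are hypothetical names and defaults chosen for the example; the `send` callback stands in for whatever actually writes to the socket.

```javascript
// Hypothetical helper: buffers messages and hands them to `send` as one array,
// either when `maxSize` is reached or after `maxDelayMs`, whichever comes first.
class MessageBatcher {
  constructor(send, { maxSize = 50, maxDelayMs = 20 } = {}) {
    this.send = send;
    this.maxSize = maxSize;
    this.maxDelayMs = maxDelayMs;
    this.buffer = [];
    this.timer = null;
  }

  // Queue a message; flush immediately if the batch is full,
  // otherwise make sure a delayed flush is scheduled.
  add(message) {
    this.buffer.push(message);
    if (this.buffer.length >= this.maxSize) {
      this.flush();
    } else if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.maxDelayMs);
    }
  }

  // Send everything buffered so far and cancel any pending timer.
  flush() {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }
}
```

The trade-off is latency for throughput: a larger `maxDelayMs` means fewer frames on the wire but slower delivery, so the values should be tuned per use case.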

Security

  • Authentication: Validate on handshake

  • TLS: Always use WSS

  • Input Validation: Validate all incoming messages

  • Rate Limiting: Limit connection/message rates
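One common way to limit message rates is a token bucket per connection. The following is a sketch under stated assumptions: `TokenBucket` and its option names are hypothetical, the default limits are arbitrary, and the clock is injectable only to make the behavior easy to test.

```javascript
// Hypothetical per-connection token bucket. `now` is injectable so the
// limiter can be exercised without real time passing.
class TokenBucket {
  constructor({ capacity = 20, refillPerSecond = 10, now = Date.now } = {}) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true if the message may proceed, false if it should be rejected.
  tryConsume(cost = 1) {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Refill proportionally to elapsed time, never above capacity.
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefill = current;
    if (this.tokens < cost) return false;
    this.tokens -= cost;
    return true;
  }
}
```

A message handler would create one bucket per connection and drop (or disconnect on) messages for which `tryConsume()` returns false.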

Decision Framework

Protocol Selection

What is the communication pattern?
│
├─ Bi-directional (Chat/Game)
│  ├─ Low Latency needed? → WebSockets (Raw)
│  ├─ Fallbacks/Auto-reconnect needed? → Socket.IO
│  └─ P2P Video/Audio? → WebRTC
│
├─ One-way (Server → Client)
│  ├─ Stock Ticker / Notifications? → Server-Sent Events (SSE)
│  └─ Large File Download? → HTTP Stream
│
└─ High Frequency (IoT)
   └─ Constrained device? → MQTT (over TCP/WS)

Scaling Strategy

| Scale | Architecture | Backend |
|---|---|---|
| < 10k Users | Monolith | Node.js Single Instance |
| 10k - 100k | Clustering | Node.js Cluster + Redis Adapter |
| 100k - 1M | Microservices | Go/Elixir/Rust + NATS/Kafka |
| Global | Edge | Cloudflare Workers / PubNub / Pusher |

Load Balancer Config

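  • Sticky Sessions: Required for Socket.IO when HTTP long-polling is enabled (the default), because every request in a session must reach the same server. Not needed if clients use the WebSocket transport only.

  • Timeouts: Increase idle timeouts (e.g., 60s+) so they exceed the heartbeat interval.

  • Headers: Forward the Upgrade: websocket and Connection: Upgrade headers.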

Red Flags → Escalate to security-engineer:

  • Accepting connections from any Origin (*) with credentials

  • No rate limiting on connection requests (DoS risk)

  • Sending JWTs in URL query parameters, where they end up in proxy and access logs. Use a cookie or an initial authentication message instead.
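To avoid the wildcard-origin red flag, the handshake can compare the Origin header against an explicit allow-list. This is a minimal sketch: `isAllowedOrigin` and `ALLOWED_ORIGINS` are hypothetical names, the single allowed origin is reused from the CORS setting in the workflow example, and rejecting requests without an Origin header is a design choice made here, not a requirement.

```javascript
// Hypothetical allow-list; replace with the origins your app actually serves.
const ALLOWED_ORIGINS = new Set(["https://myapp.com"]);

// Returns true only for an exact match against the allow-list.
// A missing Origin header (typical of non-browser clients) is rejected
// here by choice.
function isAllowedOrigin(originHeader, allowed = ALLOWED_ORIGINS) {
  if (typeof originHeader !== "string") return false;
  let origin;
  try {
    // URL parsing normalizes host case and drops default ports.
    origin = new URL(originHeader).origin;
  } catch {
    return false; // not a parseable absolute URL (e.g. "*" or "null")
  }
  return allowed.has(origin);
}
```

The function would be called with the Origin header of the upgrade request before the connection is accepted.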

Core Workflows

Workflow 1: Scalable Socket.IO Server (Node.js)

Goal: Chat server capable of scaling across multiple cores/instances.

Steps:

Install Dependencies

npm install socket.io redis @socket.io/redis-adapter

Implementation (server.js)

const { Server } = require("socket.io");
const { createClient } = require("redis");
const { createAdapter } = require("@socket.io/redis-adapter");

// Separate Redis connections for publishing and subscribing
const pubClient = createClient({ url: "redis://localhost:6379" });
const subClient = pubClient.duplicate();

Promise.all([pubClient.connect(), subClient.connect()]).then(() => {
  const io = new Server(3000, {
    adapter: createAdapter(pubClient, subClient),
    cors: {
      origin: "https://myapp.com",
      methods: ["GET", "POST"]
    }
  });

  io.on("connection", (socket) => {
    // User joins a room (e.g., "chat-123")
    socket.on("join", (room) => {
      socket.join(room);
    });

    // Send message to room (propagates via Redis to all nodes)
    socket.on("message", (data) => {
      io.to(data.room).emit("chat", data.text);
    });
  });
});
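The "message" handler above uses `data.room` and `data.text` exactly as the client sent them. A validation step along the following lines could run first. It is a sketch only: `parseChatMessage` is a hypothetical helper, and the two length limits are assumed values, not part of the original workflow.

```javascript
const MAX_ROOM_LENGTH = 64;   // assumed limit for illustration
const MAX_TEXT_LENGTH = 2000; // assumed limit for illustration

// Returns a cleaned { room, text } object, or null if the payload is malformed.
function parseChatMessage(data) {
  if (data === null || typeof data !== "object") return null;
  const { room, text } = data;
  if (typeof room !== "string" || typeof text !== "string") return null;
  if (room.length === 0 || room.length > MAX_ROOM_LENGTH) return null;
  const trimmed = text.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_TEXT_LENGTH) return null;
  return { room, text: trimmed };
}
```

In the handler, a `null` result would mean the message is ignored; otherwise the cleaned `room` and `text` are passed to `io.to(...).emit(...)`. Checking that the sender has actually joined the room is a sensible additional step.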

Workflow 3: Production Tuning (Linux)

Goal: Handle 50k concurrent connections on a single server.

Steps:

File Descriptors

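  • Increase the limit for the current shell: ulimit -n 65535.

  • Make it persistent by editing /etc/security/limits.conf.

Ephemeral Ports

  • Increase range: sysctl -w net.ipv4.ip_local_port_range="1024 65535". This mainly matters for outbound connections from the host (for example a reverse proxy or a load-test client), since inbound connections all share the server's listening port.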

Memory Optimization

  • Use ws (lighter) instead of Socket.IO if its extra features are not needed.

  • Disable per-message deflate (compression) if CPU usage is high.

Anti-Patterns & Gotchas

❌ Anti-Pattern 1: Stateful Monolith

What it looks like:

  • Storing connected users in an in-process array (users = []) in Node.js memory.

Why it fails:

  • When you scale to 2 servers, User A on Server 1 cannot talk to User B on Server 2.

  • Memory leaks crash the process.

Correct approach:

  • Use the Redis Adapter to relay events between nodes, and keep shared state (presence, sessions) in Redis.

  • Stateless servers, stateful backend (Redis).

❌ Anti-Pattern 2: The "Thundering Herd"

What it looks like:

  • Server restarts. 100,000 clients reconnect instantly.

  • Server crashes again due to CPU spike.

Why it fails:

  • Connection handshakes are expensive (TLS + Auth).

Correct approach:

  • Randomized Jitter: Clients wait random(0, 10s) before reconnecting.

  • Exponential Backoff: Wait 1s, then 2s, then 4s...
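The two remedies above can be combined into one delay calculation. The sketch below uses the "full jitter" variant (exponential backoff with the delay drawn uniformly below the capped value), which is one option among several. `reconnectDelay`, `baseMs`, and `maxMs` are hypothetical names and defaults; `random` is injectable only so the result can be checked deterministically.

```javascript
// Hypothetical helper: delay in milliseconds before reconnect attempt
// `attempt` (0-based). Exponential backoff capped at `maxMs`, with full
// jitter: the delay is drawn uniformly from [0, cappedDelay).
function reconnectDelay(
  attempt,
  { baseMs = 1000, maxMs = 30000, random = Math.random } = {}
) {
  const capped = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * capped);
}
```

A client would call `setTimeout(connect, reconnectDelay(attempt))` and reset `attempt` to 0 after a successful connection. The Socket.IO client ships its own reconnection settings, so a hand-rolled helper like this is mostly relevant for raw WebSocket clients.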

❌ Anti-Pattern 3: Blocking the Event Loop

What it looks like:

  • socket.on('message', () => { heavyCalculation(); })

Why it fails:

  • Node.js runs JavaScript on a single thread. One heavy synchronous task stalls every connection on that process (all 10,000 of them).

Correct approach:

  • Offload work to a Worker Thread or Message Queue (RabbitMQ/Bull).
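Offloading to a Worker Thread can be sketched with Node's built-in `worker_threads` module. The names `runInWorker` and `workerSource` are hypothetical, and `heavyCalculation` here is only a stand-in that sums the integers from 1 to n. The worker code is kept inline (via the `eval: true` option) so the sketch fits in a single file.

```javascript
const { Worker } = require("worker_threads");

// Source for the worker, kept inline so the sketch is a single file.
// heavyCalculation is a stand-in: it sums the integers 1..n.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  function heavyCalculation(n) {
    let total = 0;
    for (let i = 1; i <= n; i++) total += i;
    return total;
  }
  parentPort.postMessage(heavyCalculation(workerData));
`;

// Runs the calculation off the main thread and resolves with its result,
// so the event loop stays free to serve other connections.
function runInWorker(n) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: n });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`Worker exited with code ${code}`));
    });
  });
}
```

A message handler would then `await runInWorker(input)` instead of calling the heavy function directly. Starting a worker per message has its own cost, so a reusable pool of workers or a message queue is likely the better fit under sustained load.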

Quality Checklist

Scalability:

  • Adapter: Redis/NATS adapter configured for multi-node.

  • Load Balancer: Sticky sessions enabled (if using polling fallback).

  • OS Limits: File descriptors limit increased.

Resilience:

  • Reconnection: Exponential backoff + Jitter implemented.

  • Heartbeat: Ping/Pong interval configured (< LB timeout).

  • Fallback: Socket.IO fallbacks (HTTP Long Polling) enabled/tested.

Security:

  • WSS: TLS enabled (Secure WebSockets).

  • Auth: Handshake validates credentials properly.

  • Rate Limit: Connection rate limiting active.

Anti-Patterns

Connection Management Anti-Patterns

  • No Heartbeats: Not detecting dead connections - implement ping/pong

  • Memory Leaks: Not cleaning up closed connections - implement proper cleanup

  • Infinite Reconnects: Retrying in a tight loop without backoff - implement exponential backoff

  • Sticky Sessions Required: Depending on sticky sessions to hold per-user state instead of designing stateless servers - use Redis for state

Scaling Anti-Patterns

  • Single Server: Not scaling beyond one instance - use Redis adapter

  • No Load Balancing: Direct connections to servers - use proper load balancer

  • Broadcast Storm: Sending to all connections blindly - target specific connections

  • Connection Saturation: Too many connections per server - scale horizontally

Performance Anti-Patterns

  • Message Bloat: Large unstructured messages - use efficient message formats

  • No Throttling: Unlimited send rates - implement rate limiting

  • Blocking Operations: Synchronous processing - use async processing

  • No Monitoring: Operating blind - implement connection metrics

Security Anti-Patterns

  • No TLS: Using unencrypted connections - always use WSS

  • Weak Auth: Accepting tokens without verifying signature and expiry - implement proper authentication

  • No Rate Limits: Vulnerable to abuse - implement connection/message limits

  • CORS Exposed: Open cross-origin access - configure proper CORS
