# LLM Streaming Response Handler
Expert in building production-grade streaming interfaces for LLM responses that feel instant and responsive.
## When to Use

✅ **Use for:**

- Chat interfaces with typing animation
- Real-time AI assistants
- Code generation with live preview
- Document summarization with progressive display
- Any UI where users expect immediate feedback from LLMs

❌ **NOT for:**

- Batch document processing (no user watching)
- APIs that don't support streaming
- WebSocket-based bidirectional chat (use Socket.IO)
- Simple request/response (fetch is fine)
## Quick Decision Tree

```
Does your LLM interaction:
├── Need immediate visual feedback?          → Streaming
├── Display long-form content (>100 words)?  → Streaming
├── User expects typewriter effect?          → Streaming
├── Short response (<50 words)?              → Regular fetch
└── Background processing?                   → Regular fetch
```
## Technology Selection

### Server-Sent Events (SSE) - Recommended

**Why SSE over WebSockets for LLM streaming:**

- **Simplicity**: HTTP-based, works with existing infrastructure
- **Auto-reconnect**: reconnection logic is built into the browser's `EventSource` API (see the sketch after this list)
- **Firewall-friendly**: passes through proxies more easily than WebSockets
- **One-way is all you need**: LLMs only stream server → client
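A minimal sketch of that auto-reconnect behavior, using the standard `EventSource` API. The `/api/events` endpoint is hypothetical, and note that `EventSource` only issues GET requests, which is why the POST-based patterns later in this skill use `fetch` over a `ReadableStream` instead:

```typescript
// Minimal EventSource usage: the browser reconnects automatically on drop
// and resends the Last-Event-ID header so the server can resume the stream.
const source = new EventSource('/api/events'); // hypothetical GET endpoint

source.onmessage = (event: MessageEvent) => {
  console.log('token:', event.data); // each `data:` line arrives as one message
};

source.onerror = () => {
  // Fired on network errors; EventSource keeps retrying unless close() is called.
  console.log('connection lost, browser will retry');
};
```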
**Timeline:**

- 2015-2020: WebSockets for everything
- 2020: SSE adoption for streaming APIs
- 2023+: SSE becomes the standard for LLM streaming (OpenAI, Anthropic)
- 2024: Vercel AI SDK popularizes SSE patterns
## Streaming APIs

| Provider | Streaming Method | Response Format |
|---|---|---|
| OpenAI | SSE | `data: {"choices":[{"delta":{"content":"token"}}]}` |
| Anthropic (Claude) | SSE | `data: {"type":"content_block_delta","delta":{"text":"token"}}` |
| Vercel AI SDK | SSE | Normalized across providers |
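Because the delta shapes differ by provider, a thin normalizer keeps client code provider-agnostic. A sketch based only on the formats in the table above (the function name and `Provider` type are illustrative):

```typescript
type Provider = 'openai' | 'anthropic';

// Extract the text token from a parsed SSE payload, per provider.
// Returns null for frames that carry no text (role deltas, stop events, etc.).
function extractToken(provider: Provider, payload: any): string | null {
  switch (provider) {
    case 'openai':
      return payload.choices?.[0]?.delta?.content ?? null;
    case 'anthropic':
      return payload.type === 'content_block_delta'
        ? payload.delta?.text ?? null
        : null;
  }
}
```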
## Common Anti-Patterns

### Anti-Pattern 1: Buffering Before Display

**Novice thinking:** "Collect all tokens, then show the complete response."

**Problem:** Defeats the entire purpose of streaming.

**Wrong approach:**
```typescript
// ❌ Waits for the entire response before showing anything
const response = await fetch('/api/chat', { method: 'POST', body: prompt });
const fullText = await response.text();
setMessage(fullText); // User sees nothing until done
```
**Correct approach:**
```typescript
// ✅ Display tokens as they arrive
const response = await fetch('/api/chat', {
  method: 'POST',
  body: JSON.stringify({ prompt })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  const chunk = decoder.decode(value);
  const lines = chunk.split('\n').filter(line => line.trim());

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      setMessage(prev => prev + data.content); // Update immediately
    }
  }
}
```
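One caveat with the loop above: a network chunk can end mid-line, splitting an SSE message across two reads and making `JSON.parse` throw. A sketch of the fix, reusing `reader`, `decoder`, and `setMessage` from the snippet above and buffering the trailing partial line:

```typescript
// ✅ Buffer partial lines across chunk boundaries
let buffer = '';

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // stream: true also keeps multi-byte characters split across chunks intact
  buffer += decoder.decode(value, { stream: true });

  const lines = buffer.split('\n');
  buffer = lines.pop() ?? ''; // hold the incomplete last line for the next read

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.content) setMessage(prev => prev + data.content);
    }
  }
}
```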
**Timeline:**

- Pre-2023: Many apps buffered the entire response
- 2023+: Token-by-token display expected
### Anti-Pattern 2: No Stream Cancellation

**Problem:** The user can't stop generation, wasting tokens and money.

**Symptom:** The "Stop" button doesn't work or doesn't exist.

**Correct approach:**
```tsx
// ✅ AbortController for cancellation
const [abortController, setAbortController] = useState<AbortController | null>(null);

const streamResponse = async () => {
  const controller = new AbortController();
  setAbortController(controller);

  try {
    const response = await fetch('/api/chat', {
      signal: controller.signal,
      method: 'POST',
      body: JSON.stringify({ prompt })
    });

    // Stream handling...
  } catch (error) {
    if (error.name === 'AbortError') {
      console.log('Stream cancelled by user');
    }
  } finally {
    setAbortController(null);
  }
};

const cancelStream = () => {
  abortController?.abort();
};

return (
  <button onClick={cancelStream} disabled={!abortController}>
    Stop Generating
  </button>
);
```
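Cancellation should also propagate server-side so the provider stops generating (and billing) tokens. A sketch of the check inside the streaming loop of the Pattern 3 route below, reusing its `stream`, `controller`, `encoder`, and `req` bindings; it assumes the standard `Request.signal`, which aborts when the client's fetch does:

```typescript
for await (const chunk of stream) {
  // Breaking out of a for-await loop closes the iterator, letting the SDK
  // tear down the upstream provider request as well.
  if (req.signal.aborted) break;

  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    controller.enqueue(encoder.encode(`data: ${JSON.stringify({ content })}\n\n`));
  }
}
```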
### Anti-Pattern 3: No Error Recovery

**Problem:** The stream fails mid-response and the user sees partial text with no indication of failure.

**Correct approach:**
```tsx
// ✅ Error states and recovery
const [streamState, setStreamState] = useState<'idle' | 'streaming' | 'error' | 'complete'>('idle');
const [errorMessage, setErrorMessage] = useState<string | null>(null);

try {
  setStreamState('streaming');

  // Streaming logic...

  setStreamState('complete');
} catch (error) {
  setStreamState('error');

  if (error.name === 'AbortError') {
    setErrorMessage('Generation stopped');
  } else if (error.message.includes('429')) {
    setErrorMessage('Rate limit exceeded. Try again in a moment.');
  } else {
    setErrorMessage('Something went wrong. Please retry.');
  }
}

// UI feedback
{streamState === 'error' && (
  <div className="error-banner">
    {errorMessage}
    <button onClick={retryStream}>Retry</button>
  </div>
)}
```
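The banner references a `retryStream` handler that isn't defined above. A minimal sketch, assuming the state setters above and a `streamResponse` function that starts the stream; it simply re-runs the request from scratch (resuming from the partial text would need server support such as `Last-Event-ID`):

```tsx
// Hypothetical retry handler: clear the error state, then re-run the request
const retryStream = () => {
  setErrorMessage(null);
  setStreamState('idle');
  streamResponse();
};
```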
### Anti-Pattern 4: Memory Leaks from Unclosed Streams

**Problem:** Streams aren't cleaned up, causing memory leaks.

**Symptom:** The browser slows down after multiple requests.

**Correct approach:**
```tsx
// ✅ Cleanup with useEffect
useEffect(() => {
  let reader: ReadableStreamDefaultReader | null = null;

  const streamResponse = async () => {
    const response = await fetch('/api/chat', { ... });
    reader = response.body.getReader();

    // Streaming...
  };

  streamResponse();

  // Cleanup on unmount
  return () => {
    reader?.cancel();
  };
}, [prompt]);
```
### Anti-Pattern 5: No Typing Indicator Between Tokens

**Problem:** The UI feels frozen between slow tokens.

**Correct approach:**
```tsx
// ✅ Animated cursor during generation
<div className="message">
  {content}
  {isStreaming && <span className="typing-cursor">▊</span>}
</div>
```

```css
.typing-cursor {
  animation: blink 1s step-end infinite;
}

@keyframes blink {
  50% { opacity: 0; }
}
```
## Implementation Patterns

### Pattern 1: Basic SSE Stream Handler
```typescript
async function* streamCompletion(prompt: string) {
  const response = await fetch('/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  });

  const reader = response.body!.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // Simplified: for chunk-boundary safety, buffer partial lines
    // as shown under Anti-Pattern 1
    const chunk = decoder.decode(value);
    const lines = chunk.split('\n');

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6));
        if (data.content) {
          yield data.content;
        }
        if (data.done) {
          return;
        }
      }
    }
  }
}

// Usage
for await (const token of streamCompletion('Hello')) {
  console.log(token);
}
```
### Pattern 2: React Hook for Streaming
```tsx
import { useState, useCallback } from 'react';

interface UseStreamingOptions {
  onToken?: (token: string) => void;
  onComplete?: (fullText: string) => void;
  onError?: (error: Error) => void;
}

export function useStreaming(options: UseStreamingOptions = {}) {
  const [content, setContent] = useState('');
  const [isStreaming, setIsStreaming] = useState(false);
  const [error, setError] = useState<Error | null>(null);
  const [abortController, setAbortController] = useState<AbortController | null>(null);

  const stream = useCallback(async (prompt: string) => {
    const controller = new AbortController();
    setAbortController(controller);
    setIsStreaming(true);
    setError(null);
    setContent('');

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        signal: controller.signal,
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ prompt })
      });

      const reader = response.body!.getReader();
      const decoder = new TextDecoder();
      let accumulated = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split('\n').filter(line => line.trim());

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            const data = JSON.parse(line.slice(6));
            if (data.content) {
              accumulated += data.content;
              setContent(accumulated);
              options.onToken?.(data.content);
            }
          }
        }
      }

      options.onComplete?.(accumulated);
    } catch (err) {
      if ((err as Error).name !== 'AbortError') {
        setError(err as Error);
        options.onError?.(err as Error);
      }
    } finally {
      setIsStreaming(false);
      setAbortController(null);
    }
  }, [options]);

  const cancel = useCallback(() => {
    abortController?.abort();
  }, [abortController]);

  return { content, isStreaming, error, stream, cancel };
}

// Usage in component
function ChatInterface() {
  const { content, isStreaming, stream, cancel } = useStreaming({
    onToken: (token) => console.log('New token:', token),
    onComplete: (text) => console.log('Done:', text)
  });

  return (
    <div>
      <div className="message">
        {content}
        {isStreaming && <span className="typing-cursor">▊</span>}
      </div>

      <button onClick={() => stream('Tell me a story')} disabled={isStreaming}>
        Generate
      </button>

      {isStreaming && <button onClick={cancel}>Stop</button>}
    </div>
  );
}
```
### Pattern 3: Server-Side Streaming (Next.js)
```typescript
// app/api/chat/route.ts
import { OpenAI } from 'openai';

// Edge runtime keeps time-to-first-token low; the Node.js runtime
// also supports streaming responses.
export const runtime = 'edge';

export async function POST(req: Request) {
  const { prompt } = await req.json();

  const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

  const stream = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  });

  // Convert the OpenAI stream to SSE format
  const encoder = new TextEncoder();

  const readable = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of stream) {
          const content = chunk.choices[0]?.delta?.content;

          if (content) {
            const sseMessage = `data: ${JSON.stringify({ content })}\n\n`;
            controller.enqueue(encoder.encode(sseMessage));
          }
        }

        // Send completion signal
        controller.enqueue(encoder.encode('data: {"done":true}\n\n'));
        controller.close();
      } catch (error) {
        controller.error(error);
      }
    }
  });

  return new Response(readable, {
    headers: {
      'Content-Type': 'text/event-stream',
      'Cache-Control': 'no-cache',
      'Connection': 'keep-alive'
    }
  });
}
```
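A quick way to smoke-test the route from a terminal is a small script along the lines of the bundled `scripts/stream_tester.ts`. A sketch that prints SSE frames as they arrive, assuming a local dev server on port 3000 (run as an ES module, e.g. with `npx tsx`):

```typescript
// Minimal SSE smoke test: print each chunk the moment it arrives
const res = await fetch('http://localhost:3000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt: 'Say hello' })
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}
```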
## Production Checklist

- [ ] AbortController for cancellation
- [ ] Error states with retry capability
- [ ] Typing indicator during generation
- [ ] Cleanup on component unmount
- [ ] Rate limiting on the API route
- [ ] Token usage tracking
- [ ] Non-streaming fallback (if the streaming API fails)
- [ ] Accessibility (screen reader announces updates; see the sketch after this list)
- [ ] Mobile-friendly (touch targets for the stop button)
- [ ] Network error recovery (auto-retry on disconnect)
- [ ] Max response length enforcement
- [ ] Cost estimation before generation
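For the accessibility item, a sketch using a standard `aria-live` region. Announcing every token is noisy for screen readers, so a common compromise is to announce the message once it completes; the `sr-only` (visually hidden) utility class is assumed from your CSS framework:

```tsx
{/* Screen readers announce the finished message once, instead of
    re-reading the region on every token update. */}
<div aria-live="polite" className="sr-only">
  {!isStreaming && content}
</div>
```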
## When to Use vs Avoid

| Scenario | Use Streaming? |
|---|---|
| Chat interface | ✅ Yes |
| Long-form content generation | ✅ Yes |
| Code generation with preview | ✅ Yes |
| Short completions (<50 words) | ❌ No - regular fetch |
| Background jobs | ❌ No - use a job queue |
| Bidirectional chat | ⚠️ Use WebSockets instead |
## Technology Comparison

| Feature | SSE | WebSockets | Long Polling |
|---|---|---|---|
| Complexity | Low | Medium | High |
| Auto-reconnect | ✅ | ❌ | ❌ |
| Bidirectional | ❌ | ✅ | ❌ |
| Firewall-friendly | ✅ | ⚠️ | ✅ |
| Browser support | ✅ All modern | ✅ All modern | ✅ Universal |
| LLM API support | ✅ Standard | ❌ Rare | ❌ Not used |
## References

- `/references/sse-protocol.md` - Server-Sent Events specification details
- `/references/vercel-ai-sdk.md` - Vercel AI SDK integration patterns
- `/references/error-recovery.md` - Stream error handling strategies
## Scripts

- `scripts/stream_tester.ts` - Test SSE endpoints locally
- `scripts/token_counter.ts` - Estimate costs before generation
This skill guides: LLM streaming implementation | SSE protocol | Real-time UI updates | Cancellation | Error recovery | Token-by-token display