Performance Optimization
Deep performance optimization covering web performance (Lighthouse, Core Web Vitals) and LLM API cost optimization. Focuses on loading speed, runtime efficiency, resource optimization, and intelligent model routing.
How it works
- Identify performance bottlenecks in code, assets, and API usage
- Prioritize by impact on Core Web Vitals and cost
- Provide specific optimizations with code examples
- Measure improvement with before/after metrics
Performance budget
| Resource | Budget | Rationale |
|---|---|---|
| Total page weight | < 1.5 MB | 3G loads in ~4s |
| JavaScript (compressed) | < 300 KB | Parsing + execution time |
| CSS (compressed) | < 100 KB | Render blocking |
| Images (above-fold) | < 500 KB | LCP impact |
| Fonts | < 100 KB | FOIT/FOUT prevention |
| Third-party | < 200 KB | Uncontrolled latency |
Critical rendering path
Server response
- TTFB < 800ms. Time to First Byte should be fast. Use CDN, caching, and efficient backends.
- Enable compression. Gzip or Brotli for text assets. Brotli preferred (15-20% smaller).
- HTTP/2 or HTTP/3. Multiplexing reduces connection overhead.
- Edge caching. Cache HTML at CDN edge when possible.
Resource loading
Preconnect to required origins:
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://cdn.example.com" crossorigin>
Preload critical resources:
<!-- LCP image -->
<link rel="preload" href="/hero.webp" as="image" fetchpriority="high">
<!-- Critical font -->
<link rel="preload" href="/font.woff2" as="font" type="font/woff2" crossorigin>
Defer non-critical CSS:
<!-- Critical CSS inlined -->
<style>/* Above-fold styles */</style>
<!-- Non-critical CSS -->
<link rel="preload" href="/styles.css" as="style" onload="this.onload=null;this.rel='stylesheet'">
<noscript><link rel="stylesheet" href="/styles.css"></noscript>
JavaScript optimization
Defer non-essential scripts:
<!-- Parser-blocking (avoid) -->
<script src="/critical.js"></script>
<!-- Deferred (preferred) -->
<script defer src="/app.js"></script>
<!-- Async (for independent scripts) -->
<script async src="/analytics.js"></script>
<!-- Module (deferred by default) -->
<script type="module" src="/app.mjs"></script>
Code splitting patterns:
// Route-based splitting
const Dashboard = lazy(() => import('./Dashboard'));
// Component-based splitting
const HeavyChart = lazy(() => import('./HeavyChart'));
// Feature-based splitting
if (user.isPremium) {
const PremiumFeatures = await import('./PremiumFeatures');
}
Tree shaking best practices:
// ❌ Imports entire library
import _ from 'lodash';
_.debounce(fn, 300);
// ✅ Imports only what's needed
import debounce from 'lodash/debounce';
debounce(fn, 300);
Image optimization
Format selection
| Format | Use case | Browser support |
|---|---|---|
| AVIF | Photos, best compression | 92%+ |
| WebP | Photos, good fallback | 97%+ |
| PNG | Graphics with transparency | Universal |
| SVG | Icons, logos, illustrations | Universal |
Responsive images
<picture>
<!-- AVIF for modern browsers -->
<source
type="image/avif"
srcset="hero-400.avif 400w,
hero-800.avif 800w,
hero-1200.avif 1200w"
sizes="(max-width: 600px) 100vw, 50vw">
<!-- WebP fallback -->
<source
type="image/webp"
srcset="hero-400.webp 400w,
hero-800.webp 800w,
hero-1200.webp 1200w"
sizes="(max-width: 600px) 100vw, 50vw">
<!-- JPEG fallback -->
<img
src="hero-800.jpg"
srcset="hero-400.jpg 400w,
hero-800.jpg 800w,
hero-1200.jpg 1200w"
sizes="(max-width: 600px) 100vw, 50vw"
width="1200"
height="600"
alt="Hero image"
loading="lazy"
decoding="async">
</picture>
LCP image priority
<!-- Above-fold LCP image: eager loading, high priority -->
<img
src="hero.webp"
fetchpriority="high"
loading="eager"
decoding="sync"
alt="Hero">
<!-- Below-fold images: lazy loading -->
<img
src="product.webp"
loading="lazy"
decoding="async"
alt="Product">
Font optimization
Loading strategy
/* System font stack as fallback */
body {
font-family: 'Custom Font', -apple-system, BlinkMacSystemFont,
'Segoe UI', Roboto, sans-serif;
}
/* Prevent invisible text */
@font-face {
font-family: 'Custom Font';
src: url('/fonts/custom.woff2') format('woff2');
font-display: swap; /* or optional for non-critical */
font-weight: 400;
font-style: normal;
unicode-range: U+0000-00FF; /* Subset to Latin */
}
Preloading critical fonts
<link rel="preload" href="/fonts/heading.woff2" as="font" type="font/woff2" crossorigin>
Variable fonts
/* One file instead of multiple weights */
@font-face {
font-family: 'Inter';
src: url('/fonts/Inter-Variable.woff2') format('woff2-variations');
font-weight: 100 900;
font-display: swap;
}
Caching strategy
Cache-Control headers
# HTML (short or no cache)
Cache-Control: no-cache, must-revalidate
# Static assets with hash (immutable)
Cache-Control: public, max-age=31536000, immutable
# Static assets without hash
Cache-Control: public, max-age=86400, stale-while-revalidate=604800
# API responses
Cache-Control: private, max-age=0, must-revalidate
Service worker caching
// Cache-first for static assets
self.addEventListener('fetch', (event) => {
if (event.request.destination === 'image' ||
event.request.destination === 'style' ||
event.request.destination === 'script') {
event.respondWith(
caches.match(event.request).then((cached) => {
return cached || fetch(event.request).then((response) => {
const clone = response.clone();
caches.open('static-v1').then((cache) => cache.put(event.request, clone));
return response;
});
})
);
}
});
Runtime performance
Avoid layout thrashing
// ❌ Forces multiple reflows
elements.forEach(el => {
const height = el.offsetHeight; // Read
el.style.height = height + 10 + 'px'; // Write
});
// ✅ Batch reads, then batch writes
const heights = elements.map(el => el.offsetHeight); // All reads
elements.forEach((el, i) => {
el.style.height = heights[i] + 10 + 'px'; // All writes
});
Debounce expensive operations
function debounce(fn, delay) {
let timeout;
return (...args) => {
clearTimeout(timeout);
timeout = setTimeout(() => fn(...args), delay);
};
}
// Debounce scroll/resize handlers
window.addEventListener('scroll', debounce(handleScroll, 100));
Use requestAnimationFrame
// ❌ May cause jank
setInterval(animate, 16);
// ✅ Synced with display refresh
function animate() {
// Animation logic
requestAnimationFrame(animate);
}
requestAnimationFrame(animate);
Virtualize long lists
// For lists > 100 items, render only visible items
// Use libraries like react-window, vue-virtual-scroller, or native CSS:
.virtual-list {
content-visibility: auto;
contain-intrinsic-size: 0 50px; /* Estimated item height */
}
Third-party scripts
Load strategies
// ❌ Blocks main thread
<script src="https://analytics.example.com/script.js"></script>
// ✅ Async loading
<script async src="https://analytics.example.com/script.js"></script>
// ✅ Delay until interaction
<script>
document.addEventListener('DOMContentLoaded', () => {
const observer = new IntersectionObserver((entries) => {
if (entries[0].isIntersecting) {
const script = document.createElement('script');
script.src = 'https://widget.example.com/embed.js';
document.body.appendChild(script);
observer.disconnect();
}
});
observer.observe(document.querySelector('#widget-container'));
});
</script>
Facade pattern
<!-- Show static placeholder until interaction -->
<div class="youtube-facade"
data-video-id="abc123"
onclick="loadYouTube(this)">
<img src="/thumbnails/abc123.jpg" alt="Video title">
<button aria-label="Play video">▶</button>
</div>
Measurement
Key metrics
| Metric | Target | Tool |
|---|---|---|
| LCP | < 2.5s | Lighthouse, CrUX |
| FCP | < 1.8s | Lighthouse |
| Speed Index | < 3.4s | Lighthouse |
| TBT | < 200ms | Lighthouse |
| TTI | < 3.8s | Lighthouse |
Testing commands
# Lighthouse CLI
npx lighthouse https://example.com --output html --output-path report.html
# Web Vitals library
import {onLCP, onINP, onCLS} from 'web-vitals';
onLCP(console.log);
onINP(console.log);
onCLS(console.log);
References
For Core Web Vitals specific optimizations, see Core Web Vitals.
LLM Cost Optimization
Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.
When to Use
- Building applications that call LLM APIs (Claude, GPT, etc.)
- Processing batches of items with varying complexity
- Need to stay within a budget for API spend
- Optimizing cost without sacrificing quality on complex tasks
Core Concepts
1. Model Routing by Task Complexity
Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.
const MODEL_SONNET = "claude-sonnet-4-6";
const MODEL_HAIKU = "claude-haiku-4-5-20251001";
const SONNET_TEXT_THRESHOLD = 10000; // chars
const SONNET_ITEM_THRESHOLD = 30; // items
function selectModel(
textLength: number,
itemCount: number,
forceModel?: string
): string {
if (forceModel) return forceModel;
if (textLength >= SONNET_TEXT_THRESHOLD || itemCount >= SONNET_ITEM_THRESHOLD) {
return MODEL_SONNET; // Complex task
}
return MODEL_HAIKU; // Simple task (3-4x cheaper)
}
2. Immutable Cost Tracking
Track cumulative spend with frozen records. Each API call returns a new tracker — never mutates state.
interface CostRecord {
model: string;
inputTokens: number;
outputTokens: number;
costUsd: number;
}
interface CostTracker {
budgetLimit: number;
records: CostRecord[];
totalCost: number;
overBudget: boolean;
}
function createTracker(budgetLimit = 1.00): CostTracker {
return {
budgetLimit,
records: [],
totalCost: 0,
overBudget: false
};
}
function addCost(tracker: CostTracker, record: CostRecord): CostTracker {
const newTotal = tracker.totalCost + record.costUsd;
return {
...tracker,
records: [...tracker.records, record],
totalCost: newTotal,
overBudget: newTotal > tracker.budgetLimit
};
}
3. Narrow Retry Logic
Retry only on transient errors. Fail fast on authentication or bad request errors.
const RETRYABLE_ERRORS = [
"APIConnectionError",
"RateLimitError",
"InternalServerError"
];
const MAX_RETRIES = 3;
async function callWithRetry<T>(
fn: () => Promise<T>,
maxRetries = MAX_RETRIES
): Promise<T> {
for (let attempt = 0; attempt < maxRetries; attempt++) {
try {
return await fn();
} catch (error) {
const errorName = error.constructor.name;
if (!RETRYABLE_ERRORS.includes(errorName) || attempt === maxRetries - 1) {
throw error;
}
await sleep(Math.pow(2, attempt) * 1000); // Exponential backoff
}
}
throw new Error("Max retries exceeded");
}
4. Prompt Caching
Cache long system prompts to avoid resending them on every request.
interface CachedMessage {
role: "user";
content: Array<{
type: "text";
text: string;
cache_control?: { type: "ephemeral" };
}>;
}
function buildCachedMessages(
systemPrompt: string,
userInput: string
): CachedMessage {
return {
role: "user",
content: [
{
type: "text",
text: systemPrompt,
cache_control: { type: "ephemeral" } // Cache this
},
{
type: "text",
text: userInput // Variable part
}
]
};
}
Complete Pipeline Example
async function processWithCostControl(
text: string,
systemPrompt: string,
tracker: CostTracker
): Promise<{ result: string; tracker: CostTracker }> {
// 1. Route model based on complexity
const model = selectModel(text.length, estimateItems(text));
// 2. Check budget
if (tracker.overBudget) {
throw new Error(`Budget exceeded: $${tracker.totalCost.toFixed(2)}`);
}
// 3. Call with retry + caching
const response = await callWithRetry(() =>
anthropic.messages.create({
model,
messages: [buildCachedMessages(systemPrompt, text)]
})
);
// 4. Track cost (immutable)
const record: CostRecord = {
model,
inputTokens: response.usage.input_tokens,
outputTokens: response.usage.output_tokens,
costUsd: calculateCost(model, response.usage)
};
const newTracker = addCost(tracker, record);
return { result: response.content[0].text, tracker: newTracker };
}
Pricing Reference (2025-2026)
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Relative Cost |
|---|---|---|---|
| Haiku 4.5 | $0.80 | $4.00 | 1x |
| Sonnet 4.6 | $3.00 | $15.00 | ~4x |
| Opus 4.5 | $15.00 | $75.00 | ~19x |
Best Practices
- Start with the cheapest model and only route to expensive models when complexity thresholds are met
- Set explicit budget limits before processing batches — fail early rather than overspend
- Log model selection decisions so you can tune thresholds based on real data
- Use prompt caching for system prompts over 1024 tokens — saves both cost and latency
- Never retry on authentication or validation errors — only transient failures (network, rate limit, server error)
Anti-Patterns to Avoid
- Using the most expensive model for all requests regardless of complexity
- Retrying on all errors (wastes budget on permanent failures)
- Mutating cost tracking state (makes debugging and auditing difficult)
- Hardcoding model names throughout the codebase (use constants or config)
- Ignoring prompt caching for repetitive system prompts
Batch Processing Pattern
async function processBatch(
items: string[],
systemPrompt: string,
budgetLimit = 10.00
): Promise<{ results: string[]; tracker: CostTracker }> {
let tracker = createTracker(budgetLimit);
const results: string[] = [];
for (const item of items) {
try {
const { result, tracker: newTracker } = await processWithCostControl(
item,
systemPrompt,
tracker
);
results.push(result);
tracker = newTracker;
} catch (error) {
if (error.message.includes("Budget exceeded")) {
console.warn(`Stopped at item ${results.length}/${items.length} due to budget`);
break;
}
throw error;
}
}
return { results, tracker };
}
Optimize both web performance and API costs for maximum efficiency.