Cost Accrual Tracker
Real-time tracking of API costs during LLM execution with support for partial costs on abort.
When to Use
✅ Use for:
-
Implementing real-time cost tracking during execution
-
Capturing partial costs when executions are aborted
-
Building cost display widgets for execution UIs
-
Integrating token counting into execution pipelines
-
Adding budget thresholds with auto-stop
❌ NOT for:
-
Cost estimation before execution (use pricing calculators)
-
Billing system design (use billing-system skill)
-
Price tier management or discounts
-
Historical cost analytics dashboards
Core Patterns
- Token-Based Cost Calculation
interface TokenUsage { inputTokens: number; outputTokens: number; cacheReadTokens?: number; // Prompt caching hits cacheWriteTokens?: number; // Prompt caching misses }
interface CostCalculation { inputCostUsd: number; outputCostUsd: number; cacheSavingsUsd?: number; totalCostUsd: number; }
function calculateCost(usage: TokenUsage, model: string): CostCalculation { const pricing = MODEL_PRICING[model];
const inputCostUsd = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok; const outputCostUsd = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;
return { inputCostUsd, outputCostUsd, totalCostUsd: inputCostUsd + outputCostUsd, }; }
- Incremental Accrual Pattern
Track costs as they accrue, not just at completion:
class CostAccrualTracker { private totalInputTokens = 0; private totalOutputTokens = 0; private accruedCostUsd = 0; private readonly model: string;
constructor(model: string) { this.model = model; }
/**
- Called after each API response (streaming or complete) */ recordUsage(usage: TokenUsage): void { this.totalInputTokens += usage.inputTokens; this.totalOutputTokens += usage.outputTokens;
const cost = calculateCost(usage, this.model);
this.accruedCostUsd += cost.totalCostUsd;
}
/**
- Get current accrued cost (for real-time display) */ getCurrentCost(): number { return this.accruedCostUsd; }
/**
- Finalize on completion or abort */ finalize(reason: 'completed' | 'aborted' | 'failed'): CostReport { return { totalInputTokens: this.totalInputTokens, totalOutputTokens: this.totalOutputTokens, totalCostUsd: this.accruedCostUsd, completionReason: reason, finalizedAt: Date.now(), }; } }
- Abort-Aware Cost Capture
Critical: Always capture partial costs on abort:
// In execution handler const tracker = new CostAccrualTracker(model);
try { for await (const chunk of executeStream(request)) { if (abortSignal.aborted) { // CRITICAL: Capture cost BEFORE throwing const partialCost = tracker.finalize('aborted'); onCostUpdate(partialCost); throw new AbortError('Execution aborted'); }
tracker.recordUsage(chunk.usage);
onCostUpdate(tracker.getCurrentCost());
}
return tracker.finalize('completed'); } catch (error) { if (error instanceof AbortError) { throw error; // Already handled } return tracker.finalize('failed'); }
- Budget Threshold Pattern
Auto-stop execution when budget is exceeded:
interface BudgetConfig { maxCostUsd: number; warnAtPercentage: number; // e.g., 0.8 for 80% onWarn?: (current: number, max: number) => void; onExceed?: (current: number, max: number) => void; }
function createBudgetGuard(config: BudgetConfig) { return { check(currentCostUsd: number): 'ok' | 'warn' | 'exceed' { const percentage = currentCostUsd / config.maxCostUsd;
if (percentage >= 1.0) {
config.onExceed?.(currentCostUsd, config.maxCostUsd);
return 'exceed';
}
if (percentage >= config.warnAtPercentage) {
config.onWarn?.(currentCostUsd, config.maxCostUsd);
return 'warn';
}
return 'ok';
}
}; }
Anti-Patterns
Lost Costs on Abort
Novice thinking: "Just throw an error when aborted"
Reality: If you don't capture costs before aborting, you lose:
-
Token usage data for partial execution
-
Accurate cost reporting for billing
-
Audit trail for debugging
Timeline: Always been an issue, but became critical with expensive models (GPT-4, Claude Opus)
Correct approach: Always call finalize() with partial data BEFORE throwing abort errors.
Polling Without Debounce
Novice thinking: "Poll cost endpoint every 100ms for real-time updates"
Reality:
-
Wastes bandwidth and CPU
-
Cost updates only happen after API responses
-
Polling faster than response rate is pointless
Correct approach: Poll at 1-2 second intervals, or use event-driven updates from the execution stream.
Ignoring Prompt Caching
Novice thinking: "Just multiply tokens by price per token"
Reality: Claude's prompt caching changes the cost model:
-
Cache reads are 90% cheaper
-
Cache writes cost extra on first use
-
Ignoring caching leads to inaccurate costs
Timeline:
-
Pre-2024: No caching, simple calculation
-
2024+: Claude prompt caching requires separate tracking
Correct approach: Track cache_read_input_tokens and cache_creation_input_tokens separately.
Per-Request Cost Objects
Novice thinking: "Create new tracker for each request"
Reality: For DAG execution with multiple nodes:
-
Need aggregate cost across all nodes
-
Need to attribute costs to specific nodes
-
Need rollup for parent execution
Correct approach: Hierarchical tracking - per-node trackers that roll up to execution-level.
State Flow
┌─────────────────────────────────────────┐
│ CostAccrualTracker │
└─────────────────────────────────────────┘
│
┌─────────────────────────┼─────────────────────────┐
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ recordUsage() │ │ getCurrentCost()│ │ finalize() │
│ │ │ │ │ │
│ After each API │ │ For real-time │ │ On completion, │
│ response │ │ display │ │ abort, or fail │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
│ │ │
▼ ▼ ▼
┌─────────────────────────────────────────────────────────────────┐
│ CostReport │
│ { inputTokens, outputTokens, totalCostUsd, completionReason } │
└─────────────────────────────────────────────────────────────────┘
UI Display Pattern
For real-time cost display in execution UIs:
// Poll every 2 seconds while executing useEffect(() => { if (status !== 'running') return;
const interval = setInterval(async () => {
const response = await fetch(/api/execute/${executionId});
const data = await response.json();
setAccruedCost(data.cost.accruedUsd);
setTokens({
input: data.cost.inputTokens,
output: data.cost.outputTokens,
});
}, 2000);
return () => clearInterval(interval); }, [executionId, status]);
// Display format <div className="cost-display"> <span className="cost-amount">${accruedCost.toFixed(4)}</span> <span className="token-count"> {tokens.input.toLocaleString()} in / {tokens.output.toLocaleString()} out </span> </div>
Integration Points
Component Responsibility
CostAccrualTracker
Per-execution token counting and cost calculation
ExecutionManager
Aggregates costs across DAG executions
BudgetGuard
Threshold monitoring and auto-stop
/api/execute/:id
Exposes current cost via polling
Cost Display Widget Real-time UI rendering
References
See /references/claude-api-pricing.md for current Claude API pricing.