cost-accrual-tracker

Cost Accrual Tracker

Real-time tracking of API costs during LLM execution with support for partial costs on abort.

When to Use

✅ Use for:

Implementing real-time cost tracking during execution
Capturing partial costs when executions are aborted
Building cost display widgets for execution UIs
Integrating token counting into execution pipelines
Adding budget thresholds with auto-stop

❌ NOT for:

Cost estimation before execution (use pricing calculators)
Billing system design (use billing-system skill)
Price tier management or discounts
Historical cost analytics dashboards

Core Patterns

Token-Based Cost Calculation

interface TokenUsage { inputTokens: number; outputTokens: number; cacheReadTokens?: number; // Prompt caching hits cacheWriteTokens?: number; // Prompt caching misses }

interface CostCalculation { inputCostUsd: number; outputCostUsd: number; cacheSavingsUsd?: number; totalCostUsd: number; }

function calculateCost(usage: TokenUsage, model: string): CostCalculation { const pricing = MODEL_PRICING[model];

const inputCostUsd = (usage.inputTokens / 1_000_000) * pricing.inputPerMTok; const outputCostUsd = (usage.outputTokens / 1_000_000) * pricing.outputPerMTok;

return { inputCostUsd, outputCostUsd, totalCostUsd: inputCostUsd + outputCostUsd, }; }

Incremental Accrual Pattern

Track costs as they accrue, not just at completion:

class CostAccrualTracker { private totalInputTokens = 0; private totalOutputTokens = 0; private accruedCostUsd = 0; private readonly model: string;

constructor(model: string) { this.model = model; }

/**

Called after each API response (streaming or complete) */ recordUsage(usage: TokenUsage): void { this.totalInputTokens += usage.inputTokens; this.totalOutputTokens += usage.outputTokens;

const cost = calculateCost(usage, this.model);

this.accruedCostUsd += cost.totalCostUsd;

}

/**

Get current accrued cost (for real-time display) */ getCurrentCost(): number { return this.accruedCostUsd; }

/**

Finalize on completion or abort */ finalize(reason: 'completed' | 'aborted' | 'failed'): CostReport { return { totalInputTokens: this.totalInputTokens, totalOutputTokens: this.totalOutputTokens, totalCostUsd: this.accruedCostUsd, completionReason: reason, finalizedAt: Date.now(), }; } }

Abort-Aware Cost Capture

Critical: Always capture partial costs on abort:

// In execution handler const tracker = new CostAccrualTracker(model);

try { for await (const chunk of executeStream(request)) { if (abortSignal.aborted) { // CRITICAL: Capture cost BEFORE throwing const partialCost = tracker.finalize('aborted'); onCostUpdate(partialCost); throw new AbortError('Execution aborted'); }

tracker.recordUsage(chunk.usage);
onCostUpdate(tracker.getCurrentCost());

}

return tracker.finalize('completed'); } catch (error) { if (error instanceof AbortError) { throw error; // Already handled } return tracker.finalize('failed'); }

Budget Threshold Pattern

Auto-stop execution when budget is exceeded:

interface BudgetConfig { maxCostUsd: number; warnAtPercentage: number; // e.g., 0.8 for 80% onWarn?: (current: number, max: number) => void; onExceed?: (current: number, max: number) => void; }

function createBudgetGuard(config: BudgetConfig) { return { check(currentCostUsd: number): 'ok' | 'warn' | 'exceed' { const percentage = currentCostUsd / config.maxCostUsd;

  if (percentage >= 1.0) {
    config.onExceed?.(currentCostUsd, config.maxCostUsd);
    return 'exceed';
  }

  if (percentage >= config.warnAtPercentage) {
    config.onWarn?.(currentCostUsd, config.maxCostUsd);
    return 'warn';
  }

  return 'ok';
}

}; }

Anti-Patterns

Lost Costs on Abort

Novice thinking: "Just throw an error when aborted"

Reality: If you don't capture costs before aborting, you lose:

Token usage data for partial execution
Accurate cost reporting for billing
Audit trail for debugging

Timeline: Always been an issue, but became critical with expensive models (GPT-4, Claude Opus)

Correct approach: Always call finalize() with partial data BEFORE throwing abort errors.

Polling Without Debounce

Novice thinking: "Poll cost endpoint every 100ms for real-time updates"

Reality:

Wastes bandwidth and CPU
Cost updates only happen after API responses
Polling faster than response rate is pointless

Correct approach: Poll at 1-2 second intervals, or use event-driven updates from the execution stream.

Ignoring Prompt Caching

Novice thinking: "Just multiply tokens by price per token"

Reality: Claude's prompt caching changes the cost model:

Cache reads are 90% cheaper
Cache writes cost extra on first use
Ignoring caching leads to inaccurate costs

Timeline:

Pre-2024: No caching, simple calculation
2024+: Claude prompt caching requires separate tracking

Correct approach: Track cache_read_input_tokens and cache_creation_input_tokens separately.

Per-Request Cost Objects

Novice thinking: "Create new tracker for each request"

Reality: For DAG execution with multiple nodes:

Need aggregate cost across all nodes
Need to attribute costs to specific nodes
Need rollup for parent execution

Correct approach: Hierarchical tracking - per-node trackers that roll up to execution-level.

State Flow

                ┌─────────────────────────────────────────┐
                │           CostAccrualTracker            │
                └─────────────────────────────────────────┘
                                    │
          ┌─────────────────────────┼─────────────────────────┐
          │                         │                         │
          ▼                         ▼                         ▼
┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  recordUsage()  │     │ getCurrentCost()│     │   finalize()    │
│                 │     │                 │     │                 │
│ After each API  │     │ For real-time   │     │ On completion,  │
│ response        │     │ display         │     │ abort, or fail  │
└─────────────────┘     └─────────────────┘     └─────────────────┘
          │                         │                         │
          │                         │                         │
          ▼                         ▼                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                        CostReport                               │
│  { inputTokens, outputTokens, totalCostUsd, completionReason }  │
└─────────────────────────────────────────────────────────────────┘

UI Display Pattern

For real-time cost display in execution UIs:

// Poll every 2 seconds while executing useEffect(() => { if (status !== 'running') return;

return () => clearInterval(interval); }, [executionId, status]);

// Display format <div className="cost-display"> <span className="cost-amount">${accruedCost.toFixed(4)}</span> <span className="token-count"> {tokens.input.toLocaleString()} in / {tokens.output.toLocaleString()} out </span> </div>

Integration Points

Component Responsibility

CostAccrualTracker

Per-execution token counting and cost calculation

ExecutionManager

Aggregates costs across DAG executions

BudgetGuard

Threshold monitoring and auto-stop

/api/execute/:id

Exposes current cost via polling

Cost Display Widget Real-time UI rendering

References

See /references/claude-api-pricing.md for current Claude API pricing.

cost-accrual-tracker

Safety Notice

Copy this and send it to your AI assistant to learn

Source Transparency

Related Skills

video-processing-editing

cv-creator

mobile-ux-optimizer

personal-finance-coach