Fermi Estimation

Core principle: Almost any quantity can be estimated to within an order of magnitude by decomposing it into components you can estimate, then multiplying through. The goal is not precision but getting the right number of zeros. An estimate that is off by 10× is still informative; one that is off by 1000× changes the decision.

"It is better to be approximately right than precisely wrong." — Carveth Read

Why Fermi Estimation

You're not always in a position to measure. But you often need to decide. Fermi estimation gives you:

A defensible, traceable quantitative estimate from reasoning alone
A sanity check on received numbers ("does this feel right order-of-magnitude?")
Identification of the key driver — the variable that dominates the result
A structured way to surface which sub-estimates need validation most

The Core Process

Step 1: Define the Target Quantity Precisely

Ambiguous questions produce useless estimates. Before decomposing, specify:

What exactly is being estimated?
Over what time period?
For what population or scope?
In what units?

"How many tokens does this use?" → "What is the total token count of a single Constellation pipeline run, processing a medium-complexity feature request, across all agent turns?"

Step 2: Decompose into Estimable Factors

Break the unknown quantity into a product of factors you can estimate:

Target = Factor_1 × Factor_2 × Factor_3 × ...

Look for a decomposition where:

Each factor can be estimated independently
The factors multiply to the target (check the units — they should cancel correctly)
No factor is itself the original unknown in disguise

Useful decomposition patterns:

Rate × Time: (requests/hour) × (hours)
Count × Average: (number of items) × (average size per item)
Population × Fraction: (total users) × (% who do X)
Flow × Duration: (throughput) × (duration)
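
As a minimal sketch of the patterns above, a decomposition is just a named set of factors multiplied together. The factor names and values below are hypothetical placeholders chosen for illustration, not measurements.

```python
from math import prod

# Hypothetical Rate × Time decomposition: requests handled in one working day.
# Both values are illustrative assumptions.
factors = {
    "requests_per_hour": 500,  # rate
    "hours_per_day": 8,        # time
}


def fermi_product(factors: dict[str, float]) -> float:
    """Multiply every factor together to get the target quantity."""
    return prod(factors.values())


requests_per_day = fermi_product(factors)
print(requests_per_day)  # 4000
```

Keeping the factors named makes it easier to see later which one drives the result.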

Step 3: Estimate Each Factor

For each factor, make an explicit estimate with reasoning:

What do you know about this?
What analogies can you draw on?
What's a plausible range?

Use round numbers — the goal is order of magnitude, not false precision.
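
One way to keep estimates round is to snap each value to one significant figure before multiplying. This is only a sketch; round_to_one_sig_fig is a hypothetical helper written for this example, not a library function.

```python
from math import floor, log10


def round_to_one_sig_fig(x: float) -> float:
    """Round a positive number to one significant figure."""
    if x <= 0:
        raise ValueError("expects a positive quantity")
    magnitude = 10 ** floor(log10(x))  # largest power of ten not above x
    return round(x / magnitude) * magnitude


print(round_to_one_sig_fig(42_381))  # 40000
print(round_to_one_sig_fig(7_600))   # 8000
```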

Step 4: Compute and Sanity-Check

Multiply through. Then:

Does the result feel right? Does it pass common sense?
Does it match any reference points you know?
Which factor, if wrong, would most change the result?
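
In a pure product, an error of k× in any single factor produces a k× error in the result, so the factor most likely to change the result is the one with the widest plausible range. The sketch below ranks factors by their high/low ratio. The factor names and ranges are hypothetical.

```python
# Hypothetical (low, high) ranges for each factor; values are illustrative.
ranges = {
    "total_users": (50_000, 200_000),     # about 4× spread
    "fraction_active": (0.01, 0.2),       # about 20× spread
    "requests_per_active_user": (5, 20),  # about 4× spread
}


def spread(name: str) -> float:
    """Ratio of the high end to the low end of a factor's range."""
    low, high = ranges[name]
    return high / low


# The factor with the largest spread is the one to validate first.
key_driver = max(ranges, key=spread)
print(key_driver)  # fraction_active
```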

Step 5: Bound the Estimate

Low estimate: Use the low end of each factor's range
Central estimate: Use your best guess for each
High estimate: Use the high end

If the low and high estimates each fall within about 3× of the central estimate, the estimate is well-bounded. If they are orders of magnitude apart, at least one factor is too uncertain, and that factor is the one to validate.
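
The bounding step can be sketched as three products over the same factors. The factor names and values are hypothetical, and the 3× threshold is the rule of thumb stated above.

```python
from math import prod

# Hypothetical (low, central, high) estimates per factor.
factors = {
    "items": (100, 200, 400),
    "avg_size_kb": (10, 20, 50),
}


def bound(factors: dict[str, tuple[float, float, float]]) -> tuple[float, float, float]:
    """Return (low, central, high) products across all factors."""
    low = prod(f[0] for f in factors.values())
    central = prod(f[1] for f in factors.values())
    high = prod(f[2] for f in factors.values())
    return low, central, high


low, central, high = bound(factors)
# Well-bounded if both sides are within 3× of the central estimate.
well_bounded = central / low <= 3 and high / central <= 3

print(low, central, high)  # 1000 4000 20000
print(well_bounded)        # False
```

Taking every factor at its extreme at once widens the range, so these are best read as outer bounds.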

Output Format

🎯 Target Quantity

Estimating: [Precisely defined quantity]
Units: [What we're counting in]

🧮 Decomposition

Factor	Estimate	Reasoning
[Factor 1]	[Value]	[Why you believe this]
[Factor 2]	[Value]	[Why you believe this]
[Factor 3]	[Value]	[Why you believe this]
Product	= [Result]

📊 Range

Scenario	Estimate	Key driver
Low	[Value]	[Which factor is at its low]
Central	[Value]	Best guess
High	[Value]	[Which factor is at its high]

🔍 Key Driver

Which factor contributes most to the result?
If you could only validate one factor, which would it be?
A 2× error in [key factor] produces a [2×] error in the result — worth checking.

✅ Sanity Checks

Reference point comparison: [Known comparable value]
Does this pass common sense? [Yes / No — if no, which factor is suspect?]
Order-of-magnitude conclusion: [The number of zeros that matter here]

Common Useful Reference Points

Keep these in rough memory for decomposition:

Time

Person-day of engineering work: ~1–4 hours of focused effort
Working hours per week: ~40 (effective: ~25–30)
Working days per month: ~22

Compute / LLM

Typical prose token density: ~750 words per 1,000 tokens
GPT-4-class input cost: ~$2–10 per million tokens (varies by model)
Average LLM response time: 1–10 seconds depending on length and model
Code file: 50–500 lines typical; ~100–2,000 tokens
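
The token-density rule of thumb above converts directly into a word-count estimator. This is a sketch that assumes English prose; estimate_tokens is a hypothetical helper, and real tokenizers vary by model and language.

```python
WORDS_PER_1000_TOKENS = 750  # rule of thumb from the list above


def estimate_tokens(word_count: float) -> float:
    """Rough token count for a piece of prose with the given word count."""
    return word_count * 1000 / WORDS_PER_1000_TOKENS


print(estimate_tokens(3_000))  # 4000.0
```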

Scale

Small SaaS: 1k–10k MAU
Mid-size product: 100k–1M MAU
Large platform: 10M+ MAU

Money

Fully-loaded engineer cost (EU/US): €80k–€200k/year
Per-hour cost: €40–€100/hour
AWS small instance: ~$10–50/month
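
The per-hour figure follows from the annual figure if a working year is taken as roughly 2,000 hours (40 hours × 50 weeks). The 50-week year is an assumption made for this sketch, not something stated above.

```python
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 50  # assumption: about two weeks of leave and holidays
HOURS_PER_YEAR = HOURS_PER_WEEK * WEEKS_PER_YEAR  # 2000


def hourly_cost(annual_cost_eur: float) -> float:
    """Convert a fully-loaded annual cost into a per-hour cost."""
    return annual_cost_eur / HOURS_PER_YEAR


print(hourly_cost(80_000))   # 40.0
print(hourly_cost(200_000))  # 100.0
```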

Estimation Anti-Patterns

False precision: Reporting "42,381 tokens" when the estimate is order-of-magnitude. Use round numbers.

Single-path decomposition: Relying on only one way of decomposing the problem. Cross-check with an independent decomposition; if the two agree, confidence is higher.
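
A cross-check can be sketched by estimating the same quantity two ways and measuring the gap in orders of magnitude. Both decompositions and all their values are hypothetical. Treating half an order of magnitude (roughly 3×) as the agreement threshold is an assumption.

```python
from math import log10


def orders_of_magnitude_apart(a: float, b: float) -> float:
    """Absolute gap between two positive estimates, in powers of ten."""
    return abs(log10(a / b))


# Hypothetical: daily requests, estimated two independent ways.
top_down = 100_000 * 0.1 * 10  # users × active fraction × requests per active user
bottom_up = 5 * 3_600 * 8      # requests/second × seconds/hour × busy hours/day

gap = orders_of_magnitude_apart(top_down, bottom_up)
print(round(gap, 2))  # 0.16
print(gap < 0.5)      # True: the two decompositions agree
```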

Forgetting to check units: If units don't cancel correctly, the decomposition is wrong.
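
A unit check can be sketched by tracking each unit's exponent and confirming that the right ones cancel. This is a hand-rolled illustration; multiply_units is a hypothetical helper, not a units library.

```python
def multiply_units(*units: dict[str, int]) -> dict[str, int]:
    """Combine unit exponents, dropping any unit that cancels to zero."""
    total: dict[str, int] = {}
    for unit in units:
        for name, power in unit.items():
            total[name] = total.get(name, 0) + power
    return {name: power for name, power in total.items() if power != 0}


# (requests / hour) × (hours) should leave just requests.
rate = {"request": 1, "hour": -1}
duration = {"hour": 1}
print(multiply_units(rate, duration))  # {'request': 1}
```

If anything other than the target unit survives, the decomposition is missing a factor or includes the wrong one.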

Treating the estimate as the answer: A Fermi estimate is a starting point and a sanity check, not a substitute for measurement when measurement is possible and the decision warrants it.

Refusing to estimate: "I don't have enough data" is rarely the right answer when a decision needs to be made. Decompose what you can; flag what you can't.

Thinking Triggers

"What does this quantity equal, as a product of things I can estimate?"
"What's the right number of zeros here?"
"Which single factor, if wrong by 10×, changes my conclusion?"
"What reference point can I use to sanity-check this?"
"If the estimate is off by 2×, does it change the decision? By 10×?"

Example: Token Budget for an Agent Pipeline

Question: How many tokens does one Constellation run consume?

Factor	Estimate	Reasoning
Number of agent turns	8	6 agents + 1 orchestrator + 1 review
Avg input tokens per turn	4,000	System prompt ~1k + context ~2k + task ~1k
Avg output tokens per turn	1,000	Structured response, not prose-heavy
Total per run	= 8 × (4,000 + 1,000) = 40,000 tokens

Range: 20k (simple task, short context) to 120k (complex task, full history passed through)

Key driver: Input context size. Input accounts for about 80% of the tokens (32k of 40k), so compressing context is the highest-leverage optimization.

Sanity check: A 40k-token run at $5/M tokens = $0.20/run. At 100 runs/day, that is $20/day, or ~$600/month over 30 days. Plausible for a dev tool.
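
The arithmetic in this example can be reproduced in a few lines. The variable names are hypothetical. The single flat price for input and output tokens follows the simplification used in the sanity check, and the 30-day month is an assumption.

```python
turns = 8
input_tokens_per_turn = 4_000
output_tokens_per_turn = 1_000

tokens_per_run = turns * (input_tokens_per_turn + output_tokens_per_turn)
input_share = turns * input_tokens_per_turn / tokens_per_run

price_per_million_usd = 5  # flat price for input and output, as in the sanity check
runs_per_day = 100
days_per_month = 30        # assumption

cost_per_run = tokens_per_run / 1_000_000 * price_per_million_usd
cost_per_month = cost_per_run * runs_per_day * days_per_month

print(tokens_per_run)          # 40000
print(input_share)             # 0.8
print(round(cost_per_run, 2))  # 0.2
print(round(cost_per_month))   # 600
```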