probabilistic-thinking

Apply probabilistic and Bayesian thinking whenever the user needs to reason under uncertainty, compare risks, prioritize between options, update beliefs based on new evidence, or make decisions without complete information. Triggers on phrases like "what are the odds?", "how likely is this?", "should I be worried about X?", "which risk is bigger?", "does this data change anything?", "is this a signal or noise?", "what's the probability?", "how confident are we?", or any situation where decisions are being made based on incomplete or ambiguous evidence. Also trigger when someone is treating uncertain outcomes as certainties, or when probability language is being used loosely ("probably", "unlikely", "very likely") without quantification. Don't leave uncertainty unexamined.

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "probabilistic-thinking" with this command: npx skills add andurilcode/skills/andurilcode-skills-probabilistic-thinking

Probabilistic & Bayesian Thinking

Core principle: Most real decisions happen under uncertainty. Probabilistic thinking replaces vague confidence with calibrated estimates. Bayesian thinking adds the discipline of updating those estimates as new evidence arrives — neither clinging to prior beliefs nor overreacting to new data.


Core Concepts

Probability as Degree of Belief

Probability isn't just for coin flips. It's a measure of how confident we are in any claim, given current evidence.

  • "This will probably work" → What probability? 60%? 90%? The difference matters.
  • Forcing a number exposes vague confidence and creates a baseline for updating.

Base Rates

Before estimating the probability of a specific event, find the base rate — how often does this type of event occur in a reference class?

"Will this feature succeed?" → What % of similar features in similar products succeeded?

Ignoring base rates (the base rate fallacy) is one of the most common reasoning errors.
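To see why the base rate matters, it can help to count cases rather than reason about percentages. The sketch below is illustrative only: the function name and every number (a 1% base rate, a signal that fires for 90% of real cases and for 10% of non-cases) are assumptions chosen for the example, not figures from this skill.

```python
def share_of_flagged_that_are_real(population, base_rate, hit_rate, false_alarm_rate):
    """Count cases in a population to find what share of flagged cases are real."""
    real = population * base_rate                 # cases where the event truly occurs
    not_real = population - real                  # everything else
    real_flagged = real * hit_rate                # real cases the signal catches
    false_flagged = not_real * false_alarm_rate   # false alarms
    return real_flagged / (real_flagged + false_flagged)


# Assumed numbers: 1,000 cases, 1% base rate, 90% hit rate, 10% false-alarm rate.
share = share_of_flagged_that_are_real(1000, 0.01, 0.90, 0.10)
print(f"{share:.1%} of flagged cases are real")
```

With these assumed inputs, about 9 of the 108 flagged cases are real, so a signal that sounds 90% reliable still leaves the probability near 8%. Judging by the hit rate alone and skipping the 1% base rate is the fallacy described above.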

Bayesian Updating

When new evidence arrives, update beliefs proportionally — not by ignoring prior beliefs, and not by overwriting them entirely.

Posterior odds = Prior odds × Likelihood ratio (the weight of the new evidence)

Key questions:

  • Prior: What did we believe before this evidence?
  • Likelihood: How probable is this evidence if the hypothesis is true? If it's false?
  • Posterior: What should we believe now?
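The three questions above map directly onto Bayes' rule. A minimal sketch follows; the function name `bayes_update` and the two likelihood values are assumptions for illustration, and the 30% prior is borrowed from the "Feature will succeed" row in the Output Format section.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be between 0 and 1")
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    if denominator == 0:
        raise ValueError("evidence has zero probability under both hypotheses")
    return numerator / denominator


# Assumed likelihoods: the evidence is three times as probable if the
# hypothesis is true (0.60) as it is if the hypothesis is false (0.20).
posterior = bayes_update(prior=0.30, p_evidence_if_true=0.60, p_evidence_if_false=0.20)
print(f"posterior = {posterior:.2f}")
```

Under these assumed likelihoods the belief moves from 30% to roughly 56%. If both likelihoods were equal, the posterior would stay at the prior, which is what "weak evidence = small update" means in practice.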

Expected Value

When choosing between options under uncertainty, compare expected values:

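EV = Σ (Probability of outcome × Value of outcome), summed over every possible outcome

If the alternative outcome in each case is €0, a 10% chance of +€100 (EV = €10) is better in expected-value terms than a 90% chance of +€5 (EV = €4.50).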
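The comparison can be checked with a few lines of code. This is a sketch under stated assumptions: `expected_value` is a hypothetical helper, and the €0 alternative outcomes for the first two bets are assumed because the text gives only the winning branch. The third call uses the Option A figures from the Expected Value Comparison table.

```python
def expected_value(outcomes):
    """Sum probability * value over (probability, value) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in outcomes)


# Assumption: the losing branch of each simple bet pays 0.
long_shot = expected_value([(0.10, 100), (0.90, 0)])
safe_bet = expected_value([(0.90, 5), (0.10, 0)])

# Option A from the comparison table: 70% of +50k, 30% of -10k.
option_a = expected_value([(0.70, 50_000), (0.30, -10_000)])

print(f"long shot: {long_shot:.2f}, safe bet: {safe_bet:.2f}, option A: {option_a:.0f}")
```

Requiring the probabilities to sum to 1 forces the failure branch to be written down, which guards against comparing upsides only.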

Confidence Intervals

Point estimates are almost always wrong. Ranges are more honest.

  • Instead of "this will take 4 weeks" → "this will take 3–7 weeks (80% confidence)"
  • Wide intervals are not weakness — they're calibration. Narrow intervals on uncertain things are overconfidence.

Output Format

📊 Probability Estimates

For each key claim or outcome:

| Claim | Prior probability | Evidence | Updated probability | Confidence |
| --- | --- | --- | --- | --- |
| "Feature will succeed" | 30% (base rate) | Strong user signal | 55% | Medium |
| "This will ship on time" | 40% (historical) | Team is experienced | 50% | Low |

🔢 Base Rate Check

  • What is the reference class for this situation?
  • What is the historical base rate for this type of outcome?
  • How does this specific case differ from the base rate (and does that justify adjusting up or down)?

🔄 Bayesian Update

When new evidence has arrived:

  • Prior belief: What did we think before?
  • New evidence: What do we now know?
  • Likelihood ratio: Is this evidence more consistent with the hypothesis being true or false?
  • Posterior belief: What should we believe now?
  • Update size: Did this evidence move the needle significantly? (Strong evidence = large update. Weak evidence = small update.)

⚖️ Expected Value Comparison

When choosing between options:

| Option | Probability | Value if succeeds | Value if fails | Expected Value |
| --- | --- | --- | --- | --- |
| Option A | 70% | +€50k | -€10k | +€32k |
| Option B | 30% | +€200k | -€20k | +€46k |

📏 Confidence Ranges

Replace point estimates with ranges:

  • Optimistic case (10th percentile): [value]
  • Expected case (50th percentile): [value]
  • Pessimistic case (90th percentile): [value]
  • Black swan scenario: [What happens in the tail?]
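When historical data exists, the three percentile cases can be read off it directly. The sketch below uses Python's standard `statistics.quantiles`; the helper name and the list of past durations are made up for illustration, and the optimistic/pessimistic labels assume a quantity where lower is better (such as duration).

```python
from statistics import quantiles


def percentile_range(samples):
    """Return the 10th, 50th and 90th percentiles of historical samples."""
    # n=10 yields the 9 decile cut points; "inclusive" treats the samples
    # as covering the full range of observed values.
    deciles = quantiles(samples, n=10, method="inclusive")
    return deciles[0], deciles[4], deciles[8]


# Hypothetical durations, in weeks, of 11 similar past tasks.
past_durations = [2, 3, 3, 4, 4, 5, 5, 6, 7, 9, 12]
p10, p50, p90 = percentile_range(past_durations)
print(f"optimistic {p10} / expected {p50} / pessimistic {p90} weeks")
```

The longest observation (12 weeks) sits outside the 10th to 90th percentile band, which is a reminder to ask the tail question separately.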

⚠️ Probability Hygiene Flags

  • Are any probabilities being treated as certainties (0% or 100%)? Almost nothing is certain.
  • Is base rate being ignored in favor of the specific case?
  • Is new evidence causing overreaction (recency bias: overweighting the latest data)?
  • Is there a conjunction fallacy? (P(A and B) ≤ P(A) always; the more specific the scenario, the lower its probability)
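The conjunction rule is easy to check numerically. The following sketch assumes the steps of a plan are independent, which is a simplification, and uses invented step probabilities; `joint_probability` is a hypothetical helper name.

```python
from math import prod


def joint_probability(step_probabilities):
    """P(all steps succeed), assuming the steps are independent."""
    return prod(step_probabilities)


# Hypothetical plan: five steps, each judged 90% likely to succeed.
plan = [0.9, 0.9, 0.9, 0.9, 0.9]
p_all = joint_probability(plan)
print(f"P(all five steps succeed) = {p_all:.2f}")
```

Each step looks safe on its own, yet under the independence assumption the whole plan succeeds only about 59% of the time, and the joint probability can never exceed that of the least likely step.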

Calibration Heuristics

Fermi Estimation — For unknown quantities, break into smaller estimable parts:

  • Instead of "how many users will we get?" → estimate: market size × awareness % × conversion % × retention %
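The decomposition above can be sketched as a product of factors, evaluated for a low, middle and high guess. All figures here are placeholders invented for the example, and `fermi_estimate` is a hypothetical helper.

```python
from math import prod


def fermi_estimate(factors):
    """Multiply the estimated factors together."""
    return prod(factors.values())


# Placeholder guesses for each factor; replace with your own estimates.
scenarios = {
    "low": {"market_size": 500_000, "awareness": 0.02, "conversion": 0.05, "retention": 0.30},
    "mid": {"market_size": 1_000_000, "awareness": 0.05, "conversion": 0.10, "retention": 0.40},
    "high": {"market_size": 2_000_000, "awareness": 0.10, "conversion": 0.15, "retention": 0.50},
}

estimates = {name: fermi_estimate(factors) for name, factors in scenarios.items()}
for name, users in estimates.items():
    print(f"{name}: about {users:,.0f} users")
```

Multiplying all the low guesses (or all the high ones) together gives deliberately extreme bounds, since it is unlikely that every factor lands at its low end at once. Treat the resulting spread as an outer range, not a calibrated interval.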

Reference Class Forecasting — Use historical data from similar projects:

  • "This type of feature took 4–8 weeks for 80% of teams in our reference class"

Outside View vs. Inside View:

  • Inside view: "Our situation is special, we'll beat the average"
  • Outside view: "What does the data say for projects like this?"
  • Default to the outside view. Adjust only with specific, strong evidence.

Pre-commit to what would change your mind:

  • "If we see X, I will update my probability from 60% to below 30%"
  • This prevents post-hoc rationalization of new evidence

Thinking Triggers

  • "What's the base rate for this?"
  • "Are we treating a 70% probability like a certainty?"
  • "What's the expected value of each option, not just the upside?"
  • "How much should this new evidence actually move our belief?"
  • "What would we need to see to change our mind significantly?"
  • "Are we in the reference class we think we're in?"
  • "What's the downside scenario, and are we weighting it correctly?"

Example Applications

  • "Should we build this feature?" → What % of similar features drove meaningful retention? What's the cost if it fails?
  • "This A/B test showed a lift" → Is the sample size sufficient? What's the prior for this type of change?
  • "We'll ship in 2 weeks" → What's the historical distribution for similar tasks? What's the 80th percentile?
  • "The agent failed once — is it a bug?" → What's the base rate of one-off failures? What evidence would confirm it's systematic?

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.
