Credibility Evidence Selector
When to Use
You have a claim a skeptical audience might reject, and a bag of possible proof points (stats, quotes, stories, demos, details). You need to decide which evidence to lead with and which to cut. This skill ranks evidence across six named credibility sources and applies the Sinatra Test as a pass/fail filter to kill weak hero examples before they reach the audience.
Run this skill when the user is about to ship a claim without thinking hard about why the audience would believe it. Run it BEFORE writing the final copy, not after — the choice of evidence type reshapes the whole paragraph.
Preconditions to verify before starting:
- You know the specific claim. If missing, ask: "What is the exact one-sentence claim you need the audience to believe?"
- You have an evidence inventory (even a rough list). If missing, ask: "What proof points do you have — stats, stories, customers, experts, demos, details?"
- You know the audience's default skepticism. If missing, ask: "Who is judging this, and what is the one thing they'd say to dismiss it?"
Do NOT use this skill for:
- Fabricating evidence. This skill ranks what exists; it never invents sources.
- Designing experiments or user studies.
- Extracting the core message (use core-message-extractor first; credibility is applied to a core, not in place of one).
Context & Input Gathering
Required Context (must have — ask if missing)
- The claim: the exact sentence the audience must believe.
-> Check prompt for: a quoted claim, a pitch paragraph, a headline, a research finding.
-> Check environment for: claim.md, draft.md, pitch.md, core-message.md.
-> If still missing, ask: "What is the one-sentence claim, the thing the audience must believe?"
- Evidence inventory: the candidate proof points available.
-> Check prompt for: lists of customers, stats, quotes, case studies, demos, testimonials.
-> Check environment for: evidence.md, proof-points.md, case-studies/, testimonials.md, customer-stories/, a data/ dir.
-> If still missing, ask: "What evidence do you already have? Dump a rough list (customers, numbers, experts, stories, demos, details) and I will rank them."
- Audience skepticism profile: who judges the claim, and their default objection.
-> Check prompt for: role, domain, familiarity, prior objections.
-> If missing, ask: "Who will read this, and what would they say to dismiss the claim?"
Observable Context (gather from environment)
- Prior credibility drafts: existing "why trust us" copy, testimonial pages, case study PDFs.
-> Look for: about.md, trust.md, social-proof.md, testimonials.md, case-studies/.
-> If present: mine for candidate evidence; treat as raw material, not final answer.
- Brand voice constraints: legal review rules, claims guidelines.
-> Look for: brand-guide.md, legal/claims.md.
-> If present: note which evidence types are disallowed (e.g., "no unverified customer quotes").
Default Assumptions
- Medium: short-form text (one paragraph, one slide, one email). State this assumption if used.
- Audience default: mildly skeptical professional, not a hostile expert. Upgrade to hostile if the user says so.
- Evidence claims are truthful — the skill trusts the user's inventory and does not fact-check.
Sufficiency Threshold
- SUFFICIENT: claim + evidence inventory + audience known.
- PROCEED WITH DEFAULTS: claim + evidence known, audience inferable from context (state the inference).
- MUST ASK: claim is not a single sentence, OR evidence inventory is completely empty (cannot rank nothing).
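The threshold above can be read as a small decision function. The following is a minimal sketch, not part of the skill itself: the names `Sufficiency` and `check_sufficiency` are hypothetical, and it assumes that an unknown audience falls back to the "mildly skeptical professional" default listed under Default Assumptions.

```python
from enum import Enum


class Sufficiency(Enum):
    SUFFICIENT = "sufficient"
    PROCEED_WITH_DEFAULTS = "proceed with defaults"
    MUST_ASK = "must ask"


def check_sufficiency(claim_is_single_sentence: bool,
                      evidence_count: int,
                      audience_known: bool) -> Sufficiency:
    """Map the three required inputs onto the sufficiency threshold."""
    # MUST ASK: the claim is not one sentence, or there is nothing to rank.
    if not claim_is_single_sentence or evidence_count == 0:
        return Sufficiency.MUST_ASK
    # SUFFICIENT: claim + evidence inventory + audience all known.
    if audience_known:
        return Sufficiency.SUFFICIENT
    # Assumption: an unknown audience is inferred or defaulted,
    # and the inference must be stated to the user.
    return Sufficiency.PROCEED_WITH_DEFAULTS


if __name__ == "__main__":
    print(check_sufficiency(True, 5, False).value)
```

Whatever the result, the skill text still requires stating any inferred audience back to the user.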
Process
Step 1: Lock the Claim to a Single Sentence
ACTION: Write the claim as one sentence with a clear subject, verb, and measurable predicate. Reject hedges ("might," "could help," "in some cases"). Save it under a ## Claim heading in credibility-plan.md.
WHY: Credibility attaches to a specific assertion, not a vibe. If the claim is "we help teams ship faster," no evidence can be strong or weak — the claim is too fuzzy to judge. Locking the sentence forces the later steps to pick evidence that actually answers the specific assertion. Fuzzy claims are where weak evidence hides; tightening the claim is half the work of ranking the proof.
IF the claim is still multi-part ("we are fast, cheap, and reliable") -> split into one file per claim and run this skill once per claim. ELSE -> proceed.
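As a rough aid for this step, a claim can be linted before it is locked. This is a heuristic sketch only: `lint_claim` is a hypothetical helper, the hedge list contains just the three hedges named above, and the multi-part check is a crude pattern match on comma lists that will miss many cases and can misfire on others.

```python
import re

# Only the three hedges named in Step 1; extend as needed.
HEDGES = ("might", "could help", "in some cases")


def lint_claim(claim: str) -> list[str]:
    """Return a list of problems that keep the claim from being locked."""
    problems = []
    text = claim.strip()
    lowered = text.lower()
    for hedge in HEDGES:
        if re.search(r"\b" + re.escape(hedge) + r"\b", lowered):
            problems.append(f"hedge: {hedge!r}")
    # More than one sentence-ending mark suggests more than one sentence.
    if len(re.findall(r"[.!?](?:\s|$)", text)) > 1:
        problems.append("more than one sentence")
    # A comma list ending in "and" is a rough signal of a multi-part claim.
    if re.search(r",\s*[^,]+,\s*and\b", lowered):
        problems.append("possibly multi-part: split into one claim per run")
    return problems


if __name__ == "__main__":
    for claim in ("We are fast, cheap, and reliable.",
                  "Our tool might reduce build times."):
        print(claim, "->", lint_claim(claim))
```

An empty list does not prove the claim is measurable; that judgment still belongs to the person running the skill.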
Step 2: Inventory Evidence Into the Six Categories
ACTION: Take every proof point the user has and sort it into one of six credibility categories. Some evidence fits multiple — list it under the strongest fit only. Write the result as a ## Evidence Inventory section with six sub-headings.
The Six Credibility Sources (from Chapter 4 CREDIBLE):
- External Authority — a recognized expert, institution, or celebrity endorsement. Example: a Nobel laureate, an industry analyst, a peer-reviewed study. Strength: borrows the audience's existing trust. Weakness: cheap to fake and audiences are now skeptical of expert voices; "analyst-quoted" pitches blur together.
- Antiauthority (Credible Victim / Lived Experience) — an ordinary person who lived the consequences. Not an expert; their biography IS the proof. Example: Pam Laffin — age 29, mother of two, started smoking at 10, emphysema at 24, failed lung transplant — used for anti-smoking messaging because no statistic lands like a face that earned the cost. Strength: inverts expert fatigue. Weakness: emotionally heavy; must be ethical and consensual.
- Testable Credentials ("Try It Yourself") — the audience can verify the claim in real time, with their own senses, before committing. Example: Wendy's "Where's the beef?" (you can see the tiny patty). Snickers "satisfies" (you can feel it). Barry Marshall drinking a beaker of H. pylori bacteria, inducing gastritis, then curing himself — a self-experiment as a public, falsifiable demonstration that ulcers are bacterial. Strength: collapses the skeptic's objection into a single observable act. Weakness: only works when real-time verification is possible.
- Vivid, Convincing Details: specific sensory or biographical detail that a fabricator would not have bothered to invent. Example: the mock-juror study in which arguments about a mother's fitness as a parent were judged more credible when they carried irrelevant vivid details, such as her son brushing with a Star Wars toothbrush that looked like Darth Vader. The trivial detail made the account feel observed rather than argued. Strength: non-obvious signal that the speaker has been close to the thing. Weakness: easy to confuse with purple prose; the detail must be load-bearing, not decorative.
- Sinatra Test (One Overwhelming Hero Example): a single reference so dominant that the audience cannot doubt you afterward. Named after the lyric "If I can make it there, I'll make it anywhere." Example: Safexpress, an Indian logistics firm, won enterprise contracts by saying "we delivered the latest Harry Potter book to 6,000 Indian bookstores in one day, sealed, no leaks", one sentence that obliterates the due-diligence conversation. Likewise, a contractor who holds the Fort Knox security contract is in the running for every security contract. Strength: replaces a wall of credentials with one unforgettable story. Weakness: only works when the hero case is genuinely the hardest case in the domain.
- Statistics-as-Illustration (Human Scale) — a number converted into a concrete, sensory comparison so the audience can feel the magnitude. Example: explaining US/Soviet nuclear arsenals not as "thousands of warheads" but as "if a BB represents the Hiroshima bomb, the world's nuclear stockpile is a garbage can full of BBs — and one BB can level a city." Stats as raw numbers are forgettable; stats as illustration are sticky. Strength: anchors an abstract scale to a physical image. Weakness: only works if the illustration is honest — if the analogy flatters the number, the audience will catch you.
WHY: Sorting forces you to see that you usually do not have six types of evidence — you have two or three, and one slot is empty. The empty slots are signals: if you have zero testable credentials, ask whether you can manufacture one (a free trial, a public benchmark, a demo). The inventory is also the cut list; evidence that does not fit any category is usually rationalization, not proof.
Anti-pattern to flag: credentials wall. If the entire inventory lands under "External Authority," the user is leaning on a stack of expert quotes. Audiences skim credentials walls and remember nothing. Go find one testable credential or one Sinatra hero and lead with that instead.
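The sorting step and its two signals (empty slots and the credentials wall) can be sketched as a small data structure. This is an illustrative sketch under assumptions: the function names are hypothetical, and the categorisation of each proof point is supplied by a human, since the skill does not define an automatic classifier.

```python
# The six category names as used in the Step 6 template.
CATEGORIES = (
    "External Authority",
    "Antiauthority",
    "Testable Credentials",
    "Vivid Convincing Details",
    "Sinatra Test Hero Example",
    "Statistics-as-Illustration",
)


def build_inventory(items):
    """Sort (evidence, category) pairs into the six slots.

    Evidence tagged with an unknown category is returned separately;
    per the text above, it is a candidate for the cut list.
    """
    inventory = {category: [] for category in CATEGORIES}
    unsorted = []
    for evidence, category in items:
        if category in inventory:
            inventory[category].append(evidence)
        else:
            unsorted.append(evidence)
    return inventory, unsorted


def empty_slots(inventory):
    """Categories with no evidence: prompts for what could be manufactured."""
    return [category for category, evidence in inventory.items() if not evidence]


def is_credentials_wall(inventory):
    """True when every piece of evidence sits under External Authority."""
    filled = [category for category, evidence in inventory.items() if evidence]
    return filled == ["External Authority"]


if __name__ == "__main__":
    inv, extra = build_inventory([
        ("Analyst quote A", "External Authority"),
        ("Analyst quote B", "External Authority"),
    ])
    print("empty:", empty_slots(inv))
    print("credentials wall:", is_credentials_wall(inv))
```

The strict "entire inventory" reading of the anti-pattern is used here; a looser version could flag inventories where authority merely dominates.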
Step 3: Rank by the Default Preference Order
ACTION: Apply the default preference order from strongest to weakest and rank the candidates. Write the result as a ## Ranked Evidence section.
Default preference order (Chapter 4 CREDIBLE):
- Testable Credentials — the audience verifies with their own senses.
- Sinatra Test Hero Example — one overwhelming case.
- Vivid Convincing Details — truthful trivia that signals firsthand knowledge.
- Antiauthority (Credible Victim) — lived experience beats expertise for behavior change.
- Statistics-as-Illustration — numbers rescued by human-scale analogy.
- External Authority — expert quote as last resort, not first move.
WHY: The ordering is not arbitrary — it tracks how audiences actually process claims. Testable credentials short-circuit skepticism ("I just saw it"); Sinatra heroes override it ("if they did THAT, I believe the rest"); vivid details bypass it ("no one would bother to invent that"); antiauthorities reframe it ("a doctor would lie; this mother wouldn't"); illustrated stats make it tangible; and authorities — cheapest to fake, most diluted by overuse — come last. Inverting this order is the most common credibility mistake: leading with a quote from a Gartner report when you have a customer who would pass the Sinatra Test.
Override rules (apply when defaults don't fit):
- IF the audience consists of credentialed peers (scientists reviewing research, doctors reading a trial) -> External Authority moves up to #1 or #2, because the audience's own credential norms require it.
- IF the claim is a behavior-change ask aimed at the people who resist expert messaging (teens on smoking, developers on technical debt, users on security hygiene) -> Antiauthority moves to #1.
- IF the claim is about scale / magnitude / cost / risk that audiences cannot feel intuitively -> Statistics-as-Illustration moves up (nuclear-warheads-as-BBs was the only way to make that scale real).
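The default order and the three override rules can be expressed as a short function. This is a sketch under stated assumptions, not the skill's definitive logic: the flag names are hypothetical, External Authority is placed at #2 (the text allows #1 or #2), and "moves up" for Statistics-as-Illustration is assumed to mean one place, since the text does not say how far.

```python
DEFAULT_ORDER = [
    "Testable Credentials",
    "Sinatra Test Hero Example",
    "Vivid Convincing Details",
    "Antiauthority",
    "Statistics-as-Illustration",
    "External Authority",
]


def preference_order(credentialed_peers=False,
                     resistant_to_experts=False,
                     hard_to_feel_magnitude=False):
    """Return the category order after applying the override rules."""
    order = list(DEFAULT_ORDER)
    if hard_to_feel_magnitude:
        # Assumption: "moves up" means one place.
        i = order.index("Statistics-as-Illustration")
        category = order.pop(i)
        order.insert(i - 1, category)
    if credentialed_peers:
        # Assumption: #2; the source allows #1 or #2.
        order.remove("External Authority")
        order.insert(1, "External Authority")
    if resistant_to_experts:
        order.remove("Antiauthority")
        order.insert(0, "Antiauthority")
    return order


def rank(inventory, order):
    """Flatten an inventory dict into (category, evidence) pairs, best first."""
    return [(category, evidence)
            for category in order
            for evidence in inventory.get(category, [])]


if __name__ == "__main__":
    inventory = {
        "External Authority": ["peer-reviewed replication"],
        "Testable Credentials": ["open data + reproducible pipeline"],
    }
    for category, evidence in rank(inventory, preference_order(credentialed_peers=True)):
        print(category, "-", evidence)
```

When several overrides apply at once, the order in which they are applied matters; the sequence used here is one reasonable choice, not something the source specifies.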
Step 4: Run the Sinatra Test as a Pass/Fail Gate
ACTION: Take the strongest candidate from Step 3 (especially if it's a case study, customer example, or hero story) and run the Sinatra Test as a binary pass/fail. Write the result as a ## Sinatra Test section with verdict, reasoning, and — if failed — a replacement plan.
The Sinatra Test, stated formally:
An example passes the Sinatra Test when ONE example alone is enough to establish credibility in a given domain — because the example represents the hardest case the domain can throw at you.
The three questions:
- Is this the hardest case? If the audience heard only this one example and nothing else, would they concede the full category? (Fort Knox security = yes, any security contract follows. Harry Potter delivery for 6,000 bookstores in one day, sealed = yes, any logistics contract follows.)
- Is the hard part legible? Can the audience see why it was hard without a paragraph of explanation? Sinatra examples are self-evidently difficult — "Fort Knox" and "Harry Potter launch day" need no setup. If you must explain why the case was hard, it is not Sinatra material.
- Is it verifiable? Can the audience check the claim if they wanted to? A Sinatra hero that cannot survive a two-minute Google search is worse than no Sinatra at all — it becomes a credibility liability on discovery.
WHY: The Sinatra Test exists because one dominant example is cheaper and stickier than a capabilities deck, but a weak hero example is worse than a deck — the audience senses the cherry-picking and punishes you. The three questions are the diagnostic: hardest case, legibly hard, checkable. Any failure means replace, don't patch. The Marshall ulcer story passes because drinking a beaker of bacteria is unambiguous; a vendor's "we helped a Fortune 500 save 15%" fails because it is neither the hardest case nor self-evidently hard.
IF the candidate passes all three questions -> mark PASS, make it the lead evidence. ELSE IF it fails any question -> mark FAIL, cut it from the lead, and either find a stronger example or fall back to the next category in the ranking (testable credentials or vivid details). IF no candidate passes at all -> do NOT fabricate one. Accept that the claim has no Sinatra-grade proof and move to multi-source credibility (testable demo + vivid detail + illustrated stat).
Anti-pattern to flag: Sinatra inflation. The user wants their best customer to pass the Sinatra Test and is willing to argue for it. If you have to argue, it fails — the test is whether the audience recognizes the hardness instantly. Three extra sentences of framing is a flunk.
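The gate is a strict conjunction of the three questions, which a small record type makes explicit. This is a minimal sketch; `SinatraCheck` and its field names are hypothetical, and the yes/no answers are human judgments entered by whoever runs the test.

```python
from dataclasses import dataclass


@dataclass
class SinatraCheck:
    candidate: str
    hardest_case: bool   # Would this one example concede the full category?
    legibly_hard: bool   # Is the difficulty obvious without explanation?
    verifiable: bool     # Could the audience check it if they wanted to?

    def failed_questions(self) -> list[str]:
        """Names of the questions the candidate fails."""
        answers = {
            "hardest case": self.hardest_case,
            "legibly hard": self.legibly_hard,
            "verifiable": self.verifiable,
        }
        return [question for question, ok in answers.items() if not ok]

    def verdict(self) -> str:
        """PASS only when all three questions are answered yes."""
        return "PASS" if not self.failed_questions() else "FAIL"


if __name__ == "__main__":
    # The verifiable=True value below is an assumption; the text only
    # says this example fails the first two questions.
    vendor = SinatraCheck("we helped a Fortune 500 save 15%",
                          hardest_case=False, legibly_hard=False, verifiable=True)
    print(vendor.verdict(), vendor.failed_questions())
```

Keeping the failed questions alongside the verdict gives the replacement plan something concrete to address.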
Step 5: Check for the "Credentials Wall / Raw Statistic" Failure Modes
ACTION: Review the ranked evidence against two named failure modes from the book and note any hits in a ## Diagnostic Notes section.
Failure mode A: The Credentials Wall. Symptom: the evidence is a list of logos, titles, and expert quotes with no single piece of proof an audience can remember. Consequence: audience forgets everything and defaults to their prior belief. Fix: replace the wall with one Sinatra hero or one testable credential. If you cannot, your claim is stronger than your evidence, so shrink the claim.
Failure mode B: The Raw Statistic. Symptom: a scary or impressive number without a human-scale translation ("we prevent 3.2M attacks per day"). Consequence: the number washes past the audience. Fix: apply the nuclear-warheads-as-BBs move and convert to a sensory comparison. "3.2M attacks per day = about 37 attacks every second, several in the time it takes to blink."
WHY: Credibility is a negative-space problem. The right evidence makes the claim stick, but the wrong evidence actively weakens it — a credentials wall signals insecurity, and a raw statistic signals that the speaker never tested their own number on a human. Running these checks after ranking catches cases where the strongest evidence in the inventory still triggers a failure mode, and the fix is not ranking-higher but rewriting the evidence into a stronger form.
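Before a raw statistic is turned into an illustration, the arithmetic behind it is worth checking so the comparison stays honest. The sketch below shows the conversion for the 3.2M-per-day figure used in failure mode B; the helper names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


def per_window(events_per_day: float, window_seconds: float) -> float:
    """Average number of events in a window of the given length."""
    return per_second(events_per_day) * window_seconds


if __name__ == "__main__":
    rate = per_second(3_200_000)
    print(f"{rate:.0f} per second")  # prints "37 per second"
```

`per_window` lets the same figure be restated against any duration the audience can feel, as long as the duration itself is stated truthfully.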
Step 6: Write the Output Artifact
ACTION: Assemble the final credibility-plan.md with these sections, in order:
# Credibility Plan
## Claim
<one-sentence claim>
## Audience Skepticism
- Who: <role / segment>
- Default objection: <the one thing they'd say to dismiss this>
## Evidence Inventory (Six Sources)
### External Authority
- <evidence> — <one-line note>
### Antiauthority (Credible Victim)
- ...
### Testable Credentials
- ...
### Vivid Convincing Details
- ...
### Sinatra Test Hero Example
- ...
### Statistics-as-Illustration
- ...
## Ranked Evidence
1. <category> — <evidence> — <why it leads>
2. ...
## Sinatra Test
- Candidate: <the hero example tested>
- Hardest case? <yes/no + reasoning>
- Hard part legible? <yes/no + reasoning>
- Verifiable? <yes/no + reasoning>
- VERDICT: PASS | FAIL
- IF FAIL: <replacement plan>
## Diagnostic Notes
- Credentials wall? <yes/no + fix>
- Raw statistic? <yes/no + illustration>
## Recommendation
Lead with: <the one piece of evidence to put first>
Support with: <1-2 backup pieces>
Cut: <evidence that weakens the claim and should be removed>
## Assumptions
- <any inferred audience/claim/evidence assumption the user should confirm>
WHY: The artifact separates ranking (what I have) from recommendation (what to lead with) from cut list (what to remove). Users resist cuts; seeing them attached to specific failure modes ("this is a credentials-wall entry") makes the cut negotiable point-by-point rather than an all-or-nothing argument. The Sinatra Test section is kept verbatim (three questions, verdict, replacement) because it is the single most defensible step — a downstream reviewer who disagrees with the lead evidence has to disagree with a named, checkable test.
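Because the artifact's value depends on its sections appearing in a fixed order, a finished plan can be checked mechanically. This is a small sketch, assuming the plan is plain markdown text with the second-level headings shown in the template; `check_plan` is a hypothetical helper, not part of the skill.

```python
# Second-level headings from the Step 6 template, in the required order.
REQUIRED_SECTIONS = [
    "## Claim",
    "## Audience Skepticism",
    "## Evidence Inventory (Six Sources)",
    "## Ranked Evidence",
    "## Sinatra Test",
    "## Diagnostic Notes",
    "## Recommendation",
    "## Assumptions",
]


def check_plan(plan_text: str):
    """Return (missing_sections, out_of_order) for a credibility plan."""
    # Only second-level headings count; "### ..." sub-headings are skipped.
    headings = [line.strip() for line in plan_text.splitlines()
                if line.startswith("## ")]
    missing = [s for s in REQUIRED_SECTIONS if s not in headings]
    present = [h for h in headings if h in REQUIRED_SECTIONS]
    expected = [s for s in REQUIRED_SECTIONS if s in headings]
    return missing, present != expected


if __name__ == "__main__":
    draft = "# Credibility Plan\n## Claim\n...\n## Sinatra Test\n...\n"
    print(check_plan(draft))
```

The check covers structure only; whether each section's content is defensible remains a review task.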
Inputs
- The one-sentence claim that needs credibility.
- The evidence inventory (rough dump is fine — customers, stats, experts, stories, demos, details).
- The audience skepticism profile (who judges; their default objection).
- Optional: medium constraints (character limit, slide vs paragraph vs email), legal/brand guidelines on allowed evidence.
Outputs
credibility-plan.md: the artifact defined in Step 6, containing:
- The locked claim.
- Evidence sorted into the six categories.
- Ranked evidence with an explicit lead.
- A Sinatra Test block with pass/fail and reasoning.
- Diagnostic notes on credentials-wall and raw-statistic failure modes.
- A cut list of evidence that should be removed.
- An assumptions block for anything inferred.
Key Principles
- One overwhelming example beats a wall of credentials. — The Sinatra Test exists because audiences remember one hero case and forget ten expert quotes. When you have a genuine hardest-case reference, lead with it and cut the quote stack; when you don't, do not manufacture one — fall back to testable credentials or vivid details.
- Testable credentials collapse skepticism into a single observable act. — "Where's the beef?" works because the audience verifies with their own eyes. Barry Marshall drinking H. pylori works because the demonstration is public and falsifiable. When you can give the audience a way to check the claim themselves, nothing else in the evidence stack matters nearly as much.
- Antiauthority beats authority for behavior change. — The people most resistant to expert messaging — teens on smoking, developers on technical debt, users on security — are often best reached by someone who lived the consequences, not someone who studied them. Pam Laffin's biography outperformed any Surgeon General report because her body was the proof.
- Raw statistics are forgettable; illustrated statistics are sticky. A number alone ("3.2M attacks per day") washes past. The same number translated to human scale ("about 37 every second, several in the time it takes to blink") lands. The nuclear-warheads-as-BBs move is not decoration; it is how abstract magnitude becomes emotionally real.
- Vivid details signal firsthand observation. Mock jurors who found a mother more credible after hearing about her son's Darth Vader toothbrush were not deciding on the toothbrush; they were responding to a detail that no one would bother to invent. Load-bearing trivia is a credibility signal; decorative trivia is just prose.
- Evidence ordering is a trust hierarchy, not a taste preference. The default preference order (testable > Sinatra > details > antiauthority > illustrated stats > authority) tracks how skepticism actually resolves. Inverting it by leading with a credentials wall is the most common and most damaging credibility mistake this skill guards against.
- When the evidence fails, shrink the claim. — If no evidence category produces a defensible lead, the claim is stronger than the proof supports. The fix is not to pad with weaker evidence; it is to tighten the claim until the available evidence is genuinely sufficient.
Examples
Scenario: B2B security startup with a credentials wall
Trigger: Founder says "our homepage lists 14 analyst quotes and 40 customer logos but nobody books a demo. How do I make the security claim more credible?"
Process: (1) Lock claim: "Our system blocks every known credential-stuffing attack pattern." (2) Inventory — 14 entries under External Authority, 2 under Vivid Details (an incident-response write-up mentioning a specific attacker TTP), 1 under Sinatra (they run security for a top-3 global bank's consumer login). (3) Rank: the bank reference is Sinatra-grade; testable credentials missing; authority dominates inventory (credentials wall). (4) Sinatra Test on the bank reference — Hardest case? Yes (top-3 global bank, consumer login = highest-volume attack surface). Legibly hard? Yes (audience instantly recognizes). Verifiable? Partial — the bank will not be named publicly. VERDICT: PASS with anonymization ("a top-3 global bank, consumer login, 400M accounts, zero successful credential-stuffing incidents in 18 months"). (5) Diagnostic flags credentials-wall (14 analyst quotes). Fix: cut to 3. (6) Recommendation: lead with the anonymized bank Sinatra hero; back with two incident-response details; cut 11 of 14 analyst quotes.
Output: credibility-plan.md with the bank Sinatra as lead, the quote wall shrunk, and a note that the founder should push for a testable credential (open benchmark or public red-team) to upgrade the credibility stack further.
Scenario: Nonprofit fundraising email, abstract statistics
Trigger: Fundraiser has a draft saying "3.2 million children are affected by food insecurity each year" and asks "why isn't this landing?"
Process: (1) Lock claim: "Child food insecurity is a scale emergency in our region this year." (2) Inventory — 1 raw statistic (the 3.2M number), 1 External Authority (a public-health study), 0 Testable, 0 Sinatra, 0 Antiauthority, 0 Vivid Details. (3) Rank: only two entries, both weak; the statistic is a Raw Statistic failure mode and needs illustration. (4) Sinatra Test: no hero candidate exists. Do NOT fabricate. Instead: convert the statistic. "3.2M children = every elementary-school seat in the five largest school districts in the country, empty at breakfast." (5) Diagnostic: Raw Statistic HIT. Fix: the BB-translation above. Also: add one Antiauthority — a single named family from the region, their story, their specific morning (vivid-details pairing). (6) Recommendation: lead with the illustrated statistic AND the one-family Antiauthority story side-by-side (Mother Teresa effect — identifiable victim paired with illustrated scale). Cut the authority-study citation to a footnote.
Output: credibility-plan.md recommending a two-piece lead (illustrated stat + identifiable family) with a note that the fundraiser should recruit an antiauthority spokesperson BEFORE next campaign.
Scenario: Academic publishing a counterintuitive research finding
Trigger: Researcher has a finding that contradicts 20 years of prior consensus. Asks "how do I make this credible to reviewers who will dismiss it on sight?"
Process: (1) Lock claim: "Finding X contradicts prior consensus Y because of mechanism Z." (2) Inventory: External Authority is dominant (peer-reviewed replication); Testable Credentials possible (open data + reproducible pipeline); one Antiauthority candidate (a respected prior defender of Y who reversed position); no Sinatra hero. (3) Rank: override the default. The audience is credentialed peers, so External Authority moves up from last place to #2. Testable Credentials (open data) stay at #1, because a reviewer who can reproduce the result in an hour is more convinced than one reading a methods section. The Antiauthority candidate (the reversed defender) comes next and works as a Barry-Marshall-pattern Sinatra hero by analogy: someone who publicly reversed position is the "hardest case" this audience responds to. (4) Sinatra Test on the reversed defender: Hardest case? Yes (they were a public critic). Legibly hard? Yes (the reversal is documented). Verifiable? Yes. VERDICT: PASS. (5) Diagnostic: no credentials wall, no raw-stat failure. (6) Recommendation: lead with the open-data link (testable), follow with the reversed-defender statement (Sinatra-pattern antiauthority), support with the peer-reviewed replication. Cut nothing.
Output: credibility-plan.md with a testable-credential lead, a named Sinatra-pattern antiauthority, and a note that the open dataset should be ready BEFORE submission — the credibility plan depends on it.
References
- For long-form worked examples of each of the six credibility sources, see credibility-sources-catalog.md
- For the Sinatra Test worksheet and replacement decision tree, see sinatra-test-worksheet.md
License
This skill is licensed under CC-BY-SA-4.0. Source: BookForge — Made to Stick: Why Some Ideas Survive and Others Die by Chip Heath and Dan Heath.
Related BookForge Skills
This skill is standalone (a Level 0 foundation skill in the Made to Stick skill set). It pairs well with core-message-extractor (run first to lock the claim) and with downstream Concrete, Emotional, and Story skills (which apply stickiness techniques once credibility is established).