ai-stopping-hallucinations

Stop your AI from making things up. Use when your AI hallucinates, fabricates facts, isn't grounded in real data, doesn't cite sources, makes unsupported claims, or you need to verify AI responses against source material. Covers citation enforcement, faithfulness verification, grounding via retrieval, confidence thresholds, and evaluation of anti-hallucination quality.


Stop Your AI From Making Things Up

Guide the user through making their AI factually grounded. The core principle: never trust a bare LM output — always verify against sources.

Why AI hallucinates

LMs generate plausible-sounding text, not verified facts. Hallucination happens when:

  • The model has no source material to ground its answer
  • The prompt doesn't enforce citations or evidence
  • There's no verification step after generation
  • Temperature is too high for factual tasks

The fix isn't better prompting — it's programmatic constraints that force grounding.

Step 1: Understand the grounding situation

Ask the user:

  1. Do you have source documents? (knowledge base, docs, database) → use retrieval-grounded answers
  2. Is it general knowledge? (no docs, just the model's knowledge) → use self-consistency checks
  3. How bad is a hallucination? (annoying vs. dangerous) → determines how strict the checks should be
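This triage can be sketched as a tiny helper (purely illustrative; the function name and return labels are ours, not part of DSPy):

```python
def choose_pattern(has_sources: bool, high_stakes: bool) -> str:
    """Map the three questions above to a starting pattern."""
    if has_sources and high_stakes:
        return "citation enforcement + faithfulness verification"
    if has_sources:
        return "citation enforcement"
    if high_stakes:
        return "cross-check"
    return "self-check"

print(choose_pattern(has_sources=True, high_stakes=False))
# prints "citation enforcement"
```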

Step 2: Citation enforcement

Force the AI to cite sources for every claim. Uses dspy.Assert to reject answers without citations.

import dspy
import re

class CitedAnswer(dspy.Signature):
    """Answer the question using the provided sources. Cite every claim with [1], [2], etc."""
    context: list[str] = dspy.InputField(desc="Numbered source documents")
    question: str = dspy.InputField()
    answer: str = dspy.OutputField(desc="Answer with inline citations like [1], [2]")

class CitationEnforcer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought(CitedAnswer)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)

        # Every 1-2 sentences must have a citation
        sentences = [s.strip() for s in result.answer.split(".") if s.strip()]
        citations_found = [bool(re.search(r"\[\d+\]", s)) for s in sentences]

        # Check that at least half the sentences have citations
        citation_ratio = sum(citations_found) / max(len(sentences), 1)
        dspy.Assert(
            citation_ratio >= 0.5,
            "Answer must cite sources. Use [1], [2], etc. after claims. "
            f"Only {citation_ratio:.0%} of sentences have citations."
        )

        # Check that cited numbers actually exist in the context
        cited_nums = set(int(n) for n in re.findall(r"\[(\d+)\]", result.answer))
        valid_nums = set(range(1, len(context) + 1))
        invalid = cited_nums - valid_nums
        dspy.Assert(
            len(invalid) == 0,
            f"Citations {invalid} don't match any source. Valid sources: [1] to [{len(context)}]."
        )

        return result
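Both checks in forward are plain regex work, so they can be exercised without an LM call. A standalone sketch (the helper name is ours):

```python
import re

def citation_report(answer: str, num_sources: int):
    """Compute citation coverage and find citations that point at no source."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    cited = [bool(re.search(r"\[\d+\]", s)) for s in sentences]
    coverage = sum(cited) / max(len(sentences), 1)

    cited_nums = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    invalid = cited_nums - set(range(1, num_sources + 1))
    return coverage, invalid

coverage, invalid = citation_report(
    "The meeting is on March 5 [1]. Budget is $50k [2]. We adjourned early.",
    num_sources=2,
)
# coverage ≈ 0.67 (2 of 3 sentences cited), invalid == set()
```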

Step 3: Faithfulness verification

After generating an answer, use a second LM call to check if it's actually supported by the sources.

class CheckFaithfulness(dspy.Signature):
    """Check if every claim in the answer is supported by the context."""
    context: list[str] = dspy.InputField(desc="Source documents")
    answer: str = dspy.InputField(desc="Generated answer to verify")
    is_faithful: bool = dspy.OutputField(desc="Is every claim supported by the context?")
    unsupported_claims: list[str] = dspy.OutputField(desc="Claims not found in context")

class FaithfulResponder(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=5)
        self.answer = dspy.ChainOfThought(CitedAnswer)
        self.verify = dspy.Predict(CheckFaithfulness)

    def forward(self, question):
        context = self.retrieve(question).passages
        result = self.answer(context=context, question=question)

        check = self.verify(context=context, answer=result.answer)
        dspy.Assert(
            check.is_faithful,
            f"Answer contains unsupported claims: {check.unsupported_claims}. "
            "Rewrite using only information from the provided sources."
        )

        return result

When dspy.Assert fails, DSPy automatically retries the LM call, feeding back the error message so the model can self-correct. This retry loop (called backtracking) runs up to max_backtrack_attempts times (default: 2).
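Note that assertions only backtrack once they are activated on the module. A sketch using the classic assertions API (check this against your DSPy version; newer releases changed this interface):

```python
import dspy

# Assumes an API key is configured for the chosen provider.
dspy.settings.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# activate_assertions() wraps forward() so a failed dspy.Assert
# triggers the retry-with-feedback loop instead of raising immediately.
enforcer = CitationEnforcer().activate_assertions()
```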

Step 4: Self-check pattern

Generate an answer, then ask the model to verify its own claims against the sources. Lightweight and good for most cases.

class SelfCheckedAnswer(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought("context, question -> answer")
        self.check = dspy.ChainOfThought(CheckFaithfulness)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)

        verification = self.check(context=context, answer=result.answer)
        dspy.Suggest(
            verification.is_faithful,
            f"Some claims may not be supported: {verification.unsupported_claims}. "
            "Consider revising to stick closer to the sources."
        )

        return dspy.Prediction(
            answer=result.answer,
            is_verified=verification.is_faithful,
            unsupported=verification.unsupported_claims,
        )

Use dspy.Suggest (soft) instead of dspy.Assert (hard) when you want to flag issues without blocking the response.

Step 5: Cross-check pattern

Generate the answer twice independently, then compare. If two independent generations disagree, something is probably made up.

class CompareAnswers(dspy.Signature):
    """Check if two independently generated answers agree on the facts."""
    answer_a: str = dspy.InputField()
    answer_b: str = dspy.InputField()
    agree: bool = dspy.OutputField(desc="Do they agree on all factual claims?")
    discrepancy: str = dspy.OutputField(desc="What they disagree on, if anything")

class CrossChecked(dspy.Module):
    def __init__(self):
        super().__init__()
        self.gen_a = dspy.ChainOfThought("context, question -> answer")
        self.gen_b = dspy.ChainOfThought("context, question -> answer")
        self.compare = dspy.Predict(CompareAnswers)

    def forward(self, context, question):
        a = self.gen_a(context=context, question=question)
        b = self.gen_b(context=context, question=question)

        check = self.compare(answer_a=a.answer, answer_b=b.answer)
        dspy.Assert(
            check.agree,
            f"Two independent answers disagree: {check.discrepancy}. "
            "This suggests hallucination. Regenerate with closer attention to sources."
        )

        return a

Best for high-stakes outputs where the cost of hallucination is high. Triples your LM calls (two generations plus one comparison) but catches inconsistencies.

Step 6: Confidence thresholds

Flag low-confidence outputs for human review instead of showing them to users.

class ConfidenceGated(dspy.Signature):
    """Answer the question and rate your confidence."""
    context: list[str] = dspy.InputField()
    question: str = dspy.InputField()
    answer: str = dspy.OutputField()
    confidence: float = dspy.OutputField(desc="0.0 to 1.0, how confident are you?")
    reasoning: str = dspy.OutputField(desc="Why this confidence level?")

class GatedResponder(dspy.Module):
    def __init__(self, threshold=0.7):
        super().__init__()
        self.respond = dspy.ChainOfThought(ConfidenceGated)
        self.threshold = threshold

    def forward(self, context, question):
        result = self.respond(context=context, question=question)

        if result.confidence < self.threshold:
            return dspy.Prediction(
                answer=result.answer,
                needs_review=True,
                confidence=result.confidence,
                reason=result.reasoning,
            )

        return dspy.Prediction(
            answer=result.answer,
            needs_review=False,
            confidence=result.confidence,
        )

Step 7: Loading source data for verification

Anti-hallucination patterns need source documents. Here's how to load common formats:

From transcript files

import json, re

def load_vtt(path):
    """Extract text from a VTT transcript, stripping timestamps and cues."""
    with open(path) as f:
        text = f.read()
    lines = [line.strip() for line in text.split("\n")
             if line.strip() and not line.startswith("WEBVTT")
             and not re.match(r"\d{2}:\d{2}", line)
             and not line.strip().isdigit()]
    return " ".join(lines)

def load_livekit_transcript(path):
    """Extract text from a LiveKit transcript JSON export."""
    with open(path) as f:
        data = json.load(f)
    segments = data.get("segments", data.get("results", []))
    return " ".join(seg.get("text", "") for seg in segments)

def load_recall_transcript(transcript_data):
    """Extract text from a Recall.ai transcript response."""
    return " ".join(
        entry["words"] for entry in transcript_data if entry.get("words")
    )

From Langfuse traces

from langfuse import Langfuse

def load_langfuse_generations(trace_id):
    """Load LM generations from a Langfuse trace for verification."""
    langfuse = Langfuse()
    trace = langfuse.get_trace(trace_id)
    generations = []
    for obs in trace.observations:
        if obs.type == "GENERATION" and obs.output:
            generations.append({
                "input": obs.input,
                "output": obs.output,
                "model": obs.model,
            })
    return generations

Breaking source documents into numbered passages

Most patterns here expect context: list[str] — numbered source passages. Split long documents into chunks so citations are meaningful:

def chunk_document(text, max_chars=500):
    """Split a document into numbered passages for citation."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        if len(current) + len(para) > max_chars and current:
            chunks.append(current.strip())
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current.strip())
    return [f"[{i+1}] {chunk}" for i, chunk in enumerate(chunks)]

# Use with any source
citation_enforcer = CitationEnforcer()
transcript_text = load_vtt("meeting.vtt")
context = chunk_document(transcript_text)
result = citation_enforcer(context=context, question="What was decided about the timeline?")

Step 8: Evaluating anti-hallucination quality

You need metrics to know if your verification actually works. The key question: does the system catch hallucinations and produce faithful answers?

Faithfulness metric

def faithfulness_metric(example, prediction, trace=None):
    """Score: does the answer stick to the sources?"""
    verifier = dspy.Predict(CheckFaithfulness)
    check = verifier(context=example.context, answer=prediction.answer)

    # Binary: is it faithful?
    if not check.is_faithful:
        return 0.0

    # Bonus: does it actually answer the question?
    relevance = dspy.Predict("question, answer -> is_relevant: bool")
    rel = relevance(question=example.question, answer=prediction.answer)
    return 1.0 if rel.is_relevant else 0.5

evaluator = dspy.Evaluate(devset=devset, metric=faithfulness_metric, num_threads=4)
score = evaluator(my_grounded_qa)

Citation coverage metric

def citation_metric(example, prediction, trace=None):
    """Score citation quality: coverage + validity."""
    answer = prediction.answer
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    cited = [bool(re.search(r"\[\d+\]", s)) for s in sentences]
    coverage = sum(cited) / max(len(sentences), 1)

    # Check all cited sources exist
    cited_nums = set(int(n) for n in re.findall(r"\[(\d+)\]", answer))
    valid_nums = set(range(1, len(example.context) + 1))
    all_valid = cited_nums.issubset(valid_nums)

    if not all_valid:
        return 0.0
    return coverage  # 0.0 to 1.0

Optimizing the verification pipeline

# Create training data: questions with source context and gold answers
trainset = [
    dspy.Example(
        context=["[1] The meeting is on March 5.", "[2] Budget is $50k."],
        question="When is the meeting?",
        answer="The meeting is on March 5 [1]."
    ).with_inputs("context", "question"),
    # ... more examples
]

# Optimize the citation enforcer
optimizer = dspy.BootstrapFewShot(metric=faithfulness_metric, max_bootstrapped_demos=4)
optimized = optimizer.compile(CitationEnforcer(), trainset=trainset)
optimized.save("optimized_citation_enforcer.json")

# Load later
enforcer = CitationEnforcer()
enforcer.load("optimized_citation_enforcer.json")

Using a cheap LM for verification

The verification step doesn't need an expensive model — a smaller model checking claims against sources works well and cuts costs:

class CostEfficientVerifier(dspy.Module):
    def __init__(self):
        super().__init__()
        self.answer = dspy.ChainOfThought(CitedAnswer)
        self.verify = dspy.Predict(CheckFaithfulness)

        # Use a cheaper model for the verification step
        cheap_lm = dspy.LM("openai/gpt-4o-mini")
        self.verify.set_lm(cheap_lm)

    def forward(self, context, question):
        result = self.answer(context=context, question=question)
        check = self.verify(context=context, answer=result.answer)
        dspy.Assert(
            check.is_faithful,
            f"Unsupported claims: {check.unsupported_claims}. "
            "Only use information from the provided sources."
        )
        return result

Step 9: Batch verification

When you need to verify many responses at once (e.g., auditing a transcript Q&A system):

import json

def verify_batch(qa_pairs, context, output_path="verification_results.json"):
    """Verify a batch of question-answer pairs against source context."""
    verifier = dspy.Predict(CheckFaithfulness)
    results = []

    for qa in qa_pairs:
        check = verifier(context=context, answer=qa["answer"])
        results.append({
            "question": qa["question"],
            "answer": qa["answer"],
            "is_faithful": check.is_faithful,
            "unsupported_claims": check.unsupported_claims,
        })

    # Summary
    faithful_count = sum(1 for r in results if r["is_faithful"])
    print(f"Faithful: {faithful_count}/{len(results)} "
          f"({faithful_count/len(results):.0%})")

    with open(output_path, "w") as f:
        json.dump(results, f, indent=2)

    return results

How backtracking works

When dspy.Assert fails:

  1. DSPy catches the assertion failure
  2. The error message is fed back to the LM as additional context
  3. The LM retries generation with the feedback ("your answer had unsupported claims X, Y")
  4. This repeats up to max_backtrack_attempts times
  5. If all retries fail, the assertion raises an error

This is why good error messages matter — they're literally the feedback the model uses to improve.
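A toy sketch of that loop (our simplification, not DSPy's actual implementation):

```python
def with_backtracking(generate, check, max_backtrack_attempts=2):
    """Retry generation, feeding each failure message back as extra context."""
    feedback = None
    for _ in range(max_backtrack_attempts + 1):
        output = generate(feedback)
        ok, message = check(output)
        if ok:
            return output
        feedback = message  # becomes part of the next prompt
    raise RuntimeError(f"Assertion still failing after retries: {message}")

# Toy stand-ins: the "model" only cites once it has seen the feedback.
def fake_generate(feedback):
    return "The meeting is March 5 [1]." if feedback else "The meeting is March 5."

def fake_check(output):
    return ("[1]" in output, "Answer must cite sources. Use [1], [2], etc.")

print(with_backtracking(fake_generate, fake_check))
# prints "The meeting is March 5 [1]."
```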

Choosing the right pattern

| Pattern | Cost | Latency | Best for |
| --- | --- | --- | --- |
| Citation enforcement | 1 LM call | Low | When you have numbered sources |
| Faithfulness verification | 2 LM calls | Medium | RAG systems, doc Q&A |
| Self-check | 2 LM calls | Medium | General fact-checking |
| Cross-check | 3 LM calls | High | High-stakes, critical outputs |
| Confidence gating | 1 LM call | Low | Human-in-the-loop systems |
| Cheap verifier | 1 expensive + 1 cheap | Low-Medium | Cost-sensitive production |

Key principles

  • Grounding beats prompting. Giving the AI sources to cite is more effective than asking it to "be accurate."
  • Assert for critical facts. Use dspy.Assert when hallucination is unacceptable (medical, legal, financial).
  • Suggest for nice-to-haves. Use dspy.Suggest when you want to flag but not block.
  • Layer your defenses. Combine retrieval + citation + verification for the strongest protection.
  • Good error messages help. The Assert message becomes the model's self-correction prompt.
  • Measure faithfulness. Use the evaluation patterns above to quantify how well your verification works — don't guess.

Additional resources

  • Use /ai-searching-docs for retrieval-augmented generation (RAG) setup
  • Use /ai-checking-outputs for general output validation (format, safety, quality)
  • Use /ai-following-rules for enforcing business rules and content policies
  • See examples.md for complete worked examples
