Eval Accuracy
Use this skill to evaluate how factually accurate an assistant response is.
Inputs
Required:
- The assistant response text to evaluate.
Internal Rubric (1–5)
5 = Factually correct, no misleading claims, no hallucinations, claims are well-supported or appropriately qualified
4 = Mostly correct, minor imprecision that does not materially affect meaning
3 = Partially correct, contains one significant inaccuracy or unsupported claim
2 = Multiple inaccuracies or misleading statements
1 = Fundamentally incorrect, fabricated, or contradicts known facts
Workflow
- Evaluate factual claims in the response.
- Compare them against widely accepted knowledge.
- Score accuracy on a 1–5 integer scale using the rubric only.
- Write concise rationale tied directly to rubric criteria.
- Produce actionable suggestions that improve factual correctness.
Output Contract
Return JSON only. Do not include markdown, backticks, prose, or extra keys.
Use exactly this schema:
{ "dimension": "accuracy", "score": 1, "rationale": "...", "improvement_suggestions": [ "..." ] }
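For instance, a response with one minor imprecision might yield output like this (the rationale and suggestion text are illustrative, not prescribed):

```json
{ "dimension": "accuracy", "score": 4, "rationale": "The response is mostly correct but slightly overstates one claim.", "improvement_suggestions": [ "Qualify the overstated claim or cite a supporting source." ] }
```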
Hard Rules
- `dimension` must always equal "accuracy".
- `score` must be an integer from 1 to 5.
- `rationale` must be concise (max 3 sentences).
- Do not include step-by-step reasoning.
- `improvement_suggestions` must be a non-empty array of concrete edits.
- Never output text outside the JSON object.
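A minimal sketch of how a downstream consumer might enforce this contract, assuming Python and its standard-library `json` module; the function name and error messages are illustrative, not part of the skill:

```python
import json

# The exact key set required by the Output Contract.
ALLOWED_KEYS = {"dimension", "score", "rationale", "improvement_suggestions"}

def validate_accuracy_output(raw: str) -> dict:
    """Parse a rater's raw output and check it against the Hard Rules."""
    obj = json.loads(raw)  # raises ValueError if markdown, backticks, or prose leak in
    if set(obj) != ALLOWED_KEYS:
        raise ValueError("output must contain exactly the schema keys, no extras")
    if obj["dimension"] != "accuracy":
        raise ValueError('dimension must equal "accuracy"')
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(obj["score"], int) or isinstance(obj["score"], bool) \
            or not 1 <= obj["score"] <= 5:
        raise ValueError("score must be an integer from 1 to 5")
    if not isinstance(obj["rationale"], str):
        raise ValueError("rationale must be a string")
    suggestions = obj["improvement_suggestions"]
    if not isinstance(suggestions, list) or not suggestions \
            or not all(isinstance(s, str) for s in suggestions):
        raise ValueError("improvement_suggestions must be a non-empty array of strings")
    return obj
```

A valid payload passes through unchanged, while extra keys, a non-integer score, or an empty suggestions array raise `ValueError` before the result reaches any aggregation step.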