sadd:judge

Safety Notice

This listing is imported from skills.sh public index metadata. Review upstream SKILL.md and repository scripts before running.

Install skill "sadd:judge" with this command: npx skills add neolabhq/context-engineering-kit/neolabhq-context-engineering-kit-sadd-judge

Judge Command

The evaluation is report-only: findings are presented without automatic changes.

Your Workflow

Phase 1: Context Extraction

Before launching the judge, identify what needs evaluation:

Identify the work to evaluate:

  • Review conversation history for completed work

  • If arguments provided: Use them to focus on specific aspects

  • If unclear: Ask user "What work should I evaluate? (code changes, analysis, documentation, etc.)"

Extract evaluation context:

  • Original task or request that prompted the work

  • The actual output/result produced

  • Files created or modified (with brief descriptions)

  • Any constraints, requirements, or acceptance criteria mentioned

Provide scope for user:

Evaluation Scope:

  • Original request: [summary]
  • Work produced: [description]
  • Files involved: [list]
  • Evaluation focus: [from arguments or "general quality"]

Launching judge sub-agent...

IMPORTANT: Pass only the extracted context to the judge, not the entire conversation. This prevents context pollution and enables focused assessment.
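As an illustration of that isolation (the field names and values here are hypothetical, not part of this command), the payload handed to the judge can be as small as:

```python
# Hypothetical extracted context for the judge sub-agent.
# Only these fields are passed on; the full conversation transcript is not.
judge_context = {
    "original_task": "Add input validation to the signup form",
    "work_output": "Client- and server-side validation with inline error messages",
    "files_involved": [
        "forms/signup.py - server-side validation rules",
        "static/signup.js - client-side checks",
    ],
    "evaluation_focus": "general quality",
}
```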

Phase 2: Launch Judge Sub-Agent

Use the Task tool to spawn a single judge agent with the prompt and context below. Adjust the criteria rubric and weights to match the solution type and complexity; example criteria include:

  • Code Quality

  • Documentation Quality

  • Test Coverage

  • Security

  • Performance

  • Usability

  • Reliability

  • Maintainability

  • Scalability

  • Cost-effectiveness

  • Compliance

  • Accessibility

Judge Agent Prompt:

You are an Expert Judge evaluating the quality of work produced in a development session.

Work Under Evaluation

[ORIGINAL TASK] {paste the original request/task} [/ORIGINAL TASK]

[WORK OUTPUT] {summary of what was created/modified} [/WORK OUTPUT]

[FILES INVOLVED] {list of files with brief descriptions} [/FILES INVOLVED]

[EVALUATION FOCUS] {from arguments, or "General quality assessment"} [/EVALUATION FOCUS]

Read ${CLAUDE_PLUGIN_ROOT}/tasks/judge.md and execute.

Evaluation Criteria

Criterion 1: Instruction Following (weight: 0.30)

Does the work follow all explicit instructions and requirements?

Guiding Questions:

  • Does the output fulfill the original request?
  • Were all explicit requirements addressed?
  • Are there gaps or unexpected deviations?

| Level | Score | Description |
| --- | --- | --- |
| Excellent | 5 | All instructions followed precisely, no deviations |
| Good | 4 | Minor deviations that do not affect outcome |
| Adequate | 3 | Major instructions followed, minor ones missed |
| Poor | 2 | Significant instructions ignored |
| Failed | 1 | Fundamentally misunderstood the task |

Criterion 2: Output Completeness (weight: 0.25)

Are all requested aspects thoroughly covered?

Guiding Questions:

  • Are all components of the request addressed?
  • Is there appropriate depth for each component?
  • Are there obvious gaps or missing pieces?

| Level | Score | Description |
| --- | --- | --- |
| Excellent | 5 | All aspects thoroughly covered with appropriate depth |
| Good | 4 | Most aspects covered with minor gaps |
| Adequate | 3 | Key aspects covered, some notable gaps |
| Poor | 2 | Major aspects missing |
| Failed | 1 | Fundamental aspects not addressed |

Criterion 3: Solution Quality (weight: 0.25)

Is the approach appropriate and well-implemented?

Guiding Questions:

  • Is the chosen approach sound and appropriate?
  • Does the implementation follow best practices?
  • Are there correctness issues or errors?

| Level | Score | Description |
| --- | --- | --- |
| Excellent | 5 | Optimal approach, clean implementation, best practices followed |
| Good | 4 | Good approach with minor issues |
| Adequate | 3 | Reasonable approach, some quality concerns |
| Poor | 2 | Problematic approach or significant quality issues |
| Failed | 1 | Fundamentally flawed approach |

Criterion 4: Reasoning Quality (weight: 0.10)

Is the reasoning clear, logical, and well-documented?

Guiding Questions:

  • Is the decision-making transparent?
  • Were appropriate methods/tools used?
  • Can someone understand why this approach was taken?

| Level | Score | Description |
| --- | --- | --- |
| Excellent | 5 | Clear, logical reasoning throughout |
| Good | 4 | Generally sound reasoning with minor gaps |
| Adequate | 3 | Basic reasoning present |
| Poor | 2 | Reasoning unclear or flawed |
| Failed | 1 | No apparent reasoning |

Criterion 5: Response Coherence (weight: 0.10)

Is the output well-structured and easy to understand?

Guiding Questions:

  • Is the output organized logically?
  • Can someone unfamiliar with the task understand it?
  • Is it professionally presented?

| Level | Score | Description |
| --- | --- | --- |
| Excellent | 5 | Well-structured, clear, professional |
| Good | 4 | Generally coherent with minor issues |
| Adequate | 3 | Understandable but could be clearer |
| Poor | 2 | Difficult to follow |
| Failed | 1 | Incoherent or confusing |
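The five weights above sum to 1.00, so the weighted total stays on the same 1-5 scale as the individual scores. A minimal sketch of the computation (the dictionary keys and example scores are illustrative):

```python
# Rubric weights from the five criteria above (must sum to 1.0).
WEIGHTS = {
    "instruction_following": 0.30,
    "output_completeness": 0.25,
    "solution_quality": 0.25,
    "reasoning_quality": 0.10,
    "response_coherence": 0.10,
}

def weighted_total(scores: dict) -> float:
    """Combine per-criterion scores (1-5) into a single weighted total."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1.0"
    for name, score in scores.items():
        assert 1 <= score <= 5, f"{name} score out of range"
    return sum(WEIGHTS[name] * score for name, score in scores.items())

# Example: strong work with a weaker reasoning write-up.
example = {
    "instruction_following": 5,
    "output_completeness": 4,
    "solution_quality": 4,
    "reasoning_quality": 3,
    "response_coherence": 4,
}
print(round(weighted_total(example), 2))  # 0.30*5 + 0.25*4 + 0.25*4 + 0.10*3 + 0.10*4 = 4.2
```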

Phase 3: Process and Present Results

After receiving the judge's evaluation:

Validate the evaluation:

  • Check that all criteria have scores in valid range (1-5)

  • Verify each score has supporting justification with evidence

  • Confirm weighted total calculation is correct

  • Check for contradictions between justification and score

  • Verify self-verification was completed with documented adjustments
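The checks above can be sketched as follows, assuming the judge's report is parsed into a dict with per-criterion scores and justifications plus a reported weighted total (the field names are hypothetical):

```python
def validate_evaluation(evaluation: dict, weights: dict) -> list:
    """Return a list of validation problems; an empty list means the report passes."""
    problems = []
    criteria = evaluation.get("criteria", {})
    for name, entry in criteria.items():
        score = entry.get("score")
        if not (isinstance(score, int) and 1 <= score <= 5):
            problems.append(f"{name}: score {score!r} outside valid range 1-5")
        if not entry.get("justification"):
            problems.append(f"{name}: missing supporting justification")
    # Recompute the weighted total and compare with the reported one.
    expected = sum(weights[name] * criteria[name]["score"]
                   for name in weights if name in criteria)
    reported = evaluation.get("weighted_total")
    if reported is None or abs(reported - expected) > 0.005:
        problems.append(f"weighted total {reported} does not match recomputed {expected:.2f}")
    return problems

# A well-formed report passes with no problems.
ok_report = {
    "criteria": {
        "instruction_following": {"score": 5, "justification": "All steps completed"},
        "solution_quality": {"score": 4, "justification": "One minor style issue"},
    },
    "weighted_total": 2.5,  # 0.30*5 + 0.25*4
}
weights = {"instruction_following": 0.30, "solution_quality": 0.25}
print(validate_evaluation(ok_report, weights))  # []
```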

If validation fails:

  • Note the specific issue

  • Request clarification or re-evaluation if needed

Present results to user:

  • Display the full evaluation report

  • Highlight the verdict and key findings

  • Offer follow-up options:

      • Address specific improvements

      • Request clarification on any judgment

      • Proceed with the work as-is

Scoring Interpretation

| Score Range | Verdict | Interpretation | Recommendation |
| --- | --- | --- | --- |
| 4.50 - 5.00 | EXCELLENT | Exceptional quality, exceeds expectations | Ready as-is |
| 4.00 - 4.49 | GOOD | Solid quality, meets professional standards | Minor improvements optional |
| 3.50 - 3.99 | ACCEPTABLE | Adequate but has room for improvement | Improvements recommended |
| 3.00 - 3.49 | NEEDS IMPROVEMENT | Below standard, requires work | Address issues before use |
| 1.00 - 2.99 | INSUFFICIENT | Does not meet basic requirements | Significant rework needed |
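With these band floors, verdict selection reduces to a first-match lookup; a sketch (the labels mirror the interpretation table above):

```python
def verdict(total: float) -> tuple:
    """Map a weighted total (1.0-5.0) to a (verdict, recommendation) pair."""
    bands = [
        (4.50, "EXCELLENT", "Ready as-is"),
        (4.00, "GOOD", "Minor improvements optional"),
        (3.50, "ACCEPTABLE", "Improvements recommended"),
        (3.00, "NEEDS IMPROVEMENT", "Address issues before use"),
        (1.00, "INSUFFICIENT", "Significant rework needed"),
    ]
    for floor, label, recommendation in bands:
        if total >= floor:
            return label, recommendation
    raise ValueError(f"total {total} is below the scale minimum")

print(verdict(4.2))   # ('GOOD', 'Minor improvements optional')
print(verdict(3.49))  # ('NEEDS IMPROVEMENT', 'Address issues before use')
```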

Important Guidelines

  • Context Isolation: Pass only relevant context to the judge - not the entire conversation

  • Justification First: Always require evidence and reasoning BEFORE the score

  • Evidence-Based: Every score must cite specific evidence (file paths, line numbers, quotes)

  • Bias Mitigation: Explicitly warn against length bias, verbosity bias, and authority bias

  • Be Objective: Base assessments on evidence and rubric definitions, not preferences

  • Be Specific: Cite exact locations, not vague observations

  • Be Constructive: Frame criticism as opportunities for improvement with impact context

  • Consider Context: Account for stated constraints, complexity, and requirements

  • Report Confidence: Lower confidence when evidence is ambiguous or criteria unclear

  • Single Judge: This command uses one focused judge for context isolation

Notes

  • This is a report-only command: it evaluates but does not modify work

  • The judge operates with fresh context for unbiased assessment

  • Scores are calibrated to professional development standards

  • Low scores indicate improvement opportunities, not failures

  • Use the evaluation to inform next steps and iterations

  • Pass threshold (3.5/5.0) represents acceptable quality for general use

  • Adjust threshold based on criticality (4.0+ for critical operations)

  • Low confidence evaluations may warrant human review
