evaluate-report

/evaluate:report

View evaluation results and benchmark reports for a skill or plugin.

When to Use This Skill

Use this skill when... Use alternative when...

Want to see results from a past evaluation Need to run new evaluations -> /evaluate:skill

Comparing benchmark runs over time Want improvement suggestions -> /evaluate:improve

Reviewing quality trends for a plugin Need structural compliance -> plugin-compliance-check.sh

Parameters

Parse these from $ARGUMENTS :

Parameter Default Description

required plugin/skill-name for skill or plugin-name for plugin

--latest

true Show most recent benchmark only

--history

false Show all benchmark iterations

--compare

false Compare current vs previous benchmark

Execution

Step 1: Resolve target

Determine if $ARGUMENTS refers to a skill or plugin:

Contains / : treat as plugin-name/skill-name , look for <plugin>/skills/<skill>/eval-results/benchmark.json
No / : treat as plugin-name , look for <plugin>/eval-results/plugin-benchmark.json

If the target file does not exist, report that no evaluation results are available and suggest running /evaluate:skill or /evaluate:plugin .

Step 2: Load results

Skill-level report (--latest or default): Read benchmark.json and display:

Evaluation Report: <plugin/skill-name>

Last run: <timestamp> Runs per eval: N

Metric	With Skill	Baseline	Delta
Pass Rate	85%	42%	+43%
Duration	14s	12s	+2s

Per-Eval Results

Eval ID	Description	Pass Rate	Failures
eval-001	Basic usage	100%	—
eval-002	Edge case	67%	assertion X

Skill-level history (--history ): Read history.json and display improvement iterations:

Evaluation History: <plugin/skill-name>

Version	Date	Pass Rate	Changes
v3 (current)	2026-03-04	85%	Improved step 3 instructions
v2	2026-03-01	72%	Added error handling
v1	2026-02-28	55%	Initial evaluation

Skill-level comparison (--compare ): Read current and previous benchmarks, show delta:

Comparison: <plugin/skill-name>

Metric	Previous	Current	Delta
Pass Rate	72%	85%	+13%
Duration	16s	14s	-2s

Improved evals: eval-002, eval-004 Regressed evals: none

Plugin-level report: Read plugin-benchmark.json and display:

Plugin Report: <plugin-name>

Skill	Evals	Pass Rate	Status
skill-a	4	100%	PASS
skill-b	3	67%	NEEDS WORK

Overall: 82% pass rate Skills evaluated: N / M total

Agentic Optimizations

Context Command

Find skill benchmark cat <plugin>/skills/<skill>/eval-results/benchmark.json | jq .summary

Find plugin benchmark cat <plugin>/eval-results/plugin-benchmark.json | jq .summary

Check for history cat <plugin>/skills/<skill>/eval-results/history.json | jq '.iterations | length'

List available results find <plugin> -name benchmark.json -o -name plugin-benchmark.json

Quick Reference

Flag Description

--latest

Most recent benchmark (default)

--history

Show all iterations

--compare

Compare current vs previous

evaluate-report

Safety Notice

Copy this and send it to your AI assistant to learn

Evaluation Report: <plugin/skill-name>

Per-Eval Results

Evaluation History: <plugin/skill-name>

Comparison: <plugin/skill-name>

Plugin Report: <plugin-name>

Source Transparency

Related Skills

ruff linting

imagemagick-conversion

jq json processing

api-testing