eval-session-scorecard

Evaluates an entire multi-turn conversation (session) against the 7 core dimensions and returns strict JSON containing session-level aggregates plus per-turn scorecards.

Safety Notice

This listing is imported from skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running the skill.

Install skill "eval-session-scorecard" with this command: npx skills add whitespectre/ai-assistant-evals/whitespectre-ai-assistant-evals-eval-session-scorecard

Eval Session Scorecard

Use this skill to evaluate a full conversation (multiple user/assistant turns) for continuous monitoring at the session level.

Inputs

Require:

  • A conversation transcript containing multiple user/assistant turns.
  • The transcript must clearly label turns as "User:" and "Assistant:".
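As an illustration of why the labels matter, here is a minimal sketch of splitting a labeled transcript into turns and pairing each assistant turn with its preceding user message. The function names `parse_transcript` and `pair_assistant_turns` are hypothetical and not part of the skill; the sketch also assumes each label starts at the beginning of a line.

```python
import re

# Assumption: every turn begins on its own line with "User:" or "Assistant:".
TURN_LABEL = re.compile(r"^(User|Assistant):[ \t]*", re.MULTILINE)


def parse_transcript(transcript: str) -> list[dict]:
    """Split a labeled transcript into {"role", "text"} dicts, in order."""
    matches = list(TURN_LABEL.finditer(transcript))
    turns = []
    for i, match in enumerate(matches):
        # A turn's text runs until the next label (or the end of the transcript).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(transcript)
        turns.append({
            "role": match.group(1).lower(),
            "text": transcript[match.end():end].strip(),
        })
    return turns


def pair_assistant_turns(turns: list[dict]) -> list[dict]:
    """Pair each assistant turn with the most recent user message before it."""
    pairs = []
    last_user_message = ""
    index = 0
    for turn in turns:
        if turn["role"] == "user":
            last_user_message = turn["text"]
        else:
            index += 1  # assistant_turn_index starts at 1
            pairs.append({
                "assistant_turn_index": index,
                "user_message": last_user_message,
                "assistant_message": turn["text"],
            })
    return pairs
```

Multi-line messages are kept whole because a turn only ends where the next label begins.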

Workflow

  1. Parse the transcript into assistant turns and their immediate context (the preceding user turn and any relevant prior context).
  2. For each assistant turn, run eval-core-scorecard on that assistant turn using:
    • User request/context: the preceding user message (plus brief prior context if needed).
    • Assistant response: the assistant message for that turn.
  3. Collect per-turn outputs (each is the JSON object returned by eval-core-scorecard).
  4. Compute session-level aggregates:
    • session_average_score: mean of the per-turn average_score values (may be a float).
    • dimension_averages: for each of the 7 dimensions, the mean of that dimension’s score across all assistant turns.
    • lowest_scoring_turns: the 3 assistant turns with the lowest average_score, each listed with its assistant_turn_index and average_score.
  5. Return a single strict JSON object.
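The aggregation in step 4 can be sketched in a few lines of Python. This is an illustrative sketch, not the skill's own implementation: `aggregate_session` is a hypothetical name, the input is assumed to be a non-empty list shaped like the turn_scorecards entries in the output schema, and two behaviours the skill does not specify are assumptions here (ties are broken by the earlier turn, and sessions with fewer than 3 assistant turns simply list fewer entries).

```python
from statistics import mean

DIMENSIONS = [
    "clarity", "relevance", "accuracy", "tone_empathy",
    "guidance_actionability", "conversation_flow", "boundary_adherence",
]


def aggregate_session(turn_scorecards: list[dict]) -> dict:
    """Compute session-level aggregates from per-turn scorecards."""
    # Collect each dimension's score across all assistant turns.
    dimension_scores = {name: [] for name in DIMENSIONS}
    for turn in turn_scorecards:
        for result in turn["scorecard"]["results"]:
            dimension_scores[result["dimension"]].append(result["score"])

    # Assumption: ties on average_score are broken by the earlier turn index.
    lowest = sorted(
        turn_scorecards,
        key=lambda t: (t["scorecard"]["average_score"], t["assistant_turn_index"]),
    )[:3]

    return {
        "session_average_score": mean(
            t["scorecard"]["average_score"] for t in turn_scorecards
        ),
        "dimension_averages": {
            name: mean(scores) for name, scores in dimension_scores.items()
        },
        "lowest_scoring_turns": [
            {
                "assistant_turn_index": t["assistant_turn_index"],
                "average_score": t["scorecard"]["average_score"],
            }
            for t in lowest
        ],
    }
```

For example, four turns whose average_score values are 4, 2, 5, and 3 give a session_average_score of 3.5, and the lowest-scoring turns are listed in the order 2, 4, 1.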

Output Contract

Return JSON only. Do not include markdown, backticks, prose, or extra keys.

Use exactly this schema:

{
  "dimension": "session_scorecard",
  "assistant_turn_count": 0,
  "turn_count": 0,
  "session_average_score": 0,
  "dimension_averages": {
    "clarity": 0,
    "relevance": 0,
    "accuracy": 0,
    "tone_empathy": 0,
    "guidance_actionability": 0,
    "conversation_flow": 0,
    "boundary_adherence": 0
  },
  "lowest_scoring_turns": [
    { "assistant_turn_index": 1, "average_score": 0 },
    { "assistant_turn_index": 1, "average_score": 0 },
    { "assistant_turn_index": 1, "average_score": 0 }
  ],
  "turn_scorecards": [
    {
      "assistant_turn_index": 1,
      "user_message": "...",
      "assistant_message": "...",
      "scorecard": {
        "dimension": "core_scorecard",
        "average_score": 0,
        "results": [
          { "dimension": "clarity", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
          { "dimension": "relevance", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
          { "dimension": "accuracy", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
          { "dimension": "tone_empathy", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
          { "dimension": "guidance_actionability", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
          { "dimension": "conversation_flow", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] },
          { "dimension": "boundary_adherence", "score": 1, "rationale": "...", "improvement_suggestions": ["..."] }
        ]
      }
    }
  ]
}

Hard Rules

  • dimension must always equal "session_scorecard".
  • Output must be valid JSON and include all keys exactly as shown.
  • turn_scorecards must include one entry per assistant turn found in the transcript.
  • assistant_turn_index starts at 1 and increments by 1 for each assistant turn in the transcript.
  • Do not include step-by-step reasoning.
  • Never output text outside the JSON object.
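Several of these rules can be checked mechanically by whoever consumes the output. The sketch below is one possible checker, offered as an assumption rather than as part of the skill: `check_session_output` is a hypothetical name, and it covers only the rules that can be verified from the JSON alone (it cannot confirm that the number of entries matches the transcript).

```python
import json

DIMENSIONS = {
    "clarity", "relevance", "accuracy", "tone_empathy",
    "guidance_actionability", "conversation_flow", "boundary_adherence",
}
TOP_LEVEL_KEYS = {
    "dimension", "assistant_turn_count", "turn_count", "session_average_score",
    "dimension_averages", "lowest_scoring_turns", "turn_scorecards",
}


def check_session_output(raw: str) -> list[str]:
    """Return a list of rule violations; an empty list means no problems found."""
    try:
        # json.loads rejects any text before or after the JSON value.
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    if set(data) != TOP_LEVEL_KEYS:
        problems.append("top-level keys do not match the schema exactly")
    if data.get("dimension") != "session_scorecard":
        problems.append('dimension must equal "session_scorecard"')

    averages = data.get("dimension_averages")
    if not isinstance(averages, dict) or set(averages) != DIMENSIONS:
        problems.append("dimension_averages must contain exactly the 7 dimensions")

    cards = data.get("turn_scorecards")
    if not isinstance(cards, list):
        problems.append("turn_scorecards must be a list")
    else:
        indices = [
            card.get("assistant_turn_index") if isinstance(card, dict) else None
            for card in cards
        ]
        if indices != list(range(1, len(cards) + 1)):
            problems.append("assistant_turn_index must run 1, 2, 3, ... in order")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log every violation from a single monitoring run.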

Source Transparency

This detail page is rendered from real SKILL.md content. Trust labels are metadata-based hints, not a safety guarantee.