Subagent Testing - TDD for Skills

Safety Notice

This listing is imported from the skills.sh public index metadata. Review the upstream SKILL.md and repository scripts before running anything.

Install the "subagent-testing" skill with this command: npx skills add athola/claude-night-market/athola-claude-night-market-subagent-testing

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

Table of Contents

  • Overview

  • Why Fresh Instances Matter

  • Testing Methodology

  • Quick Start

  • Detailed Testing Guide

  • Success Criteria

Overview

Fresh instances prevent priming: each test runs in a new Claude conversation, so what you measure is the skill's impact rather than conversation-history effects.

Why Fresh Instances Matter

The Priming Problem

Running tests in the same conversation creates bias:

  • Prior context influences responses

  • Skill effects get mixed with conversation history

  • Can't isolate skill's true impact

Fresh Instance Benefits

  • Isolation: Each test starts clean

  • Reproducibility: Consistent baseline state

  • Measurement: Clear before/after comparison

  • Validation: Proves skill effectiveness, not priming

Testing Methodology

Three-phase TDD-style approach:

Phase 1: Baseline Testing (RED)

Test without skill to establish baseline behavior.

Phase 2: With-Skill Testing (GREEN)

Test with skill loaded to measure improvements.

Phase 3: Rationalization Testing (REFACTOR)

Test skill's anti-rationalization guardrails.
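As a rough illustration, all three phases can be driven by one helper that launches a fresh instance per prompt. Note that `run_fresh_instance` is a hypothetical stand-in (not part of any real API) for whatever spawns a new conversation in your setup:

```python
def run_phase(phase, prompts, run_fresh_instance):
    """Collect one response per prompt, each from a brand-new instance.

    `run_fresh_instance` is a hypothetical callable that starts a clean
    conversation, sends a single prompt, and returns the response --
    so no history leaks between scenarios.
    """
    return {"phase": phase,
            "responses": [run_fresh_instance(p) for p in prompts]}

# Stand-in runners for demonstration only; replace with a real
# subagent launcher (without and with the skill loaded).
prompts = ["scenario-1", "scenario-2", "scenario-3"]
baseline = run_phase("RED", prompts, lambda p: f"baseline:{p}")
with_skill = run_phase("GREEN", prompts, lambda p: f"skill:{p}")
```

Because the same prompts flow through both phases, the RED and GREEN results line up one-to-one for the later comparison.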

Quick Start

1. Create baseline tests (without skill)

  • Use 5 diverse scenarios

  • Document full responses

2. Create with-skill tests (fresh instances)

  • Load skill explicitly

  • Use identical prompts

  • Compare to baseline

3. Create rationalization tests

  • Test anti-rationalization patterns

  • Verify guardrails work
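A minimal sketch of the Quick Start loop, again assuming a hypothetical `run_fresh_instance(prompt)` helper that opens a new conversation per call (no such helper ships with the skill; wire it to your own runner):

```python
# Hypothetical helper; replace with a call that spawns a real fresh
# subagent conversation and returns its response text.
def run_fresh_instance(prompt):
    return f"response to: {prompt}"

scenarios = [f"scenario-{i}" for i in range(1, 6)]  # 5 diverse scenarios

# Step 1: baseline runs -- no skill loaded, full responses kept.
baseline = {s: run_fresh_instance(s) for s in scenarios}

# Step 2: with-skill runs -- identical prompts, skill loaded explicitly
# (the loading phrase here is illustrative, not a fixed syntax).
with_skill = {s: run_fresh_instance(f"Load the subagent-testing skill. {s}")
              for s in scenarios}

# Step 3 feeds each baseline/with-skill pair into comparison and,
# separately, rationalization probes.
pairs = [(s, baseline[s], with_skill[s]) for s in scenarios]
```

Keeping the prompts byte-identical between the two runs is what makes the pairwise comparison meaningful.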

Detailed Testing Guide

For complete testing patterns, examples, and templates:

  • Testing Patterns - Full TDD methodology

  • Test Examples - Baseline, with-skill, rationalization tests

  • Analysis Templates - Scoring and comparison frameworks

Success Criteria

  • Baseline: Document 5+ diverse baseline scenarios

  • Improvement: ≥50% improvement in skill-related metrics

  • Consistency: Results reproducible across fresh instances

  • Rationalization Defense: Guardrails prevent ≥80% of rationalization attempts
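The two quantitative thresholds above reduce to a simple check. The scores below are made-up illustrations, not measured results, and the scoring scale itself is up to you:

```python
def passes_criteria(baseline_score, skill_score, blocked, attempts):
    """True when improvement is >= 50% and guardrails block >= 80%."""
    improvement = (skill_score - baseline_score) / baseline_score
    guardrail_rate = blocked / attempts
    return improvement >= 0.50 and guardrail_rate >= 0.80

print(passes_criteria(4.0, 6.5, 9, 10))  # 62.5% better, 90% blocked -> True
print(passes_criteria(4.0, 5.0, 9, 10))  # only 25% better -> False
```

Both thresholds must hold at once: a large metric improvement does not compensate for weak rationalization defense, or vice versa.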

See Also

  • skill-authoring: Creating effective skills

  • bulletproof-skill: Anti-rationalization patterns

  • test-skill: Automated skill testing command
