# Jury System Skill

Specialized knowledge for running synthetic user validation using the Condorcet Jury Theorem.
## When to Use

- Validating research findings resonate broadly
- Testing PRD user stories match mental models
- Evaluating prototype usability
- Checking graduation criteria between phases
## Core Principle
If each synthetic persona has >50% accuracy in judging whether a feature matches their needs, aggregating 100-500+ votes produces near-certain collective judgment.
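A quick way to sanity-check this claim is to compute majority-vote accuracy directly from the binomial distribution. The sketch below assumes independent votes and a hypothetical 65% per-persona accuracy:

```python
from math import comb

def majority_correct_probability(p: float, n: int) -> float:
    """Probability that a strict majority of n independent jurors,
    each correct with probability p, reaches the right verdict."""
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1))

# An odd jury size avoids ties; 65%-accurate personas are near-certain at ~100 votes.
print(round(majority_correct_probability(0.65, 101), 3))  # ≈ 0.999
```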
## Stratified Sampling

| Dimension | Distribution |
| --- | --- |
| Role | Sales Rep: 40%, Sales Leader: 25%, CSM: 20%, RevOps: 15% |
| Tech Proficiency | Novice: 25%, Intermediate: 50%, Advanced: 25% |
| AI Adoption | Skeptic: 15% (min), Curious: 40%, Early Adopter: 35%, Power User: 10% |

**Critical:** Always include 15% skeptics. They catch issues optimists miss.
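A minimal sampling sketch built on the quotas above; the field names and the generation step are illustrative, not the existing scripts' API:

```python
import random

# Target distributions from the table above.
QUOTAS = {
    "role": {"Sales Rep": 0.40, "Sales Leader": 0.25, "CSM": 0.20, "RevOps": 0.15},
    "tech_proficiency": {"Novice": 0.25, "Intermediate": 0.50, "Advanced": 0.25},
    "ai_adoption": {"Skeptic": 0.15, "Curious": 0.40, "Early Adopter": 0.35, "Power User": 0.10},
}

def sample_personas(n: int, seed: int = 0) -> list[dict]:
    """Independently sample each dimension according to its quota.
    A real generator would also enforce the 15% skeptic floor exactly."""
    rng = random.Random(seed)
    personas = []
    for _ in range(n):
        persona = {
            dim: rng.choices(list(dist), weights=list(dist.values()))[0]
            for dim, dist in QUOTAS.items()
        }
        personas.append(persona)
    return personas

jury = sample_personas(200)
skeptic_share = sum(p["ai_adoption"] == "Skeptic" for p in jury) / len(jury)
print(f"skeptics: {skeptic_share:.0%}")  # should hover around the 15% floor
```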
## Validation by Phase

| Phase | Sample | Pass Criterion |
| --- | --- | --- |
| Research Validation | 100-200 personas | >60% rate resonance 4+ |
| PRD Validation | 200-300 personas | >70% rate relevance 4+ |
| Prototype Evaluation | 300-500 personas | >70% combined pass rate |
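The same thresholds expressed as a config a runner script could consume (the dict shape and field names are illustrative, not taken from the existing scripts):

```python
PHASE_CONFIG = {
    "research":  {"sample_size": (100, 200), "pass_rate": 0.60, "score_field": "resonance_score"},
    "prd":       {"sample_size": (200, 300), "pass_rate": 0.70, "score_field": "relevance_score"},
    "prototype": {"sample_size": (300, 500), "pass_rate": 0.70, "score_field": "combined_pass"},
}
```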
## Aggregation Rules

| Verdict | Threshold |
| --- | --- |
| Validated | >60% rate 4+ |
| Contested | 40-60% rate 4+ |
| Rejected | <40% rate 4+ |
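A minimal aggregation sketch that maps a list of 1-5 scores onto these verdicts:

```python
def aggregate_verdict(scores: list[int], high: float = 0.60, low: float = 0.40) -> dict:
    """Map the share of jurors scoring 4+ to a verdict per the table above."""
    rate_4_plus = sum(s >= 4 for s in scores) / len(scores)
    if rate_4_plus > high:
        verdict = "validated"
    elif rate_4_plus >= low:
        verdict = "contested"
    else:
        verdict = "rejected"
    return {"rate_4_plus": round(rate_4_plus, 3), "verdict": verdict, "n": len(scores)}

print(aggregate_verdict([5, 4, 4, 3, 5, 2, 4, 4, 1, 5]))
# {'rate_4_plus': 0.7, 'verdict': 'validated', 'n': 10}
```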
## Evaluation Prompt Templates

### Research Validation Prompt

```
You ARE {persona.name}, a {persona.role} at a {persona.company_size} company.

Your context:
- Tech comfort: {persona.tech_literacy}
- AI attitude: {persona.ai_adoption_stage}
- Your primary pain: {persona.primary_pain}

A product team has identified this pain point from customer research:

PAIN POINT: {extracted_pain_point}
SUPPORTING QUOTE: "{supporting_quote}"

As yourself, respond in JSON:

{
  "resonance_score": [1-5, where 1="not my problem", 5="exactly my frustration"],
  "perspective": "[2-3 sentences explaining why this does/doesn't resonate, in first person]",
  "missing_aspect": "[Optional: related pain they might have overlooked]"
}
```
### PRD User Story Validation Prompt

```
You ARE {persona.name}. A product team proposes this user story:

USER STORY: As a {story.persona}, I want to {story.action} so that {story.benefit}.
ACCEPTANCE CRITERIA: {story.criteria}

Evaluate as yourself, respond in JSON:

{
  "relevance_score": [1-5],
  "clarity": "clear" | "somewhat_clear" | "confusing",
  "missing_from_your_perspective": "[what's missing]",
  "usage_frequency": "daily" | "weekly" | "monthly" | "rarely" | "never"
}
```
### Prototype Evaluation Prompt

```
You ARE {persona.name}, a {persona.role} at a {persona.company_size} company.

YOUR CONTEXT:
- Tech comfort: {persona.tech_literacy}
- AI trust: {persona.trust_in_ai}
- Patience for learning: {persona.patience_for_learning}

SCENARIO: {scenario.description}
YOUR TASK: {scenario.task.primary_goal}

PROTOTYPE: {prototype_description}

THE COMPLETE EXPERIENCE JOURNEY:
- DISCOVERY: {discovery_mechanism} How you would first learn this feature exists.
- ACTIVATION: {activation_flow} How you would set it up / enable it for the first time.
- USAGE: {usage_description} What your first interaction looks like.
- ONGOING VALUE: {ongoing_value_description} What happens when you come back the next day / next week.
- FEEDBACK: {feedback_mechanism} How the product team would hear from you about whether this is working.

Evaluate this prototype AND the full experience in JSON:

{
  "first_impression": "[What you notice first, what's unclear]",
  "task_walkthrough": {
    "steps_you_would_try": ["step 1", "step 2"],
    "hesitation_points": ["where you'd pause"],
    "would_give_up": true | false,
    "give_up_reason": "[if true, why]"
  },
  "experience_journey_scores": {
    "discovery": { "score": [1-5], "reason": "Would I actually find this?" },
    "activation": { "score": [1-5], "reason": "Could I set this up alone?" },
    "usage": { "score": [1-5], "reason": "Does the first interaction make sense?" },
    "ongoing_value": { "score": [1-5], "reason": "Would I come back to this?" },
    "feedback_loop": { "score": [1-5], "reason": "Would I bother giving feedback?" },
    "experience_coherence": [1-5, "Does the full journey feel connected?"]
  },
  "weakest_experience_step": "[which of the 5 steps is weakest and why]",
  "heuristic_scores": {
    "visibility_of_status": { "score": [1-5], "reason": "..." },
    "match_with_expectations": { "score": [1-5], "reason": "..." },
    "user_control": { "score": [1-5], "reason": "..." },
    "consistency": { "score": [1-5], "reason": "..." },
    "error_prevention": { "score": [1-5], "reason": "..." }
  },
  "issues": [
    {
      "what": "[description]",
      "where": "[UI element or experience step]",
      "severity": "cosmetic" | "minor" | "major" | "catastrophic",
      "why_matters_to_you": "[persona-specific impact]"
    }
  ],
  "emotional_response": {
    "frustration": [1-5],
    "confidence": [1-5],
    "would_recommend": [1-5]
  },
  "verdict": {
    "would_use": true | false,
    "reasoning": "[why/why not]"
  }
}
```
## Self-Consistency Filter

Run each evaluation 3 times at temperature 0.7. Count a vote only if at least 2 of the 3 runs agree; discard inconsistent responses.
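A sketch of the filter, assuming a caller-supplied `evaluate_once` function that performs one model run at temperature 0.7 and returns the persona's answer:

```python
from collections import Counter
from typing import Callable, Optional

def self_consistent_vote(evaluate_once: Callable[[], str], runs: int = 3) -> Optional[str]:
    """Run the same evaluation several times and keep the answer only if a
    majority of runs (2/3 or 3/3 when runs=3) agree; otherwise discard the vote."""
    answers = [evaluate_once() for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count * 2 > runs else None

# Stubbed example; a real evaluate_once would call the model API.
canned = iter(["would_use", "would_use", "would_not_use"])
print(self_consistent_vote(lambda: next(canned)))  # "would_use"
```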
## Model Selection

| Operation | Model | Rationale |
| --- | --- | --- |
| Persona Generation | Claude Haiku | Cost-effective for volume |
| Research Validation | Claude Haiku | Simple resonance scoring |
| PRD Validation | Claude Haiku | Structured output |
| Prototype Evaluation | Claude Haiku | Volume of evaluations |
| Synthesis/Aggregation | Claude Sonnet | Quality of final insights |
**Temperature Settings:**

- Persona generation: 0.9 (maximize diversity)
- Evaluation: 0.7 (balanced for self-consistency)
- Synthesis: 0.3 (consistent, coherent output)
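The same routing as a single config dict (the model tier strings are placeholders; substitute the exact model identifiers your client library expects):

```python
MODEL_ROUTING = {
    # operation: (model tier, temperature)
    "persona_generation":   ("haiku",  0.9),  # cheap, maximally diverse
    "research_validation":  ("haiku",  0.7),
    "prd_validation":       ("haiku",  0.7),
    "prototype_evaluation": ("haiku",  0.7),
    "synthesis":            ("sonnet", 0.3),  # higher quality, low variance
}
```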
## Cost Estimation

| Phase | Sample Size | Estimated Cost |
| --- | --- | --- |
| Research Validation | 200 personas × 5 pains | ~$0.50 |
| PRD Validation | 300 personas × 10 stories | ~$1.00 |
| Prototype Evaluation | 500 personas | ~$2.00 |
| Synthesis | 1 aggregation | ~$0.50 |
| **Total per initiative** | | **~$4.00** |
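The implied unit economics, computed straight from the table above (useful when re-budgeting for a different sample size; treat the dollar figures as rough assumptions):

```python
# Evaluation counts and costs taken from the table above.
PHASES = {
    "research":  {"evaluations": 200 * 5,  "cost": 0.50},
    "prd":       {"evaluations": 300 * 10, "cost": 1.00},
    "prototype": {"evaluations": 500,      "cost": 2.00},
}

for name, p in PHASES.items():
    print(f"{name}: ~${p['cost'] / p['evaluations']:.4f} per evaluation")

synthesis = 0.50
print(f"total per initiative: ~${sum(p['cost'] for p in PHASES.values()) + synthesis:.2f}")  # ~$4.00
```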
## Output File Locations

Save to `pm-workspace-docs/initiatives/active/[name]/jury-evaluations/`:

- `research-v1.json`: Pain point resonance
- `prd-v1.json`: User story validation
- `proto-v1.json`: Usability evaluation (raw)
- `jury-report.md`: Human-readable synthesis
- `iteration-log.md`: Change tracking
## Quality Checks

Before trusting results:

- Sample size adequate (≥100 research, ≥200 PRD, ≥300 prototype)
- Skeptic representation ≥15%
- All relevant archetypes represented
- Self-consistency filter applied
- Variance check passed (std > 0.5 on 5-point scales)
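A sketch of these gates as a single pre-synthesis function (the field names on the evaluation records are illustrative):

```python
from statistics import pstdev

MIN_SAMPLE = {"research": 100, "prd": 200, "prototype": 300}

def quality_gate(phase: str, evaluations: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the run is safe to synthesize."""
    problems = []
    if len(evaluations) < MIN_SAMPLE[phase]:
        problems.append(f"sample too small: {len(evaluations)} < {MIN_SAMPLE[phase]}")
    skeptics = sum(e["persona"]["ai_adoption"] == "Skeptic" for e in evaluations)
    if skeptics / len(evaluations) < 0.15:
        problems.append("skeptic representation below 15%")
    scores = [e["score"] for e in evaluations]
    if pstdev(scores) <= 0.5:
        problems.append("variance too low (std <= 0.5); personas may be echoing one answer")
    return problems
```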
## Scripts Reference

Existing scripts in `pm-workspace-docs/scripts/jury-system/`:

- `simulate_jury.py`: Run the jury simulation
- `iterate_from_feedback.py`: Generate iteration docs
## This System Supplements, Not Replaces

Use it for:

- ✅ Rapid validation between real interviews
- ✅ Catching obvious mismatches before investing
- ✅ Covering personas you haven't talked to yet

Do NOT use it to:

- ❌ Replace actual customer conversations
- ❌ Make final launch decisions without real validation