# Jury System Skill

Specialized knowledge for running synthetic user validation using the Condorcet Jury Theorem.
## When to Use

- Validating research findings resonate broadly
- Testing PRD user stories match mental models
- Evaluating prototype usability
- Checking graduation criteria between phases
## Core Principle
If each synthetic persona has >50% accuracy in judging whether a feature matches their needs, aggregating 100-500+ votes produces near-certain collective judgment.
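A quick way to sanity-check this claim is to compute majority-vote accuracy directly from the binomial distribution. The sketch below assumes independent votes and a hypothetical 65% per-persona accuracy:

```python
from math import comb

def majority_correct_probability(p: float, n: int) -> float:
    """Probability that a strict majority of n independent jurors,
    each correct with probability p, reaches the right verdict."""
    majority = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(majority, n + 1))

# An odd jury size avoids ties; 65%-accurate personas are near-certain at ~100 votes.
print(round(majority_correct_probability(0.65, 101), 3))  # ≈ 0.999
```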
## Stratified Sampling

| Dimension | Distribution |
| --- | --- |
| Role | Sales Rep: 40%, Sales Leader: 25%, CSM: 20%, RevOps: 15% |
| Tech Proficiency | Novice: 25%, Intermediate: 50%, Advanced: 25% |
| AI Adoption | Skeptic: 15% (min), Curious: 40%, Early Adopter: 35%, Power User: 10% |

**Critical:** Always include 15% skeptics. They catch issues optimists miss.
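A minimal sampling sketch built on the quotas above; the field names and the generation step are illustrative, not the existing scripts' API:

```python
import random

# Target distributions from the table above.
QUOTAS = {
    "role": {"Sales Rep": 0.40, "Sales Leader": 0.25, "CSM": 0.20, "RevOps": 0.15},
    "tech_proficiency": {"Novice": 0.25, "Intermediate": 0.50, "Advanced": 0.25},
    "ai_adoption": {"Skeptic": 0.15, "Curious": 0.40, "Early Adopter": 0.35, "Power User": 0.10},
}

def sample_personas(n: int, seed: int = 0) -> list[dict]:
    """Independently sample each dimension according to its quota.
    A real generator would also enforce the 15% skeptic floor exactly."""
    rng = random.Random(seed)
    personas = []
    for _ in range(n):
        persona = {
            dim: rng.choices(list(dist), weights=list(dist.values()))[0]
            for dim, dist in QUOTAS.items()
        }
        personas.append(persona)
    return personas

jury = sample_personas(200)
skeptic_share = sum(p["ai_adoption"] == "Skeptic" for p in jury) / len(jury)
print(f"skeptics: {skeptic_share:.0%}")  # should hover around the 15% floor
```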
## Validation by Phase

| Phase | Sample | Pass Criterion |
| --- | --- | --- |
| Research Validation | 100-200 personas | >60% rate resonance 4+ |
| PRD Validation | 200-300 personas | >70% rate relevance 4+ |
| Prototype Evaluation | 300-500 personas | >70% combined pass rate |
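The same thresholds expressed as a config a runner script could consume (the dict shape and field names are illustrative, not taken from the existing scripts):

```python
PHASE_CONFIG = {
    "research":  {"sample_size": (100, 200), "pass_rate": 0.60, "score_field": "resonance_score"},
    "prd":       {"sample_size": (200, 300), "pass_rate": 0.70, "score_field": "relevance_score"},
    "prototype": {"sample_size": (300, 500), "pass_rate": 0.70, "score_field": "combined_pass"},
}
```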
## Aggregation Rules

| Verdict | Threshold |
| --- | --- |
| Validated | >60% rate 4+ |
| Contested | 40-60% rate 4+ |
| Rejected | <40% rate 4+ |
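A minimal aggregation sketch that maps a list of 1-5 scores onto these verdicts:

```python
def aggregate_verdict(scores: list[int], high: float = 0.60, low: float = 0.40) -> dict:
    """Map the share of jurors scoring 4+ to a verdict per the table above."""
    rate_4_plus = sum(s >= 4 for s in scores) / len(scores)
    if rate_4_plus > high:
        verdict = "validated"
    elif rate_4_plus >= low:
        verdict = "contested"
    else:
        verdict = "rejected"
    return {"rate_4_plus": round(rate_4_plus, 3), "verdict": verdict, "n": len(scores)}

print(aggregate_verdict([5, 4, 4, 3, 5, 2, 4, 4, 1, 5]))
# {'rate_4_plus': 0.7, 'verdict': 'validated', 'n': 10}
```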
## Evaluation Prompt Templates

### Research Validation Prompt

```
You ARE {persona.name}, a {persona.role} at a {persona.company_size} company.

Your context:
- Tech comfort: {persona.tech_literacy}
- AI attitude: {persona.ai_adoption_stage}
- Your primary pain: {persona.primary_pain}

A product team has identified this pain point from customer research:

PAIN POINT: {extracted_pain_point}
SUPPORTING QUOTE: "{supporting_quote}"

As yourself, respond in JSON:

{
  "resonance_score": [1-5, where 1="not my problem", 5="exactly my frustration"],
  "perspective": "[2-3 sentences explaining why this does/doesn't resonate, in first person]",
  "missing_aspect": "[Optional: related pain they might have overlooked]"
}
```
### PRD User Story Validation Prompt

```
You ARE {persona.name}. A product team proposes this user story:

USER STORY: As a {story.persona}, I want to {story.action} so that {story.benefit}.
ACCEPTANCE CRITERIA: {story.criteria}

Evaluate as yourself, respond in JSON:

{
  "relevance_score": [1-5],
  "clarity": "clear" | "somewhat_clear" | "confusing",
  "missing_from_your_perspective": "[what's missing]",
  "usage_frequency": "daily" | "weekly" | "monthly" | "rarely" | "never"
}
```
### Prototype Evaluation Prompt

```
You ARE {persona.name}, a {persona.role} at a {persona.company_size} company.

YOUR CONTEXT:
- Tech comfort: {persona.tech_literacy}
- AI trust: {persona.trust_in_ai}
- Patience for learning: {persona.patience_for_learning}

SCENARIO: {scenario.description}
YOUR TASK: {scenario.task.primary_goal}

PROTOTYPE: {prototype_description}

THE COMPLETE EXPERIENCE JOURNEY:
- DISCOVERY: {discovery_mechanism} How you would first learn this feature exists.
- ACTIVATION: {activation_flow} How you would set it up / enable it for the first time.
- USAGE: {usage_description} What your first interaction looks like.
- ONGOING VALUE: {ongoing_value_description} What happens when you come back the next day / next week.
- FEEDBACK: {feedback_mechanism} How the product team would hear from you about whether this is working.

Evaluate this prototype AND the full experience in JSON:

{
  "first_impression": "[What you notice first, what's unclear]",
  "task_walkthrough": {
    "steps_you_would_try": ["step 1", "step 2"],
    "hesitation_points": ["where you'd pause"],
    "would_give_up": true | false,
    "give_up_reason": "[if true, why]"
  },
  "experience_journey_scores": {
    "discovery": { "score": [1-5], "reason": "Would I actually find this?" },
    "activation": { "score": [1-5], "reason": "Could I set this up alone?" },
    "usage": { "score": [1-5], "reason": "Does the first interaction make sense?" },
    "ongoing_value": { "score": [1-5], "reason": "Would I come back to this?" },
    "feedback_loop": { "score": [1-5], "reason": "Would I bother giving feedback?" },
    "experience_coherence": [1-5, "Does the full journey feel connected?"]
  },
  "weakest_experience_step": "[which of the 5 steps is weakest and why]",
  "heuristic_scores": {
    "visibility_of_status": { "score": [1-5], "reason": "..." },
    "match_with_expectations": { "score": [1-5], "reason": "..." },
    "user_control": { "score": [1-5], "reason": "..." },
    "consistency": { "score": [1-5], "reason": "..." },
    "error_prevention": { "score": [1-5], "reason": "..." }
  },
  "issues": [
    {
      "what": "[description]",
      "where": "[UI element or experience step]",
      "severity": "cosmetic" | "minor" | "major" | "catastrophic",
      "why_matters_to_you": "[persona-specific impact]"
    }
  ],
  "emotional_response": {
    "frustration": [1-5],
    "confidence": [1-5],
    "would_recommend": [1-5]
  },
  "verdict": {
    "would_use": true | false,
    "reasoning": "[why/why not]"
  }
}
```
## Self-Consistency Filter

Run each evaluation 3 times at temperature 0.7. Count a vote only if at least 2 of the 3 runs agree; discard inconsistent responses.
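A sketch of the filter, assuming a caller-supplied `evaluate_once` function that performs one model run at temperature 0.7 and returns the persona's answer:

```python
from collections import Counter
from typing import Callable, Optional

def self_consistent_vote(evaluate_once: Callable[[], str], runs: int = 3) -> Optional[str]:
    """Run the same evaluation several times and keep the answer only if a
    majority of runs (2/3 or 3/3 when runs=3) agree; otherwise discard the vote."""
    answers = [evaluate_once() for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count * 2 > runs else None

# Stubbed example; a real evaluate_once would call the model API.
canned = iter(["would_use", "would_use", "would_not_use"])
print(self_consistent_vote(lambda: next(canned)))  # "would_use"
```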
## Model Selection

| Operation | Model | Rationale |
| --- | --- | --- |
| Persona Generation | Claude Haiku | Cost-effective for volume |
| Research Validation | Claude Haiku | Simple resonance scoring |
| PRD Validation | Claude Haiku | Structured output |
| Prototype Evaluation | Claude Haiku | Volume of evaluations |
| Synthesis/Aggregation | Claude Sonnet | Quality of final insights |
**Temperature Settings:**

- Persona generation: 0.9 (maximize diversity)
- Evaluation: 0.7 (balanced for self-consistency)
- Synthesis: 0.3 (consistent, coherent output)
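The same routing as a single config dict (the model tier strings are placeholders; substitute the exact model identifiers your client library expects):

```python
MODEL_ROUTING = {
    # operation: (model tier, temperature)
    "persona_generation":   ("haiku",  0.9),  # cheap, maximally diverse
    "research_validation":  ("haiku",  0.7),
    "prd_validation":       ("haiku",  0.7),
    "prototype_evaluation": ("haiku",  0.7),
    "synthesis":            ("sonnet", 0.3),  # higher quality, low variance
}
```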
## Cost Estimation

| Phase | Sample Size | Estimated Cost |
| --- | --- | --- |
| Research Validation | 200 personas × 5 pains | ~$0.50 |
| PRD Validation | 300 personas × 10 stories | ~$1.00 |
| Prototype Evaluation | 500 personas | ~$2.00 |
| Synthesis | 1 aggregation | ~$0.50 |
| **Total per initiative** | | **~$4.00** |
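The implied unit economics, computed straight from the table above (useful when re-budgeting for a different sample size; treat the dollar figures as rough assumptions):

```python
# Evaluation counts and costs taken from the table above.
PHASES = {
    "research":  {"evaluations": 200 * 5,  "cost": 0.50},
    "prd":       {"evaluations": 300 * 10, "cost": 1.00},
    "prototype": {"evaluations": 500,      "cost": 2.00},
}

for name, p in PHASES.items():
    print(f"{name}: ~${p['cost'] / p['evaluations']:.4f} per evaluation")

synthesis = 0.50
print(f"total per initiative: ~${sum(p['cost'] for p in PHASES.values()) + synthesis:.2f}")  # ~$4.00
```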
## Output File Locations

Save to `pm-workspace-docs/initiatives/active/[name]/jury-evaluations/`:

- `research-v1.json`: Pain point resonance
- `prd-v1.json`: User story validation
- `proto-v1.json`: Usability evaluation (raw)
- `jury-report.md`: Human-readable synthesis
- `iteration-log.md`: Change tracking
## Quality Checks

Before trusting results:

- Sample size adequate (≥100 research, ≥200 PRD, ≥300 prototype)
- Skeptic representation ≥15%
- All relevant archetypes represented
- Self-consistency filter applied
- Variance check passed (std > 0.5 on 5-point scales)
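A sketch of these gates as a single pre-synthesis function (the field names on the evaluation records are illustrative):

```python
from statistics import pstdev

MIN_SAMPLE = {"research": 100, "prd": 200, "prototype": 300}

def quality_gate(phase: str, evaluations: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the run is safe to synthesize."""
    problems = []
    if len(evaluations) < MIN_SAMPLE[phase]:
        problems.append(f"sample too small: {len(evaluations)} < {MIN_SAMPLE[phase]}")
    skeptics = sum(e["persona"]["ai_adoption"] == "Skeptic" for e in evaluations)
    if skeptics / len(evaluations) < 0.15:
        problems.append("skeptic representation below 15%")
    scores = [e["score"] for e in evaluations]
    if pstdev(scores) <= 0.5:
        problems.append("variance too low (std <= 0.5); personas may be echoing one answer")
    return problems
```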
## Scripts Reference

Existing scripts in `pm-workspace-docs/scripts/jury-system/`:

- `simulate_jury.py`: Run the jury simulation
- `iterate_from_feedback.py`: Generate iteration docs
## This System Supplements, Not Replaces

Use it for:

- ✅ Rapid validation between real interviews
- ✅ Catching obvious mismatches before investing
- ✅ Covering personas you haven't talked to yet

Do NOT use it to:

- ❌ Replace actual customer conversations
- ❌ Make final launch decisions without real validation