# Usability Frameworks
Comprehensive frameworks and methodologies for planning, conducting, and analyzing usability tests to improve user experience.
## When to Use This Skill
Auto-loaded by agents:
- research-ops: for usability testing and heuristic evaluation
Use this skill for:
- Planning usability tests
- Conducting user testing sessions
- Evaluating interface designs
- Identifying usability problems
- Testing prototypes or live products
- Applying Nielsen's heuristics
- Measuring usability metrics
## Core Concepts
### What is Usability Testing?
Usability testing is a method for evaluating a product by testing it with representative users. Users attempt to complete typical tasks while observers watch, listen, and take notes.
Purpose: Identify usability problems, discover opportunities for improvement, and learn about user behavior and preferences.
When to use:
- Before development (testing prototypes)
- During development (iterative testing)
- After launch (validation and optimization)
- Before major redesigns
### The Five Usability Quality Components (Jakob Nielsen)
1. Learnability: How easy is it for users to accomplish basic tasks the first time?
2. Efficiency: How quickly can users perform tasks once they've learned the design?
3. Memorability: Can users remember how to use it after time away?
4. Errors: How many errors do users make, how severe, and how easily can they recover?
5. Satisfaction: How pleasant is it to use the design?
## Usability Testing Methodologies
### 1. Moderated Testing
Setup: Researcher guides participants through tasks in real time.
Location: In-person or remote (video call).
Best for:
- Early-stage prototypes needing clarification
- Complex products requiring guidance
- Exploring the "why" behind user behavior
- Uncovering emotional reactions
Process:
1. Welcome and set expectations
2. Pre-task questions (background, experience)
3. Task scenarios with think-aloud protocol
4. Post-task questions and discussion
5. Wrap-up and thank you
Advantages:
- Rich qualitative insights
- Can probe deeper into issues
- Observe non-verbal cues
- Clarify misunderstandings immediately
Limitations:
- More time-intensive (30-60 min per session)
- Researcher bias possible
- Smaller sample sizes
- Scheduling logistics
### 2. Unmoderated Testing
Setup: Participants complete tasks independently, recorded for later review.
Location: Remote, on the participant's own schedule.
Best for:
- Mature products with clear tasks
- Large sample sizes needed
- Quick turnaround required
- Benchmarking and metrics
Process:
1. Automated instructions and consent
2. Participants record screen/audio while completing tasks
3. Automated post-task surveys
4. Researcher reviews recordings later
Advantages:
- Faster data collection
- Larger sample sizes
- More natural environment
- Lower cost per participant
Limitations:
- Can't probe or clarify
- May miss nuanced insights
- Technical issues harder to resolve
- Participants may skip think-aloud
### 3. Hybrid Approaches
Combination methods:
- Moderated first impressions + unmoderated task completion
- Unmoderated testing + follow-up interviews with interesting cases
- Moderated pilot + unmoderated scale testing
## Nielsen's 10 Usability Heuristics
Quick reference for evaluating interfaces. See references/nielsens-10-heuristics.md for detailed explanations and examples.
1. Visibility of system status - Keep users informed
2. Match between system and real world - Speak users' language
3. User control and freedom - Provide escape hatches
4. Consistency and standards - Follow platform conventions
5. Error prevention - Prevent problems before they occur
6. Recognition rather than recall - Minimize memory load
7. Flexibility and efficiency of use - Accelerators for experts
8. Aesthetic and minimalist design - Remove irrelevant information
9. Help users recognize, diagnose, and recover from errors - Plain language error messages
10. Help and documentation - Provide when needed
## Think-Aloud Protocol
### What It Is
Participants verbalize their thoughts while completing tasks, providing real-time insight into their mental model.
### Types
Concurrent think-aloud: Speak while performing tasks
- More natural thought flow
- May affect task performance slightly

Retrospective think-aloud: Review the recording and explain thinking afterward
- Doesn't disrupt natural behavior
- Participants may forget or rationalize thoughts
### Facilitating Think-Aloud
Prompts to use:
- "What are you thinking right now?"
- "What are you looking for?"
- "What would you expect to happen?"
- "Is this what you expected?"
Don't:
- Ask leading questions
- Provide hints or solutions
- Interrupt natural flow too often
- Make participants feel tested
See references/think-aloud-protocol-guide.md for detailed facilitation techniques.
## Task Scenario Design
Good task scenarios are critical to meaningful usability test results.
### Characteristics of Good Task Scenarios
- Realistic: Based on actual user goals
- Specific: Clear endpoint and success criteria
- Self-contained: Provide all necessary context
- Actionable: Clear starting point
- Not prescriptive: Don't tell them how to do it
### Example Transformation
Poor: "Click on the 'My Account' link and change your password"
- Too prescriptive, tells them exactly where to click
Good: "You've heard about recent security breaches and want to make your account more secure. Update your account to use a stronger password."
- Realistic motivation, clear goal, doesn't prescribe path
### Task Complexity Levels
- Simple tasks (1-2 steps): Establish baseline usability
- Medium tasks (3-5 steps): Test core workflows
- Complex tasks (6+ steps): Evaluate overall experience and error recovery
See assets/task-scenario-template.md for ready-to-use templates.
## Severity Rating Framework
Not all usability issues are equal. Prioritize fixes based on severity.
### Three-Factor Severity Rating
Frequency: How often does this issue occur?
- High: > 50% of users encounter it
- Medium: 10-50% encounter it
- Low: < 10% encounter it

Impact: When it occurs, how badly does it affect users?
- High: Prevents task completion / causes data loss
- Medium: Causes frustration or delays
- Low: Minor annoyance

Persistence: Do users overcome it with experience?
- High: Problem doesn't go away
- Medium: Users learn to avoid/work around
- Low: One-time problem only
### Combined Severity Ratings
- Critical (P0): High frequency + High impact
- Serious (P1): High frequency + Medium impact, OR Medium frequency + High impact
- Moderate (P2): High frequency + Low impact, OR Medium frequency + Medium impact, OR Low frequency + High impact
- Minor (P3): Everything else
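The mapping above is mechanical enough to encode directly, which keeps triage consistent across raters. A minimal sketch in Python; the function name and level labels are illustrative, not a standard API:

```python
def priority(frequency: str, impact: str) -> str:
    """Combine frequency and impact levels into a P0-P3 priority label."""
    f, i = frequency.lower(), impact.lower()
    if f == "high" and i == "high":
        return "P0 (Critical)"
    if (f == "high" and i == "medium") or (f == "medium" and i == "high"):
        return "P1 (Serious)"
    if (f == "high" and i == "low") or (f == "medium" and i == "medium") \
            or (f == "low" and i == "high"):
        return "P2 (Moderate)"
    return "P3 (Minor)"  # everything else

print(priority("high", "medium"))  # P1 (Serious)
print(priority("low", "medium"))   # P3 (Minor)
```

Persistence is left out here, as in the combined table above; teams often use it as a tiebreaker within a priority band.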
See assets/severity-rating-guide.md for detailed rating criteria and examples.
## Usability Metrics
### Quantitative Metrics
Task Success Rate: % of participants who complete the task successfully
- Binary: Did they complete it? (yes/no)
- Partial credit: Did they complete most of it?

Time on Task: How long to complete (for successful completions)
- Compare to baseline or competitor benchmarks

Error Rate: Number of errors per task
- Define what counts as an error for each task

Clicks/Taps to Task Completion: Efficiency measure
- More relevant for well-defined tasks
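These metrics fall out of raw session data in a few lines. A sketch with a hypothetical data shape; adapt the field names to whatever your testing tool exports. Note that time on task uses the median over successful completions only, since one stuck participant can skew a mean badly:

```python
from statistics import median

# One record per participant per task (hypothetical export format)
sessions = [
    {"completed": True,  "seconds": 74,  "errors": 0},
    {"completed": True,  "seconds": 122, "errors": 2},
    {"completed": False, "seconds": 300, "errors": 5},
    {"completed": True,  "seconds": 95,  "errors": 1},
]

success_rate = sum(s["completed"] for s in sessions) / len(sessions)
# Time on task: successful completions only; median resists outliers
median_time = median(s["seconds"] for s in sessions if s["completed"])
error_rate = sum(s["errors"] for s in sessions) / len(sessions)

print(f"Success rate: {success_rate:.0%}")          # 75%
print(f"Median time (successes): {median_time}s")   # 95s
print(f"Errors per attempt: {error_rate}")          # 2.0
```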
### Standardized Questionnaires
SUS (System Usability Scale):
- 10 questions, 5-point Likert scale
- Score 0-100 (industry average ~68)
- Quick, reliable, easy to administer
- Good for comparing versions or benchmarking
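SUS scoring follows a fixed rule that is easy to get wrong by hand: odd-numbered items contribute (response − 1), even-numbered items contribute (5 − response), and the sum is multiplied by 2.5 to give a 0-100 score. A sketch (the example responses are made up):

```python
def sus_score(responses):
    """Score one participant's SUS: ten 1-5 answers, item 1 first."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs ten responses on a 1-5 scale")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i=0 is item 1, odd-numbered
        for i, r in enumerate(responses)
    )
    return total * 2.5

print(sus_score([4, 2, 5, 1, 4, 2, 4, 1, 5, 2]))  # 85.0
```

Report the mean (or median) of per-participant scores, never a score computed from averaged item responses.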
UMUX (Usability Metric for User Experience):
- 4 questions, lighter than SUS
- Similar reliability
- Faster for participants

SEQ (Single Ease Question):
- "Overall, how difficult or easy was the task to complete?" (1-7)
- One question per task
- Immediate subjective difficulty rating

Other scales:
- SUPR-Q (for websites)
- PSSUQ (post-study)
- NASA-TLX (cognitive load)
### Qualitative Insights
Observed behaviors:
- Hesitations and confusion
- Error patterns
- Unexpected paths
- Verbal frustrations

Verbalized thoughts (think-aloud):
- Mental model mismatches
- Expectation violations
- Pleasantly surprising discoveries
## Sample Size Guidelines
### For Qualitative Insights
Nielsen's recommendation: 5 users find ~85% of usability problems
- Diminishing returns after 5
- Run 3+ small rounds instead of 1 large round
- Iterate between rounds

Reality check:
- 5 is a minimum, not an ideal
- Complex products may need 8-10
- Multiple user types need 5 each
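The ~85% figure comes from the problem-discovery model 1 − (1 − L)^n, where L is the probability that a single user exposes a given problem (about 0.31 in Nielsen and Landauer's data). L varies by product and task set, so treat it as an input rather than a constant:

```python
def problems_found(n_users: int, detection_rate: float = 0.31) -> float:
    """Expected fraction of usability problems exposed by n_users,
    under the 1 - (1 - L)**n discovery model."""
    return 1 - (1 - detection_rate) ** n_users

for n in (1, 3, 5, 10):
    print(f"{n:>2} users -> {problems_found(n):.0%}")
# 1 user finds ~31%, 5 find ~84%, 10 find ~98% at L = 0.31
```

The curve also explains the "3+ small rounds" advice: fixing issues between rounds resets the pool of remaining problems, so three rounds of five users beat one round of fifteen.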
### For Quantitative Metrics
- Benchmarking: 20+ users per user group
- A/B testing: Depends on effect size and desired confidence
- Statistical significance: Use power analysis calculators
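To make "depends on effect size" concrete, here is a rough sample-size sketch for comparing task success rates between two designs, using the normal approximation with z-values hardcoded for 95% confidence and 80% power. This is a back-of-envelope illustration of why quantitative comparisons need far more than 5 users; for real studies use a proper power-analysis tool (e.g. statsmodels):

```python
import math

def n_per_group(p1: float, p2: float,
                z_alpha: float = 1.96, z_beta: float = 0.8416) -> int:
    """Approximate participants per group to detect a success-rate
    difference between p1 and p2 (unpooled normal approximation)."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)

# Detecting a lift from 70% to 85% task success:
print(n_per_group(0.70, 0.85))  # ~118 per group
```

Smaller expected differences blow the requirement up quadratically, which is why benchmarking studies settle for wide confidence intervals rather than significance tests.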
## Planning Your Usability Test
### 1. Define Objectives
What decisions will this research inform?
- Redesign priorities?
- Feature cut decisions?
- Success of recent changes?
### 2. Identify User Segments
Who needs to be tested?
- New vs. experienced users?
- Different roles or use cases?
- Different devices or contexts?
### 3. Select Tasks
What tasks represent success?
- Most critical user goals
- Most frequent tasks
- Recently changed features
- Known problem areas
### 4. Choose Methodology
Moderated, unmoderated, or hybrid?
- Consider timeline, budget, and research questions
### 5. Create Test Script
See assets/usability-test-script-template.md for a ready-to-use structure including:
- Welcome and consent
- Background questions
- Task instructions
- Probing questions
- Wrap-up
### 6. Recruit Participants
- Define screening criteria
- Aim for 5-10 per user segment
- Plan for no-shows (recruit 20% extra)
- Offer appropriate incentives
### 7. Conduct a Pilot Test
- Test with a colleague or friend
- Validate timing
- Check recording setup
- Refine unclear tasks
### 8. Run Sessions
- Stay neutral and encouraging
- Observe without interfering
- Take detailed notes
- Record if permitted
### 9. Analyze and Synthesize
- Code issues by severity
- Identify patterns across participants
- Link issues to the heuristics violated
- Quantify task success and time
### 10. Report and Recommend
- Prioritized issue list
- Video clips of critical issues
- Recommendations with rationale
- Quick wins vs. strategic fixes
## Integration with Product Development
### When to Test
- Discovery phase: Test competitors or analogous products
- Concept phase: Test paper prototypes or wireframes
- Design phase: Test high-fidelity mockups
- Development phase: Test working builds iteratively
- Pre-launch: Validate before release
- Post-launch: Identify optimization opportunities
### Continuous Usability Testing
Build it into your process:
- Weekly or bi-weekly test sessions
- Rotating focus (new features, established flows, mobile vs. desktop)
- Standing recruiting panel
- Lightweight reporting to the team
## Ready-to-Use Templates
We provide templates to accelerate your usability testing:
In assets/:
- usability-test-script-template.md: Complete moderator script structure
- task-scenario-template.md: Framework for creating effective task scenarios
- severity-rating-guide.md: Detailed criteria for rating usability issues

In references/:
- nielsens-10-heuristics.md: Deep dive into each heuristic with examples
- think-aloud-protocol-guide.md: Advanced facilitation techniques and troubleshooting
## Common Pitfalls to Avoid
- Leading participants: "Was that easy?" → "How would you describe that experience?"
- Testing the wrong tasks: Tasks that aren't real user goals
- Over-explaining: Let users struggle and discover issues naturally
- Ignoring severity: Fixing cosmetic issues while critical issues remain
- Testing too late: After it's expensive to change
- Not iterating: One-and-done testing instead of continuous improvement
- Confusing usability with preference: "I like green" ≠ usability issue
- Sample bias: Testing only power users or only complete novices
## Further Learning
Books:
- "Rocket Surgery Made Easy" by Steve Krug
- "Handbook of Usability Testing" by Jeffrey Rubin
- "Moderating Usability Tests" by Joseph Dumas

Online resources:
- Nielsen Norman Group articles
- Usability.gov
- Baymard Institute research
This skill provides the foundation for conducting effective usability testing. Use the templates in assets/ for quick starts and references/ for deeper dives into specific techniques.