/dm:ab-test-plan

Purpose

Dedicated A/B test planning with a structured hypothesis framework, statistical sample size calculation, variant design, and monitoring plan. Produces a complete experiment specification with statistical rigor and clear decision criteria.

Input Required

The user must provide (or will be prompted for):

Element to test: The specific page, component, or experience being tested (landing page headline, CTA button, pricing page layout, email subject line, checkout flow, form design, etc.)
Current conversion rate: Baseline conversion rate for the metric being tested (or best estimate)
Desired minimum detectable effect (MDE): The smallest improvement worth detecting (e.g., 10% relative lift)
Daily traffic or impressions: Average daily visitors or impressions to the test page or element
Significance level: Desired confidence level, default 95% (alpha = 0.05)
Statistical power: Desired power, default 80% (beta = 0.20)
Number of variants: How many variants to test (default 1 treatment + 1 control; more for multivariate)
Business context: What prompted the test idea (analytics data, user feedback, competitive analysis, heuristic audit, stakeholder request)

Process

Load brand context: Read ~/.claude-marketing/brands/_active-brand.json for the active slug, then load ~/.claude-marketing/brands/{slug}/profile.json. Apply voice, compliance, industry context. Check guidelines/_manifest.json for restrictions, messaging, channel styles, voice-and-tone rules, and templates. If a template matching this command exists in ~/.claude-marketing/brands/{slug}/templates/, apply its format. If no brand exists, prompt for /dm:brand-setup or proceed with defaults.
Check campaign history: Run python campaign-tracker.py --brand {slug} --action list-campaigns to review past test results and avoid re-testing already-validated hypotheses.
Run sample size calculator: Execute scripts/sample-size-calculator.py with baseline conversion rate, minimum detectable effect, significance level, and power to determine required sample size per variant.
Build hypothesis statement: Structure the hypothesis in the format: "If [specific change], then [primary metric] will [direction and magnitude] because [rationale grounded in data, user research, or established UX principle]."
Design test variants: Define the control (current experience) and one or more treatment variants. Specify exactly what changes in each variant -- copy, layout, color, imagery, flow, or functionality. For multivariate tests, define the variable matrix and interaction effects to watch.
Define primary and secondary metrics: Identify the primary success metric (the one that determines the winner) and secondary metrics to monitor for unintended effects (e.g., testing CTA click rate as primary, but watching bounce rate, time on page, and downstream conversion as secondary guardrails).
Calculate test duration: Based on sample size requirements and daily traffic, estimate the number of days needed. Ensure the duration spans at least one full business cycle (7 days minimum) to account for day-of-week variation. Flag if duration exceeds 8 weeks (validity risk).
Create monitoring plan: Define interim checkpoints for technical QA (not statistical peeking), sample ratio mismatch (SRM) detection, and guardrail metric alerts that would trigger early test stoppage for data quality or user experience reasons.
Define stopping rules and decision criteria: Specify when to call the test (sample size reached + significance threshold met), when to stop early (guardrail violations, SRM detected, implementation bugs), and the protocol for inconclusive results (extend, redesign, or implement based on directional signal).
Assess traffic feasibility: Verify that the daily traffic can reach the required sample size within a reasonable timeframe (under 8 weeks). If traffic is insufficient, recommend reducing the number of variants, increasing the MDE, or using qualitative methods instead.
Document pre-registration: Record the test plan before launch -- hypothesis, metrics, sample size, duration, and decision criteria -- to prevent post-hoc rationalization and ensure scientific rigor.

Output

A structured A/B test plan containing:

Hypothesis statement in If/Then/Because format with supporting evidence or rationale
Control and variant descriptions with specific, implementable change details
Required sample size per variant and total sample size
Estimated test duration in days based on traffic volume and required sample size
Primary metric and secondary metric definitions with measurement methods
Guardrail metrics that trigger early stoppage if degraded
Monitoring dashboard specification with interim checkpoint schedule
Statistical analysis plan (frequentist or Bayesian, one-tailed or two-tailed, correction for multiple comparisons)
Stopping rules for early termination (guardrail violations, SRM detection, critical bugs)
Go/no-go decision criteria with clear thresholds for winner declaration
Post-test action plan for winning, losing, and inconclusive scenarios
Traffic feasibility assessment with low-traffic alternative recommendations if applicable
Test documentation template for recording results and learnings in the campaign tracker

Agents Used

cro-specialist -- Hypothesis design, variant specification, sample size calculation, statistical analysis planning, monitoring framework, stopping rules, traffic feasibility assessment, and experiment documentation

ab-test-plan

Safety Notice

Copy this and send it to your AI assistant to learn

/dm:ab-test-plan

Purpose

Input Required

Process

Output

Agents Used

Source Transparency

Related Skills

client-report

client-onboarding

pdf-report

translate-content