/dm:ab-test-plan
Purpose
Dedicated A/B test planning with a structured hypothesis framework, statistical sample size calculation, variant design, and monitoring plan. Produces a complete experiment specification with statistical rigor and clear decision criteria.
Input Required
The user must provide (or will be prompted for):
- Element to test: The specific page, component, or experience being tested (landing page headline, CTA button, pricing page layout, email subject line, checkout flow, form design, etc.)
- Current conversion rate: Baseline conversion rate for the metric being tested (or best estimate)
- Desired minimum detectable effect (MDE): The smallest improvement worth detecting (e.g., 10% relative lift)
- Daily traffic or impressions: Average daily visitors or impressions to the test page or element
- Significance level: Desired confidence level, default 95% (alpha = 0.05)
- Statistical power: Desired power, default 80% (beta = 0.20)
- Number of variants: How many variants to test (default 1 treatment + 1 control; more for multivariate)
- Business context: What prompted the test idea (analytics data, user feedback, competitive analysis, heuristic audit, stakeholder request)
Process
- Load brand context: Read
~/.claude-marketing/brands/_active-brand.jsonfor the active slug, then load~/.claude-marketing/brands/{slug}/profile.json. Apply voice, compliance, industry context. Checkguidelines/_manifest.jsonfor restrictions, messaging, channel styles, voice-and-tone rules, and templates. If a template matching this command exists in~/.claude-marketing/brands/{slug}/templates/, apply its format. If no brand exists, prompt for/dm:brand-setupor proceed with defaults. - Check campaign history: Run
python campaign-tracker.py --brand {slug} --action list-campaignsto review past test results and avoid re-testing already-validated hypotheses. - Run sample size calculator: Execute
scripts/sample-size-calculator.pywith baseline conversion rate, minimum detectable effect, significance level, and power to determine required sample size per variant. - Build hypothesis statement: Structure the hypothesis in the format: "If [specific change], then [primary metric] will [direction and magnitude] because [rationale grounded in data, user research, or established UX principle]."
- Design test variants: Define the control (current experience) and one or more treatment variants. Specify exactly what changes in each variant -- copy, layout, color, imagery, flow, or functionality. For multivariate tests, define the variable matrix and interaction effects to watch.
- Define primary and secondary metrics: Identify the primary success metric (the one that determines the winner) and secondary metrics to monitor for unintended effects (e.g., testing CTA click rate as primary, but watching bounce rate, time on page, and downstream conversion as secondary guardrails).
- Calculate test duration: Based on sample size requirements and daily traffic, estimate the number of days needed. Ensure the duration spans at least one full business cycle (7 days minimum) to account for day-of-week variation. Flag if duration exceeds 8 weeks (validity risk).
- Create monitoring plan: Define interim checkpoints for technical QA (not statistical peeking), sample ratio mismatch (SRM) detection, and guardrail metric alerts that would trigger early test stoppage for data quality or user experience reasons.
- Define stopping rules and decision criteria: Specify when to call the test (sample size reached + significance threshold met), when to stop early (guardrail violations, SRM detected, implementation bugs), and the protocol for inconclusive results (extend, redesign, or implement based on directional signal).
- Assess traffic feasibility: Verify that the daily traffic can reach the required sample size within a reasonable timeframe (under 8 weeks). If traffic is insufficient, recommend reducing the number of variants, increasing the MDE, or using qualitative methods instead.
- Document pre-registration: Record the test plan before launch -- hypothesis, metrics, sample size, duration, and decision criteria -- to prevent post-hoc rationalization and ensure scientific rigor.
Output
A structured A/B test plan containing:
- Hypothesis statement in If/Then/Because format with supporting evidence or rationale
- Control and variant descriptions with specific, implementable change details
- Required sample size per variant and total sample size
- Estimated test duration in days based on traffic volume and required sample size
- Primary metric and secondary metric definitions with measurement methods
- Guardrail metrics that trigger early stoppage if degraded
- Monitoring dashboard specification with interim checkpoint schedule
- Statistical analysis plan (frequentist or Bayesian, one-tailed or two-tailed, correction for multiple comparisons)
- Stopping rules for early termination (guardrail violations, SRM detection, critical bugs)
- Go/no-go decision criteria with clear thresholds for winner declaration
- Post-test action plan for winning, losing, and inconclusive scenarios
- Traffic feasibility assessment with low-traffic alternative recommendations if applicable
- Test documentation template for recording results and learnings in the campaign tracker
Agents Used
- cro-specialist -- Hypothesis design, variant specification, sample size calculation, statistical analysis planning, monitoring framework, stopping rules, traffic feasibility assessment, and experiment documentation