# A/B Testing
An experimentation method that compares two or more versions of a page, feature, or flow to determine which performs better, based on measured outcomes.
## Definition
A/B testing (split testing) compares two or more variants of a page, feature, or experience to determine which performs better. Users are randomly assigned to variants, and statistical analysis determines if differences in outcomes are significant.
## How A/B Testing Works
1. Hypothesis - Define what you’re testing and the expected outcome
2. Variants - Create the control (A) and treatment (B) versions
3. Randomization - Assign users randomly to each variant
4. Measurement - Track conversion events for each group
5. Analysis - Determine whether any difference is statistically significant
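The steps above can be sketched end to end in a few lines of Python. This is a minimal simulation, not a production harness: the user counts and conversion rates are invented, and the analysis step uses a standard two-proportion z-test.

```python
import random
import math

def run_ab_test(n_users, p_control, p_treatment, seed=42):
    """Simulate the pipeline: randomly assign each user to a variant,
    then record whether they converted (measurement)."""
    rng = random.Random(seed)
    conversions = {"A": 0, "B": 0}
    totals = {"A": 0, "B": 0}
    for _ in range(n_users):
        variant = rng.choice(["A", "B"])          # randomization
        totals[variant] += 1
        rate = p_control if variant == "A" else p_treatment
        if rng.random() < rate:                   # measurement
            conversions[variant] += 1
    return conversions, totals

def two_proportion_z(conv, total):
    """Analysis: two-proportion z-test on the difference in
    conversion rates between the control and treatment groups."""
    pa = conv["A"] / total["A"]
    pb = conv["B"] / total["B"]
    pooled = (conv["A"] + conv["B"]) / (total["A"] + total["B"])
    se = math.sqrt(pooled * (1 - pooled) * (1 / total["A"] + 1 / total["B"]))
    z = (pb - pa) / se
    # Two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

conv, total = run_ab_test(20000, p_control=0.10, p_treatment=0.12)
z, p = two_proportion_z(conv, total)
print(f"A: {conv['A']}/{total['A']}  B: {conv['B']}/{total['B']}")
print(f"z = {z:.2f}, p = {p:.4f}")  # compare p against your chosen threshold
```

In a real product the assignment would come from an experimentation platform and conversions from analytics events; only the analysis step would look like this.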
## Key Metrics for A/B Tests
| Metric | What It Measures |
|---|---|
| Conversion rate | % of users who complete desired action |
| Statistical significance | Confidence that results aren’t random |
| Sample size | Users needed for reliable results |
| Effect size | Magnitude of difference between variants |
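To make the first and last rows of the table concrete, here is how conversion rate and effect size fall out of raw event counts (a small sketch; the counts are invented for illustration):

```python
def ab_metrics(conv_a, n_a, conv_b, n_b):
    """Derive conversion rates and effect size from raw counts of
    conversions (conv_*) and users exposed (n_*) per variant."""
    rate_a = conv_a / n_a              # conversion rate, control
    rate_b = conv_b / n_b              # conversion rate, treatment
    return {
        "rate_a": rate_a,
        "rate_b": rate_b,
        "absolute_lift": rate_b - rate_a,             # effect size, percentage points
        "relative_lift": (rate_b - rate_a) / rate_a,  # effect size, % of baseline
    }

# 120/2400 = 5.0% vs 156/2400 = 6.5%: a 1.5-point absolute, 30% relative lift
m = ab_metrics(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
```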
## Common A/B Test Types

### Page Tests

Different layouts, headlines, or designs for landing pages.

### Feature Tests

A new feature vs. no feature, or different implementations of the same feature.

### Pricing Tests

Different price points or packaging options.

### Copy Tests

Different messaging, CTAs, or value propositions.
## Tools for A/B Testing
Platforms that support A/B testing:
- PostHog - Experimentation with product analytics
- Amplitude - Experiment analysis and targeting
- Optimizely - Enterprise experimentation platform
## Statistical Considerations
- Sample size - Use calculators to determine required traffic
- Runtime - Run tests for full business cycles (at least 1-2 weeks)
- Multiple comparisons - Testing many variants increases false positive risk
- Segmentation - Results may differ across user segments
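The multiple-comparisons point is easy to quantify. With several treatment variants, the chance of at least one spurious "win" grows quickly; a simple (if conservative) fix is the Bonferroni correction. A back-of-the-envelope sketch:

```python
alpha = 0.05   # per-comparison significance threshold
k = 4          # four treatment variants, each compared against control

# Probability of at least one false positive across the k comparisons,
# assuming each test independently has a 5% false-positive rate
family_wise_error = 1 - (1 - alpha) ** k   # ~0.185, far above the intended 5%

# Bonferroni correction: require each comparison to clear alpha / k instead
corrected_alpha = alpha / k                # 0.0125
```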
## Frequently Asked Questions
### How long should I run an A/B test?
Run until you reach statistical significance AND have completed at least one full business cycle (typically 1-2 weeks). Stopping the test early because initial results look good ("peeking") inflates the false-positive rate and leads to false conclusions.
### What sample size do I need?
It depends on your baseline conversion rate and minimum detectable effect. Use a sample size calculator. Typical tests need 1,000-10,000 users per variant.
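The two-proportion power calculation behind those calculators can be approximated in a few lines. This sketch assumes a two-sided alpha of 0.05 and 80% power (hence the hard-coded z-values); `sample_size_per_variant` is an illustrative helper, not a library function.

```python
import math

def sample_size_per_variant(baseline, mde, z_alpha=1.96, z_beta=0.84):
    """Approximate users needed per variant for a two-proportion test.
    baseline: control conversion rate (e.g. 0.10)
    mde: minimum detectable effect, absolute (e.g. 0.02 for a 2-point lift)
    z_alpha=1.96 -> two-sided 95% confidence; z_beta=0.84 -> 80% power."""
    p1, p2 = baseline, baseline + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)

# Detecting a 2-point lift on a 10% baseline needs roughly 3,800 users per variant
n = sample_size_per_variant(baseline=0.10, mde=0.02)
```

Note how the required sample size shrinks rapidly as the minimum detectable effect grows: halving the effect you want to detect roughly quadruples the traffic you need.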
### What’s the difference between A/B testing and feature flags?
Feature flags control which users see which experience; A/B testing measures the impact of each variant. They often work together: feature flags deliver the variants, and analytics measure the results.
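As a sketch of how the two fit together: flag platforms typically assign variants by hashing a stable user ID plus the experiment name, so assignment is deterministic without storing any state. This is a simplified illustration; real SDKs also handle rollout percentages, targeting, and overrides.

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministic bucketing: hash (experiment, user) so the same user
    always lands in the same variant for a given experiment."""
    key = f"{experiment}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]

# The flag side decides what each user sees; the analytics side then logs
# (user_id, variant, converted) events for the significance analysis.
variant = assign_variant("user-42", "new-checkout-flow")
```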