A/B Testing

An experimentation method comparing two versions of a page, feature, or flow to determine which performs better based on measured outcomes.

Definition

A/B testing (split testing) compares two or more variants of a page, feature, or experience to determine which performs better. Users are randomly assigned to variants, and statistical analysis determines whether differences in outcomes are significant.

How A/B Testing Works

  1. Hypothesis - Define what you’re testing and the expected outcome
  2. Variants - Create control (A) and treatment (B) versions
  3. Randomization - Assign users randomly to each variant (see the bucketing sketch below)
  4. Measurement - Track conversion events for each group
  5. Analysis - Determine whether the results are statistically significant
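
As a concrete illustration of step 3, many experimentation systems bucket users deterministically by hashing the user ID together with the experiment name, so a returning user always sees the same variant. A minimal Python sketch; the function and experiment names are illustrative, not from any particular tool:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministic bucketing: hashing user + experiment gives a stable,
    roughly uniform assignment without storing per-user state."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# The same user always lands in the same bucket for a given experiment.
print(assign_variant("user-42", "landing-page-headline"))
```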

Key Metrics for A/B Tests

  • Conversion rate - % of users who complete the desired action
  • Statistical significance - Confidence that the observed difference isn’t due to chance (see the sketch after this list)
  • Sample size - Number of users needed for reliable results
  • Effect size - Magnitude of the difference between variants
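
Statistical significance for two conversion rates is commonly assessed with a two-proportion z-test, which also surfaces the effect size directly. A minimal sketch using only the Python standard library; the visitor and conversion counts are made up for illustration:

```python
from statistics import NormalDist

def two_proportion_z_test(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return p_b - p_a, z, p_value

# Example: control converts 120/2400 (5.0%), treatment 156/2400 (6.5%).
effect, z, p = two_proportion_z_test(120, 2400, 156, 2400)
print(f"effect size = {effect:.3f}, z = {z:.2f}, p = {p:.4f}")
```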

Common A/B Test Types

Page Tests

Different layouts, headlines, or designs for landing pages.

Feature Tests

New feature vs no feature, or different feature implementations.

Pricing Tests

Different price points or packaging options.

Copy Tests

Different messaging, CTAs, or value propositions.

Tools for A/B Testing

Platforms that support A/B testing:

  • PostHog - Experimentation with product analytics
  • Amplitude - Experiment analysis and targeting
  • Optimizely - Enterprise experimentation platform

Statistical Considerations

  • Sample size - Use calculators to determine required traffic
  • Runtime - Run tests for full business cycles (at least 1-2 weeks)
  • Multiple comparisons - Testing many variants increases the false positive risk (see the sketch below)
  • Segmentation - Results may differ across user segments
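
To see why multiple comparisons matter: with k independent variants each tested at α = 0.05, the chance of at least one false positive grows quickly. A Bonferroni correction is one simple, if conservative, remedy. A small illustrative sketch:

```python
alpha = 0.05

for k in (1, 3, 5, 10):
    # Probability of at least one false positive across k independent tests
    fwer = 1 - (1 - alpha) ** k
    # Bonferroni: test each comparison at alpha / k to cap the family-wise rate
    print(f"{k} tests: family-wise error = {fwer:.2f}, corrected alpha = {alpha / k:.4f}")
```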

Frequently Asked Questions

How long should I run an A/B test?

Run until you reach statistical significance AND complete at least one full business cycle (typically 1-2 weeks). Stopping early because interim results look significant inflates the false positive rate (the “peeking” problem).

What sample size do I need?

It depends on your baseline conversion rate and minimum detectable effect. Use a sample size calculator. Typical tests need 1,000-10,000 users per variant.
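
As a rough illustration of where those numbers come from, the standard normal-approximation formula for a two-proportion test can be computed directly; dedicated calculators may use slightly different formulas. A sketch, with example baseline and effect values:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Users per variant to detect an absolute lift of `mde` over `baseline`
    with a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # e.g. 0.84 for 80% power
    p2 = baseline + mde
    variance = baseline * (1 - baseline) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)

# Detecting a lift from 5% to 6% needs roughly 8,000 users per variant.
print(sample_size_per_variant(baseline=0.05, mde=0.01))
```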

What’s the difference between A/B testing and feature flags?

Feature flags control who sees what; A/B testing measures the impact of each variant. They often work together - the flag implements the split, and analytics measure the results (see the sketch below).
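
A hypothetical sketch of that division of labor. Here `track` stands in for whatever analytics SDK you use, and the experiment and flow names are made up:

```python
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    # Feature-flag side: deterministic bucketing decides who sees what
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "AB"[int(digest, 16) % 2]

def track(user_id: str, event: str, properties: dict) -> None:
    print(user_id, event, properties)  # stand-in for an analytics SDK call

def render_checkout(user_id: str) -> str:
    variant = assign_variant(user_id, "one-click-checkout")
    # Analytics side: record the exposure so impact can be measured later
    track(user_id, "experiment_viewed", {"experiment": "one-click-checkout",
                                         "variant": variant})
    return "one-click flow" if variant == "B" else "classic flow"

print(render_checkout("user-42"))
```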
