A/B Test Sample Size Calculator
- Baseline Conversion Rate (%): Current conversion rate of your control group (e.g., 10 for 10%).
- Minimum Detectable Effect (MDE) (%): Smallest *relative* lift you want to detect (e.g., 10 for a 10% relative increase). If the baseline is 10%, a 10% MDE means detecting a change to 11%.
- Significance Level (Alpha): Probability of a Type I error (false positive). A common value is 0.05.
- Statistical Power (1 – Beta): Probability of detecting an effect if one truly exists. A common value is 0.80.
- Number of Variants: Total number of groups in your experiment (e.g., 2 for A/B, 3 for A/B/C).
Understanding A/B Test Sample Size with the Statsig Calculator
When running A/B tests or any other form of experimentation, determining the correct sample size is crucial for obtaining reliable, statistically significant results. A sample size that is too small can lead to inconclusive results or a missed real effect (a Type II error), while a sample size that is too large wastes traffic and time. This A/B Test Sample Size Calculator helps you estimate the number of users or observations needed for your experiment.
What is Statistical Significance?
Statistical significance helps you determine if the observed difference between your A/B test variants is likely due to a real effect or just random chance. A common threshold is a 95% confidence level (or a significance level of 0.05), meaning there's only a 5% chance that you would see such a difference if there were no actual difference between the variants.
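To make this concrete, the sketch below runs a standard two-sided, two-proportion z-test on hypothetical conversion counts. The helper name two_proportion_p_value and the example counts are illustrative, not Statsig's internal methodology:

```python
# Minimal two-proportion z-test sketch (hypothetical counts, not Statsig's internals).
from scipy.stats import norm

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)           # pooled rate under the null
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    return 2 * (1 - norm.cdf(abs(z)))                  # two-sided tail probability

# 5.0% vs. 5.6% conversion on 10,000 users each:
print(two_proportion_p_value(500, 10_000, 560, 10_000))  # ~0.058, not significant at 0.05
```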
Key Inputs Explained:
- Baseline Conversion Rate (%): This is the current performance of your control group (Variant A). For example, if 10% of users currently complete a purchase, your baseline conversion rate is 10. This is a critical input as it forms the basis for detecting changes.
- Minimum Detectable Effect (MDE) (%): The MDE is the smallest *relative* improvement (or degradation) you want to be able to detect with your experiment. For instance, if your baseline conversion rate is 10% and you set an MDE of 10%, you want to be able to detect a change from 10% to 11% (a 10% relative lift). A smaller MDE requires a larger sample size.
- Significance Level (Alpha): Also known as alpha (α), this is the probability of making a Type I error (false positive). It's the risk of incorrectly concluding that there is a difference between your variants when, in reality, there isn't. Common values are 0.05 (95% confidence) or 0.01 (99% confidence). A lower significance level requires a larger sample size.
- Statistical Power (1 – Beta): This is the probability of correctly detecting an effect if one truly exists (avoiding a Type II error, or false negative). Common values are 0.80 (80% power) or 0.90 (90% power). Higher power means you're more likely to detect a real effect, but it also requires a larger sample size.
- Number of Variants: This is the total number of groups in your experiment, including your control group. For a standard A/B test this is 2; for an A/B/C test, 3. The calculator reports a sample size per variant and multiplies by the number of variants to give the total, following the kind of formula sketched after this list.
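Under the hood, calculators like this typically use the standard sample size formula for comparing two proportions. Here is a minimal sketch of one common two-sided version; the function name sample_size_per_variant is illustrative, and this calculator's exact implementation may differ:

```python
# Standard two-proportion sample size sketch (one common formulation;
# not necessarily this calculator's exact implementation).
import math
from scipy.stats import norm

def sample_size_per_variant(baseline_pct: float, mde_pct: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Observations needed in EACH variant to detect a relative lift."""
    p1 = baseline_pct / 100                   # control conversion rate
    p2 = p1 * (1 + mde_pct / 100)             # treatment rate implied by the relative MDE
    z_alpha = norm.ppf(1 - alpha / 2)         # critical value for a two-sided test
    z_power = norm.ppf(power)                 # critical value for the desired power
    p_bar = (p1 + p2) / 2                     # average of the two rates
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_power * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)
```

Note how a smaller MDE shrinks the denominator (p2 - p1) ** 2, which is why detecting tighter effects demands much larger samples.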
How to Interpret the Results:
The calculator provides two key outputs:
- Required Sample Size Per Variant: This is the minimum number of observations (e.g., users, sessions, clicks) you need in *each* of your experiment groups (control and all treatment variants) to detect your specified MDE with your chosen significance and power levels.
- Total Required Sample Size: This is the sum of the sample sizes for all your variants. This is the total number of observations you need across your entire experiment.
Example Scenario:
Let's say you're running an A/B test on a new checkout flow. Your current checkout conversion rate (baseline) is 5%. You want to be able to detect a 15% relative increase in conversion (MDE). You're comfortable with a standard 95% confidence (0.05 significance) and 80% power (0.80). You are testing one new variant against your control, so you have 2 variants.
- Baseline Conversion Rate: 5%
- Minimum Detectable Effect: 15%
- Significance Level: 0.05
- Statistical Power: 0.80
- Number of Variants: 2
Plugging these values into the calculator, you would find that you need approximately 14,200 users per variant, for a total of roughly 28,400 users across your experiment (the exact figure depends on the formula variant used). This means you should run your test until at least 14,200 users have been exposed to the control group and 14,200 users to the new checkout flow before drawing conclusions.
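Running the earlier sketch with these inputs reproduces that ballpark, assuming the same two-sided formula:

```python
# Scenario check, reusing sample_size_per_variant from the sketch above.
n = sample_size_per_variant(baseline_pct=5, mde_pct=15, alpha=0.05, power=0.80)
print(n, n * 2)  # ~14,193 per variant, ~28,386 total for 2 variants
```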
Why is this important for Statsig users?
Platforms like Statsig provide powerful tools for running A/B tests and analyzing results. However, even with advanced analytics, the foundation of a robust experiment lies in its design. Using a sample size calculator *before* launching your experiment ensures that you allocate enough traffic and time to your tests, preventing premature conclusions or inconclusive results. It helps you set realistic expectations for how long your experiment needs to run to achieve meaningful insights.