What is minimum detectable effect (MDE)?

MDE is the smallest improvement you want the test to reliably detect, expressed as a relative or absolute change to the baseline. A smaller MDE requires a larger sample, because spotting a tiny lift takes more data to separate from noise.

How is the sample size calculated?

The tool uses the standard two-proportion power formula: n = (z_alpha/2 + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2, where p1 is the baseline rate and p2 is the baseline plus the MDE. It returns the sample size per variant.

What significance and power should I use?

Conventionally significance (alpha) is 0.05 — a 5% false-positive rate — and power is 0.80, meaning an 80% chance of detecting a true effect of MDE size. Raising power or lowering alpha increases the required sample size.

Why does a smaller effect need so many more users?

Sample size scales with one over the effect squared. Halving the MDE roughly quadruples the required sample. That is why detecting small lifts on already-good conversion rates demands enormous traffic.

Can I stop the test early if it looks significant?

Peeking and stopping at the first significant result inflates false positives. Decide the sample size up front, run to it, and only then analyze — or use a sequential testing method designed for continuous monitoring.

A/B Test Plan Builder

Design experiments that actually answer the question

An A/B test only produces a trustworthy answer if it is sized and specified before it runs. The two most common mistakes are testing without enough traffic to detect the effect you care about, and peeking until you see a result you like. This builder fixes the first by computing a real per-variant sample size, and the second by making you commit to the plan up front.

How it works

You provide the experiment design (hypothesis, primary metric, control and variant) and four statistical inputs: baseline conversion rate, minimum detectable effect, significance level alpha, and power. The tool computes the sample size per variant with the two-proportion power formula:

p1 = baseline rate
p2 = p1 + MDE (absolute)
n  = (z_{alpha/2} + z_{beta})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2

z_{alpha/2} and z_{beta} are the normal critical values for your chosen alpha (two-sided) and power. With alpha 0.05 and power 0.80 these are about 1.96 and 0.84. The result is rounded up and reported per variant; total users needed is twice that. If you enter daily eligible traffic, the tool divides total sample by traffic to estimate test duration in days.

Tips and example

Suppose your checkout converts at 4% and you want to detect at least a 0.5 percentage-point absolute lift (to 4.5%) at alpha 0.05, power 0.80. The formula yields roughly 27,000 users per variant. If 2,000 users hit checkout per day across both arms, that is about 27 days of running.

Choose the MDE from business value, not convenience: the smallest lift that would justify shipping the change. Always declare a guardrail metric (revenue, refunds, latency) so a primary-metric win that quietly harms something else is caught. Commit to the sample size and run the full test before analyzing — early stopping on a lucky peek is how teams ship changes that do nothing.

A/B Test Plan Builder

Email me this result

Design experiments that actually answer the question

How it works

Tips and example