Design experiments that actually answer the question
An A/B test only produces a trustworthy answer if it is sized and specified before it runs. The two most common mistakes are testing without enough traffic to detect the effect you care about, and peeking until you see a result you like. This builder fixes the first by computing a real per-variant sample size, and the second by making you commit to the plan up front.
How it works
You provide the experiment design (hypothesis, primary metric, control and variant) and four statistical inputs: baseline conversion rate, minimum detectable effect, significance level alpha, and power. The tool computes the sample size per variant with the two-proportion power formula:
p1 = baseline rate
p2 = p1 + MDE (absolute)
n = (z_{alpha/2} + z_{beta})^2 * (p1(1-p1) + p2(1-p2)) / (p2 - p1)^2
z_{alpha/2} and z_{beta} are standard normal quantiles: z_{alpha/2} is the two-sided critical value for your chosen alpha, and z_{beta} is the quantile at your chosen power (1 - beta). With alpha 0.05 and power 0.80 these are about 1.96 and 0.84. The result is rounded up and reported per variant; for a test with one control and one variant, total users needed is twice that. If you enter daily eligible traffic, the tool divides the total sample by daily traffic to estimate test duration in days.
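The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names `sample_size_per_variant` and `duration_days` are hypothetical, the inputs in the usage lines (10% baseline, 2-point MDE, 1,000 users per day) are made up for the example, and rounding the duration up to whole days is an assumption.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.80):
    """Users needed in each arm to detect an absolute lift of `mde`.

    Hypothetical helper that applies the two-proportion power formula.
    """
    p1 = baseline
    p2 = baseline + mde  # MDE is absolute, not relative
    if not (0 < p1 < 1 and 0 < p2 < 1) or mde == 0:
        raise ValueError("rates must lie in (0, 1) and mde must be non-zero")

    # Standard normal quantiles for a two-sided alpha and the chosen power.
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return ceil(n)  # round up: a fractional user is not enough


def duration_days(n_per_variant, daily_traffic, variants=2):
    """Days to collect the total sample (rounding up is an assumption)."""
    return ceil(n_per_variant * variants / daily_traffic)


n = sample_size_per_variant(baseline=0.10, mde=0.02)
print(n, duration_days(n, daily_traffic=1000))  # 3839 8
```

Because the MDE sits squared in the denominator, halving it roughly quadruples the required sample, which is why the MDE is usually the input that decides whether a test is feasible.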
Tips and example
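Suppose your checkout converts at 4% and you want to detect at least a 0.5 percentage-point absolute lift (to 4.5%) at alpha 0.05, power 0.80. Then p1(1-p1) + p2(1-p2) = 0.0384 + 0.042975 = 0.081375, (p2 - p1)^2 = 0.000025, and (z_{alpha/2} + z_{beta})^2 is about 7.85, so the formula yields roughly 25,500 users per variant, or about 51,000 in total. If 2,000 users hit checkout per day across both arms, that is about 26 days of running.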
Choose the MDE from business value, not convenience: the smallest lift that would justify shipping the change. Always declare a guardrail metric (revenue, refunds, latency) so a primary-metric win that quietly harms something else is caught. Commit to the sample size and run the full test before analyzing: early stopping on a lucky peek is how teams ship changes that do nothing.
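To see why peeking is risky, a small A/A simulation helps: both arms share the same true conversion rate, so every "significant" result is a false positive. The sketch below is illustrative only and not part of the tool. The function names, the 20% conversion rate, the ten interim looks, and the pooled two-proportion z-test are all assumptions chosen for the demonstration.

```python
import random
from math import sqrt
from statistics import NormalDist


def z_significant(conv_a, conv_b, n, z_crit):
    """Two-sided pooled two-proportion z-test with n users in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:  # no conversions (or all conversions) so far
        return False
    return abs(conv_a - conv_b) / n / se > z_crit


def simulate_aa(n_per_arm=1000, looks=10, rate=0.2, sims=500,
                alpha=0.05, seed=0):
    """Return (false-positive rate at the final look only,
    false-positive rate when stopping at the first significant peek)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    step = n_per_arm // looks
    fixed_hits = peek_hits = 0
    for _ in range(sims):
        conv_a = conv_b = 0
        peeked = False
        for i in range(1, n_per_arm + 1):
            # Both arms draw from the same true rate: there is no real effect.
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if i % step == 0 and i < n_per_arm:
                peeked = peeked or z_significant(conv_a, conv_b, i, z_crit)
        final = z_significant(conv_a, conv_b, n_per_arm, z_crit)
        fixed_hits += final
        peek_hits += peeked or final
    return fixed_hits / sims, peek_hits / sims


fixed_rate, peek_rate = simulate_aa()
print(f"final look only: {fixed_rate:.3f}  any peek: {peek_rate:.3f}")
```

The final-look rate should land near alpha, while the any-peek rate can only be equal or higher, because each extra look is another chance for noise to cross the threshold.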