When to stop A/B experiments early

Published in The Startup on Medium.

This isn't about concluding an experiment early; it's about stopping one when we suspect something is wrong with the build, the data collection, or the design. Acting early matters because restarting usually means losing the data collected so far.

I describe a simple early-warning process I used at Trainline so product owners, designers, and developers could monitor their own experiments—no analytics or statistics skills required. The goal is to identify issues, not to read results early.

Three checks sit at the centre:

  1. Even split across variation groups — Are counted users split ~50/50 between A and B? A simple formula (e.g. % in group A) plus rules of thumb (e.g. <1% variance after a few days) help spot counting or assignment problems.
  2. Expected traffic in the experiment — Are we missing users who should be counted? We compare “users expected” (e.g. from analytics) vs “users counted” so we spot data-collection or targeting issues.
  3. Health metrics — Are critical metrics (e.g. conversion) showing unnaturally large moves? We use z-scores (e.g. ±4 or more) as signals to investigate, not statistical significance. High significance on young data is common; extreme z-scores are not.
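
The article keeps these as rules of thumb rather than code, but they are simple enough to sketch. Below is a minimal Python version, assuming a two-variant experiment with conversion as the health metric. The 1% split tolerance and the ±4 z-score alert come from the checks above; the 5% missing-traffic tolerance and all function names are illustrative assumptions, not figures from the article.

```python
import math

def split_share(count_a: int, count_b: int) -> float:
    """Check 1: share of counted users assigned to variation A (should sit near 0.5)."""
    return count_a / (count_a + count_b)

def missing_share(expected_users: int, counted_users: int) -> float:
    """Check 2: fraction of expected users that never made it into the experiment."""
    return 1 - counted_users / expected_users

def health_z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Check 3: two-proportion z-score for a health metric such as conversion."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

SPLIT_TOLERANCE = 0.01    # from the "<1% variance after a few days" rule of thumb
MISSING_TOLERANCE = 0.05  # assumed value for illustration, not from the article
Z_ALERT = 4.0             # treat |z| >= 4 as "investigate", not as statistical significance

def early_warnings(count_a, count_b, expected_users, conv_a, conv_b):
    """Return a list of early-warning messages; an empty list means no red flags."""
    warnings = []
    if abs(split_share(count_a, count_b) - 0.5) > SPLIT_TOLERANCE:
        warnings.append("Uneven split: check assignment or user counting.")
    if missing_share(expected_users, count_a + count_b) > MISSING_TOLERANCE:
        warnings.append("Missing traffic: check data collection or targeting.")
    if abs(health_z_score(conv_a, count_a, conv_b, count_b)) >= Z_ALERT:
        warnings.append("Extreme health-metric move: investigate build, tracking, or design.")
    return warnings

# Example run with made-up numbers: all three checks fire.
print(early_warnings(count_a=5200, count_b=4800, expected_users=11000,
                     conv_a=520, conv_b=310))
```

As in the article, these flags only say "investigate": an empty list doesn't mean the variant is winning, just that nothing looks broken yet.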

The piece includes simple visuals for traffic split and counted vs missed users, and notes when to use A/A tests to set baselines and when to involve analysts.

Read the full article on The Startup →

Iqbal Ali

Fractional AI Advisor and Experimentation Lead, offering training, development, workshops, and fractional team membership.