The essential guide to Sample Ratio Mismatch for your A/B tests

Published in Towards Data Science.

If you can’t trust the result of an experiment, you can’t trust the decisions you make from it. Data integrity issues are common, especially with redirect tests, single-page apps, or complex setups. The Sample Ratio Mismatch (SRM) check is a simple, essential validation anyone can do: it flags when the observed split of users (or visits) between control and variation doesn’t match the expected split (e.g. 50/50).

The article gives a practical overview:

- What SRM is: observed counts skewed away from the expected split (e.g. an expected even split comes back lopsided).
- Two rules: prioritise users over visits, since users are what get assigned to experiments and visit skews can be behavioural; and check frequently from launch, treating new tests like “intensive care” for at least the first week.
- Glaring checks: obvious imbalances you can spot without a formal test.
- A sample ratio formula: control % = control / total, and likewise for the variation.
- The Chi-squared goodness-of-fit test, run in Python (scipy.stats.chisquare with observed and expected counts) or in spreadsheets (CHITEST); a sketch follows below. A p-value below 0.01 (stricter than the usual 0.05) is recommended for declaring SRM, to reduce false alarms.
- Optional: a deeper look at the Chi-squared formula (observed vs expected counts, degrees of freedom, CHISQ.DIST.RT).
- Cumulative views of sample ratios over time, which help pinpoint when an SRM started (also sketched below).

Summary: run the Chi-squared check regularly, don’t cry wolf on day one unless the imbalance is glaring, and treat this as the start of data validation.
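To make the Chi-squared check concrete, here is a minimal Python sketch using scipy.stats.chisquare (a one-way goodness-of-fit test on observed vs expected counts). The user counts are hypothetical, purely for illustration:

```python
# Minimal SRM check with SciPy's chi-squared goodness-of-fit test.
# The counts below are hypothetical, not taken from the article.
from scipy.stats import chisquare

control_users = 50_123    # observed users in control (made-up)
variation_users = 48_771  # observed users in variation (made-up)
total = control_users + variation_users

# Sample ratio formula from the article: control % = control / total
print(f"Control share: {control_users / total:.2%}")

# Expected counts under the intended 50/50 split
observed = [control_users, variation_users]
expected = [total / 2, total / 2]
stat, p_value = chisquare(observed, f_exp=expected)

# The article recommends the stricter 0.01 threshold to reduce false alarms
if p_value < 0.01:
    print(f"Possible SRM (p = {p_value:.4f}): investigate before trusting results")
else:
    print(f"No SRM detected (p = {p_value:.4f})")
```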
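And a sketch of the cumulative view, assuming you have daily assignment counts per arm. The numbers are invented, with a deliberate dip in control from day 4 so the drift is visible:

```python
# Cumulative sample-ratio view: a sustained drift away from the expected
# 0.50 share suggests roughly when the mismatch began. Data is hypothetical.
import pandas as pd

daily = pd.DataFrame({
    "day": pd.date_range("2024-01-01", periods=7),
    "control":   [1010, 998, 1003, 880, 860, 855, 870],
    "variation": [1005, 1001, 997, 1002, 995, 1008, 990],
})

# Running totals per arm, then control's cumulative share of all users
cum = daily[["control", "variation"]].cumsum()
daily["cum_control_share"] = cum["control"] / (cum["control"] + cum["variation"])

print(daily[["day", "cum_control_share"]].to_string(index=False))
```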

Read the full article on Towards Data Science →

Iqbal Ali

Fractional AI Advisor and Experimentation Lead. Available for training, development, workshops, or as a fractional team member.