A/B Testing

(highly recommended - buy) Trustworthy Online Controlled Experiments by Ron Kohavi, Diane Tang, Ya Xu
(really good) A comprehensive A/B testing course - dynamic fields
CUPED - Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-Experiment Data, by Ron Kohavi, Alex Deng, Ya Xu, Toby Walker - "CUPED is an acronym for Controlled experiments Using Pre-Experiment Data [1]. It is a method that aims to estimate treatment effects in A/B tests more accurately than simple differences in means. As reviewed in the previous section, we traditionally use the observed difference between the sample means"
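A short note on what is being contrasted here, written in commonly used notation. The symbols (Y for the experiment metric, X for the same metric measured before the experiment, the subscripts t and c for treatment and control, ρ for the correlation between Y and X) are assumptions of this note and are not quoted from the paper.

```latex
% Difference in means (the traditional estimator)
\hat{\Delta} = \bar{Y}_t - \bar{Y}_c

% CUPED-adjusted metric, with X measured before the experiment starts
Y_i^{\mathrm{cv}} = Y_i - \theta \,(X_i - \bar{X}), \qquad
\theta = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}

% CUPED estimator, and the variance of the adjusted metric
\hat{\Delta}_{\mathrm{cv}} = \bar{Y}_t^{\mathrm{cv}} - \bar{Y}_c^{\mathrm{cv}}, \qquad
\operatorname{Var}(Y^{\mathrm{cv}}) = (1 - \rho^2)\,\operatorname{Var}(Y)
```

Because X is collected before assignment, it has the same expectation in both arms, so the adjustment leaves the expected effect unchanged and only shrinks the variance.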
(unread) A/B testing
Bayesian A/B testing
(youtube) Lukas Vermeer: Democratizing Online Controlled Experiments at Booking.com - ELITE CAMP 2018
(Paper) Top Challenges from the first Practical Online Controlled Experiments Summit - "Online controlled experiments (OCEs), also known as A/B tests, have become ubiquitous in evaluating the impact of changes made to software products and services. While the concept of online controlled experiments is simple, there are many practical challenges in running OCEs at scale. To understand the top practical challenges in running OCEs at scale and encourage further academic and industrial exploration, representatives with experience in large-scale experimentation from thirteen different organizations (Airbnb, Amazon, Booking.com, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, Yandex, and Stanford University) were invited to the first Practical Online Controlled Experiments Summit. All thirteen organizations sent representatives. Together these organizations have tested more than one hundred thousand experiment treatments last year. Thirty-four experts from these organizations participated in the summit in Sunnyvale, CA, USA on December 13-14, 2018. While there are papers from individual organizations on some of the challenges and pitfalls in running OCEs at scale, this is the first paper to provide the top challenges faced across the industry for running OCEs at scale and some common solutions"
(book) Modern Epidemiology - covers A/B testing
Challenges, Best Practices and Pitfalls in Evaluating Results of Online Controlled Experiments - A/B Testing is the gold standard to estimate the causal relationship between a change in a product and its impact on key outcome measures. It is widely used in the industry to test changes ranging from simple copy change or UI change to more complex changes like using machine learning models to personalize user experience. The key aspect of A/B testing is evaluation of experiment results. Designing the right set of metrics - correct outcome measures, data quality indicators, guardrails that prevent harm to business, and a comprehensive set of supporting metrics to understand the “why” behind the key movements is the #1 challenge practitioners face when trying to scale their experimentation program. On the technical side, improving sensitivity of experiment metrics is a hard problem and an active research area, with large practical implications as more and more small and medium size businesses are trying to adopt A/B testing and suffer from insufficient power. In this tutorial we will discuss challenges, best practices, and pitfalls in evaluating experiment results, focusing on both lessons learned and practical guidelines as well as open research questions.
Slides:
- Part I. Introduction
- Part II. Best Practices
Increasing experimentation accuracy and speed by using control variates - In this article, we share details about our team’s journey to bring the statistical method known as CUPED to Etsy, and how it is now helping other teams make more informed product decisions, as well as shorten the duration of their experiments by up to 20%. We offer some perspectives on what makes such a method possible, what it took us to implement it at scale, and what lessons we have learned along the way.
(Microsoft) Why Tenant-Randomized A/B Test is Challenging and Tenant-Pairing May Not Work
(good) How to Double A/B Testing Speed with CUPED - Microsoft’s variance reduction that’s becoming industry standard - "Controlled-experiment Using Pre-Experiment Data (CUPED) is a variance reduction technique created by Microsoft in 2013. Since then, it has been implemented at Netflix, Booking.com, BBC, and many others.
In short, CUPED uses pre-experiment data to control for natural variation in an experiment’s north star metric. By removing natural variation, we can run statistical tests that require a smaller sample size. CUPED can be added to virtually any A/B testing framework; it’s computationally efficient and fairly straightforward to code."
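To make the "fairly straightforward to code" claim concrete, here is a minimal sketch on simulated data, using only the Python standard library. Everything in it is an assumption for illustration: the helper names `cuped_theta` and `cuped_adjust`, the simulated metric, the sample sizes, and the 0.2 lift. It is not the implementation used by any of the companies listed on this page.

```python
import random
from statistics import mean, variance


def cuped_theta(y, x):
    """Estimate theta = Cov(y, x) / Var(x) from paired observations."""
    x_bar, y_bar = mean(x), mean(y)
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (len(x) - 1)
    return cov_xy / variance(x)


def cuped_adjust(y, x, theta, x_bar):
    """Return the adjusted metric y_i - theta * (x_i - x_bar) for each user."""
    return [yi - theta * (xi - x_bar) for xi, yi in zip(x, y)]


# Hypothetical simulation: x is the pre-experiment metric, y the in-experiment one.
rng = random.Random(0)


def simulate_arm(n, lift):
    x = [rng.gauss(10, 3) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) + lift for xi in x]
    return x, y


x_c, y_c = simulate_arm(2000, 0.0)  # control
x_t, y_t = simulate_arm(2000, 0.2)  # treatment with an assumed true lift of 0.2

# theta and the mean of x are computed on the pooled data, so both arms
# receive exactly the same adjustment.
theta = cuped_theta(y_c + y_t, x_c + x_t)
x_bar = mean(x_c + x_t)

adj_c = cuped_adjust(y_c, x_c, theta, x_bar)
adj_t = cuped_adjust(y_t, x_t, theta, x_bar)

raw_effect = mean(y_t) - mean(y_c)
cuped_effect = mean(adj_t) - mean(adj_c)
variance_ratio = variance(adj_c) / variance(y_c)

print(f"theta={theta:.3f}")
print(f"raw effect={raw_effect:.3f}, CUPED effect={cuped_effect:.3f}")
print(f"variance of adjusted metric / variance of raw metric = {variance_ratio:.3f}")
```

In this simulation the pre-experiment metric explains most of the variation in the outcome, so the adjusted metric has a much smaller variance than the raw one. With real data the gain depends on how strongly the pre-experiment covariate correlates with the experiment metric.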
(Netflix) Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix
(Booking) How Booking.com increases the power of online experiments with CUPED, comment
(BBC) Increasing experimental power with variance reduction at the BBC - This article discusses how the Experimentation team have been accounting for pre-experiment variance in order to increase the statistical power of their experiments
(TripAdvisor) (good) Reducing A/B test measurement variance by 30%+
(medium) practitioner guide to
Hubspot
1. How to Do A/B Testing: 15 Steps for the Perfect Split Test
2. Sample size calculation
statistical tests
Vidhya on
1. A/B Testing Measurement Frameworks - Every Data Scientist Should Know
2. A/B Testing for Data Science using Python – A Must-Read Guide for Data Scientists
(Yandex) - Online Evaluation for Effective Web Service Development

A/B Testing Tools
