A Guide to Bayesian A/B Testing for Conversion Rates

Overview

A/B testing, or split testing, is a fundamental technique used by businesses to experiment with different versions of web pages, emails, or marketing assets. The goal is to identify which version performs better in terms of key metrics such as user engagement, click-through rates, and, most importantly, conversion rates. Conversion rates measure the percentage of visitors who complete a desired action, like making a purchase or signing up for a newsletter. The optimization of conversion rates can have a profound impact on the success of digital campaigns.

This article delves into Bayesian A/B testing, a statistical framework that provides more nuanced insights into conversion rates, compared to traditional frequentist methods. We will:

Explore the process of Bayesian A/B testing for conversion rates.
Compare Bayesian and frequentist approaches, highlighting their advantages and limitations.
Walk through an advanced example involving customer behavior changes after an intervention.

The Importance of Conversion Rates in A/B Testing

Conversion rates are the cornerstone of A/B testing. A conversion occurs when a user completes a specific action, such as signing up for a service or making a purchase. In digital marketing and website optimization, increasing conversion rates can directly impact revenue, making A/B testing an essential tool for any business.

To understand how A/B testing works, consider the following scenario: an e-commerce company runs an experiment to see if changing the color of a call-to-action button increases purchases. The company presents one version of its site (Version A) to a control group and another version (Version B) to an experimental group. By tracking user behavior, such as whether they complete a purchase, the company can assess which version performs better in driving conversions.

Example Setup for Conversion Rate Testing

Suppose an A/B test is conducted on a website, with two groups of 100 visitors each. Group A is exposed to a new button design, and Group B sees the original. The conversion results are summarized as follows:

Group A: 5 conversions (5%)
Group B: 3 conversions (3%)

The data collected from these tests can be analyzed using frequentist methods, such as a Chi-square test, or using a Bayesian framework. Let’s explore both.

Frequentist Approach to A/B Testing

The frequentist approach is the most widely used in A/B testing, primarily due to its simplicity and long history in statistical inference. The Chi-square test is a common method in this framework, which assesses whether the differences observed between two groups are statistically significant or simply due to random chance.

Chi-square Test for Conversion Rates

The Chi-square test compares the observed frequencies of conversions in each group with the expected frequencies under the null hypothesis that there is no difference between the two versions. In this case, the null hypothesis assumes that the new button design has no impact on the conversion rate.

Here’s a summary of how the test works:

Observed Data:
- Group A: 5 conversions, 95 no conversions.
- Group B: 3 conversions, 97 no conversions.
Contingency Table: $\begin{array}{|c|c|c|} \hline & \text{Conversions} & \text{No Conversions} \\ \hline \text{Group A} & 5 & 95 \\ \hline \text{Group B} & 3 & 97 \\ \hline \end{array}$
Test Statistic: The Chi-square test calculates a statistic to determine whether the observed difference in conversion rates is statistically significant. If the p-value falls below a specified threshold (commonly 5%), we reject the null hypothesis and conclude that the button design affects conversion rates.

However, the Chi-square test has notable limitations:

Sample Size Sensitivity: With a large sample size, even trivial differences can become statistically significant, while with a small sample size, meaningful differences may go undetected.
No Information on Effect Size: The test does not provide insights into the magnitude of the difference between the two versions, limiting its practical utility.
Multiple Comparisons Problem: Conducting multiple tests increases the risk of Type I errors (false positives). To counteract this, researchers often apply corrections, like the Bonferroni correction, but these methods reduce statistical power.

Pitfalls in Frequentist Interpretation

One common misunderstanding in the frequentist approach arises when interpreting the p-value. The p-value tells us the probability of observing data as extreme as we did, assuming the null hypothesis is true. It does not provide direct insight into the probability of the conversion rates themselves. This can lead to confusion, especially when making decisions based on small or non-significant p-values.

Bayesian Approach to A/B Testing

The Bayesian approach offers an alternative to frequentist methods by focusing on updating beliefs about conversion rates based on observed data. In this framework, we model the process that generates the data, specifying prior beliefs about the conversion rates and using observed data to update these beliefs.

Bayes’ Theorem

Bayesian inference is based on Bayes’ Theorem:

\[P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)}\]

Where:

$$ P(\theta D) $is the **posterior** probability of the parameter$\theta$(e.g., conversion rate) given the data$D$$.
$$ P(D \theta) $is the **likelihood** of the data given$\theta$$.
$P(\theta)$ is the prior probability, representing our belief about $\theta$ before seeing the data.
$P(D)$ is the evidence, which normalizes the posterior distribution.

Modeling Conversion Rates with Bayesian Inference

We will use Python’s PyMC package to model conversion rates. Below is a sample model using Bayesian inference to estimate the conversion rates of two groups:

import pymc as pm
import numpy as np

with pm.Model() as ConversionModel:
    # Priors for conversion rates
    pA = pm.Uniform('pA', 0, 1)
    pB = pm.Uniform('pB', 0, 1)
    
    # Difference between conversion rates
    delta = pm.Deterministic('delta', pA - pB)
    
    # Likelihood (Bernoulli distributed conversions)
    obsA = pm.Bernoulli('obsA', pA, observed=[1, 0, 1, 0, ...])  # Group A data
    obsB = pm.Bernoulli('obsB', pB, observed=[0, 1, 0, 0, ...])  # Group B data
    
    # Posterior sampling
    trace = pm.sample(2000)

In this model:

pA and pB represent the conversion probabilities for Group A and Group B, respectively.
The delta variable captures the difference between the conversion rates.
We assume a Bernoulli distribution for the observed conversions (1 for conversion, 0 for no conversion).

After running this model, we can draw samples from the posterior distribution, which gives us a full probability distribution of the conversion rates for both groups. This allows us to compute credible intervals, answer questions like “What is the probability that Group A has a higher conversion rate than Group B?”, and make probabilistic statements about the conversion rates.

Advantages of Bayesian A/B Testing

Bayesian inference offers several advantages over frequentist methods:

Direct Probability Statements: Bayesian methods allow us to make probability statements about the conversion rates themselves, such as “There is a 90% chance that Group A’s conversion rate is higher than Group B’s.”
Incorporating Prior Knowledge: If we have prior knowledge or historical data, we can incorporate it into the model, improving the accuracy of the results.
Credible Intervals: Instead of p-values, Bayesian methods provide credible intervals, which give a range of values for the conversion rates based on the observed data.

Posterior Distributions for Conversion Rates

After running the Bayesian model, we can visualize the posterior distributions for the conversion rates of both groups. These distributions provide rich insights into the likelihood of different conversion rates, allowing us to make informed decisions about which version of the webpage is more likely to perform better.

Advanced Example: Modeling Behavioral Changes After an Intervention

In more complex A/B testing scenarios, we may want to analyze not only conversion rates but also behavioral changes over time. For example, suppose we introduce two interventions (A and B) to different groups of customers and track their interactions with the platform over 100 days. We aim to determine if the interventions led to changes in behavior (e.g., frequency of logins) and when these changes occurred.

Bayesian Model for Behavior Over Time

To model this scenario, we assume that each individual’s interaction with the platform is normally distributed, with a switch point where behavior changes. We model this process using the following PyMC model:

with pm.Model(coords={'ind_id': ind_id}) as SwitchPointModel:

    sigma = pm.HalfCauchy("sigma", beta=2, dims="ind_id")

    # Switch point for each individual
    switchpoint = pm.DiscreteUniform("switchpoint", lower=0, upper=100, dims="ind_id")

    # Interaction intensity before and after the switch point
    mu1 = pm.HalfNormal("mu1", sigma=10, dims="ind_id")
    mu2 = pm.HalfNormal("mu2", sigma=10, dims="ind_id")
    
    # Difference between pre- and post-switch behavior
    diff = pm.Deterministic("diff", mu1 - mu2)

    # Interaction behavior as a function of switchpoint
    intercept = pm.math.switch(switchpoint < X.T, mu1, mu2)

    # Observed interactions
    obsA = pm.Normal("y", mu=intercept, sigma=sigma, observed=obs)
    
    trace = pm.sample()

This model captures the dynamics of behavioral changes over time. We can then visualize the posterior distributions for the switch points and the differences in behavior before and after the intervention. A forest plot helps to show these differences clearly, providing actionable insights into how the two groups responded to their respective interventions.

Conclusion

Bayesian A/B testing provides a flexible and intuitive framework for analyzing conversion rates and user behavior. While frequentist methods, such as the Chi-square test, are simpler and more familiar, they come with limitations related to sample size sensitivity, multiple comparisons, and p-value interpretation. Bayesian methods, on the other hand, allow for more nuanced analysis, direct probability statements, and the incorporation of prior knowledge.

Whether conducting straightforward A/B tests for conversion rates or analyzing complex behavioral changes over time, Bayesian inference offers powerful tools to model uncertainty and provide richer insights. As A/B testing continues to play a crucial role in digital marketing and website optimization, leveraging Bayesian techniques can help businesses make more informed, data-driven decisions.

Check out Data Science Books on Amazon

Share on

Twitter Facebook LinkedIn