How to choose an experimental design

Are you wondering how to choose the right experimental design for your next experiment? Or maybe you want to learn about experimentation techniques that are commonly used in industry? Either way, you are in the right place! In this article, we discuss some of the most common experimental designs and techniques that are used in industry.

We start out by discussing some of the main biases that occur in experiments. This will provide more context that will help you choose the right experimental design for your situation. After that, we talk about experiment designs and techniques that are commonly used in randomized experiments. These are controlled studies where treatments are assigned to subjects by the person who is running the experiment. Finally, we talk about designs and techniques that are popular in natural experiments. These are observational studies where treatments are not necessarily assigned by the person running the experiment.

Biases in experimentation

Before you can understand which experimental design you should use for your experiment, you first need to be aware of some common types of biases that impact experiments. It is important to understand what these biases are and how they impact experiments because some experimental designs were created specifically for situations where one or more of these experimental biases are present.

In particular, we will focus on biases and experimentation issues that are prevalent in industry settings. Here are some common examples.

  • Interference. Interference is a bias that occurs when there are complex dependencies between subjects in an experiment. It occurs when the actions that a user in one treatment group takes have the ability to affect the actions that a user in another treatment group takes. Interference is a problem in experimentation because if the actions of users in the treatment group can affect the actions of users in the control group, then the control group does not provide an accurate representation of what would happen if the treatment was not applied. Interference is common in two-sided marketplaces, situations where subjects are competing for a limited pool of resources, and situations where there are complex network effects at play.
  • Carryover effects. Carryover effects occur when one or more treatments in an experiment have long lasting effects on users that impact their behavior for extended periods of time. In this situation, users who are currently seeing the control treatment may still be under the influence of an experimental treatment that they previously saw. Again, this means that the control group does not provide an accurate baseline of what would happen if the treatment was not applied. Carryover effects are most problematic in experimental designs where individual subjects alternate between different treatments over the course of the experiment.
  • Novelty effects. Novelty effects occur when subjects respond positively to a treatment just because it is new and they have not seen it before. This causes problems because these same subjects may stop responding as favorably to the treatment once it is not new and shiny anymore. That means that there may not be sustained impacts to metrics that were positively impacted during the experimentation window. This most commonly happens when there are obvious user facing changes that the user can recognize.
  • Interaction effects. Interaction effects occur when there are synergies between two or more experiments, such that subjects who see a particular treatment in one experiment are more likely to respond well to a particular treatment in another experiment. Suppose two experiments run at the same time and their treatments have a large synergy. If one treatment is accepted as the new default and the other is abandoned, the impact of the accepted treatment may be smaller than the experimental results suggest, because it performed well specifically in the presence of the abandoned treatment. This is most common when many experiments are being run on the same group of users at the same time and when experiments touch closely related components of the user experience.
  • Cannibalization. Cannibalization happens when an experimental treatment has a positive impact on the portion of the user experience where it is applied, but that positive impact is balanced out by a negative impact on another portion of the user experience. For example, if you work at an online clothing store and you run an experiment that aims to increase the number of shirts that are sold, you may see an increase in shirt sales that is balanced out by a decrease in hat sales. This is a problem because if your experimental metrics only consider shirt sales, they will look positive even though the overall impact on the business is neutral. This is common in situations where subjects are spending limited resources.

Experimental designs for randomized experiments

Now we will talk about experimental designs and experimentation techniques that can be used for randomized experiments. As a reminder, randomized experiments are controlled experiments where the person who is running the experiment exerts control over the treatment that is shown to a particular subject.

We will start out by talking about common designs that are used in randomized experiments. These are common experimental setups that provide a template for how to run an experiment. After that, we will discuss common experimentation techniques that are used in randomized experiments. These are techniques that can be applied to a variety of different experimental designs to control for bias and increase confidence in results.

Designs for randomized experiments

Here are some common experimental designs that can be used for randomized experiments. For each design, we provide additional information to help you understand when it should be used.

  • Simple A/B testing. Simple A/B testing should be the default option that you reach for any time you start designing an experiment. You should only use a more advanced experimentation technique if there is a particular bias or issue that you need to control for that cannot be handled in a simple A/B testing paradigm.
  • Multivariate experiments. Multivariate experiments are a great tool that can be used when you want to test multiple changes to an experience that are likely to interact with one another. This design allows you to test many changes at once, rather than sequentially testing individual changes.
  • Switchback experiments. Switchback experiments are a great option when there are complex dependencies between the subjects in your experiment and you expect there to be a lot of interference. This is one of the more straightforward designs that can adjust for interference.
  • Staggered experiments. Staggered experiments are generally a good option to reach for when there are complex dependencies between your subjects and you also expect your treatment to have large carryover effects. Staggered experiments can account for both interference and large carryover effects.
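
As a rough illustration of the switchback idea, the sketch below assumes the common setup in which the whole population is switched between treatments by time window, so that everyone active in a given window sees the same treatment. The function names, window length, and arm labels are hypothetical and are not taken from any particular experimentation platform.

```python
import random
from datetime import datetime, timedelta


def switchback_schedule(start, n_windows, window_minutes, seed=0):
    """Randomly assign each consecutive time window to an arm.

    Returns a list of (window_start, arm) tuples. Randomizing at the
    window level, rather than the user level, is an assumption of this
    sketch about how the switchback is run.
    """
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    schedule = []
    for i in range(n_windows):
        window_start = start + timedelta(minutes=i * window_minutes)
        arm = rng.choice(["control", "treatment"])
        schedule.append((window_start, arm))
    return schedule


def arm_for(timestamp, start, window_minutes, schedule):
    """Look up which arm was active at a given timestamp."""
    elapsed_seconds = (timestamp - start).total_seconds()
    index = int(elapsed_seconds // (window_minutes * 60))
    return schedule[index][1]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 0, 0)
    schedule = switchback_schedule(start, n_windows=8, window_minutes=30)
    for window_start, arm in schedule:
        print(window_start.isoformat(), arm)
```

Because the unit of randomization is the time window, the analysis would compare metrics aggregated per window rather than per user. Shorter windows give more units to compare but leave less time for carryover effects from the previous window to fade.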

Experimentation techniques for randomized experiments

Now we will talk more about some common experimentation techniques that are used in randomized experiments. These are experimentation techniques that can be applied in a variety of different experimental designs.

  • Blocking. Blocking is a technique that is applied when you need to be absolutely certain that your different treatment groups are balanced over a particular covariate or characteristic. This is commonly used in cases where simple randomization may not achieve this result, such as cases where sample size is relatively small.
  • A/A testing. A/A testing is a technique where you include multiple control groups in your experiment and use statistical tests to ensure that there is no significant difference between the control groups. This is often used to provide more confidence in test results, and particularly to provide more confidence that you are not seeing false positives that make your experimental treatment group look overly good.
  • Interleaving. Interleaving is a technique where you allow individual subjects to see multiple treatments side by side and measure which treatment the subject responds best to. This is generally used in situations where you do not have enough individual subjects to achieve statistical significance in a timely fashion.
  • Peeking adjustments. Peeking is a phenomenon that occurs when someone looks at the results of an experiment multiple times while the experiment is in progress, especially when they do this with the intent to stop the experiment as soon as the results look favorable. This introduces multiple testing issues that increase the chance of a false positive result, but there are adjustments that can be made to experimental designs to account for this.
  • Variance reduction techniques. There are multiple techniques that can be applied to reduce variance and therefore reduce the sample size required for an experiment. CUPED, outlier capping, and stratification are just a few examples of techniques that can be applied to reduce variance.
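
To make one of the variance reduction techniques above more concrete, here is a minimal sketch of a CUPED-style adjustment. It assumes that each subject has a pre-experiment measurement of the same metric, and it uses the usual covariate-adjustment formula: subtract theta times the centered pre-experiment value, where theta is the covariance between the two measurements divided by the variance of the pre-experiment measurement. The function names and the data are made up for illustration.

```python
from statistics import pvariance


def mean(values):
    return sum(values) / len(values)


def cuped_adjust(metric, pre_metric):
    """Return CUPED-adjusted metric values.

    metric: in-experiment values, one per subject (all arms pooled).
    pre_metric: pre-experiment values for the same subjects, same order.
    """
    m_y, m_x = mean(metric), mean(pre_metric)
    cov = sum((x - m_x) * (y - m_y) for x, y in zip(pre_metric, metric))
    var = sum((x - m_x) ** 2 for x in pre_metric)
    theta = cov / var  # requires pre_metric to vary across subjects
    # Centering the covariate leaves the overall mean unchanged.
    return [y - theta * (x - m_x) for x, y in zip(pre_metric, metric)]


if __name__ == "__main__":
    # Hypothetical data: spend before and during the experiment.
    pre = [10.0, 12.0, 9.0, 15.0, 11.0, 14.0]
    during = [11.0, 13.0, 9.0, 17.0, 12.0, 15.0]
    adjusted = cuped_adjust(during, pre)
    print("variance before adjustment:", pvariance(during))
    print("variance after adjustment: ", pvariance(adjusted))
```

In this sketch, theta is estimated once on the pooled data from all treatment groups, and the adjusted values are then compared between groups in the usual way. The stronger the correlation between the pre-experiment and in-experiment measurements, the larger the reduction in variance.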

Experimental designs for natural experiments

Now we will talk about some experimental designs and techniques that are used for natural experiments. As a reminder, natural experiments are observational studies where the person who is running the experiment does not exert control over which treatments subjects see.

We will start out by discussing some of the main difficulties that impact the analysis of natural experiments. After that, we will provide information on techniques that are commonly used to analyze natural experiments.

Difficulties with analyzing natural experiments

The difficulties below make it nontrivial to analyze the results of natural experiments. Understanding them provides important context that will help you choose the right analysis method.

  • Confounding variables. One of the main difficulties with analyzing observational data is that since subjects are not randomized into treatments, there may be confounding variables that affect both a subject’s likelihood to be exposed to a treatment and their likelihood to achieve a certain outcome. Since subjects that have similar values for one of these variables are likely to both be exposed to a certain treatment and have a certain outcome, it can make it look like there is a strong relationship between the treatment and the outcome. This may lead some to believe that the treatment causes the outcome, when in reality both the treatment and the outcome were caused by a different variable.
  • Lack of comparable subjects across treatment groups. Another difficulty with analyzing data from natural experiments is that you may not have subjects in your treatment group that share similar characteristics with subjects in your control group. This makes it difficult to compare the control group against the treatment group.

Techniques for analyzing natural experiments

Here are some common techniques that are used to analyze the results of natural experiments. Note that some of these techniques may also be used to analyze the results of randomized experiments, especially in cases where one suspects that biases may be present.

  • Propensity score-based methods. Propensity score-based methods are generally used when you have cross-sectional data on multiple individual subjects in both the control group and the experimental treatment group. They are most commonly used when the treatment is represented by a single binary variable, but they can be adapted for other circumstances.
  • Synthetic controls. Synthetic controls can be used when you have as few as one subject in your experimental treatment group. Synthetic control methods generally require you to have time series data available for the individuals or groups of individuals that you want to monitor. The most common implementations also assume that the treatment can be represented by a single binary variable and that the treatment is applied to all subjects at the same time.
  • Interrupted time series. Interrupted time series can be used when you do not have data on individual subjects, but rather time series data on an aggregated measure. They can also be used when you do not have a control group at all. Since this method looks primarily at aggregated time series data, it is necessary that the treatment be applied to the whole treated population at the same time.
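
As one example of a propensity score-based method, the sketch below uses inverse propensity weighting with a binary treatment. To stay self-contained, it assumes a single discrete confounder (a hypothetical user segment) and estimates each subject's propensity score as the share of treated subjects in their segment. In practice the propensity score is usually estimated with a model such as logistic regression, so treat this as an illustration of the weighting step rather than a complete analysis.

```python
from collections import defaultdict


def ipw_effect(records):
    """Estimate the average treatment effect by inverse propensity weighting.

    records: list of (segment, treated, outcome), where treated is 0 or 1.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [n_treated, n_total]
    for segment, treated, _ in records:
        counts[segment][0] += treated
        counts[segment][1] += 1

    propensity = {}
    for segment, (n_treated, n_total) in counts.items():
        p = n_treated / n_total
        if p in (0.0, 1.0):
            # No comparable subjects in the other group for this segment.
            raise ValueError(f"no overlap in segment {segment!r}")
        propensity[segment] = p

    n = len(records)
    treated_mean = sum(y / propensity[s] for s, t, y in records if t == 1) / n
    control_mean = sum(y / (1 - propensity[s]) for s, t, y in records if t == 0) / n
    return treated_mean - control_mean


if __name__ == "__main__":
    # Hypothetical data: heavy users are more likely to be treated and
    # also have higher outcomes, so the segment is a confounder.
    records = (
        [("heavy", 1, 11.0)] * 3 + [("heavy", 0, 10.0)]
        + [("light", 1, 3.0)] + [("light", 0, 2.0)] * 3
    )
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    naive = sum(treated) / len(treated) - sum(control) / len(control)
    print("naive difference:", naive)
    print("weighted estimate:", round(ipw_effect(records), 6))
```

In this made-up data the treatment raises the outcome by 1 within each segment. The naive comparison of group means is inflated because treated subjects are mostly heavy users, while the weighted estimate is close to the within-segment effect. The `ValueError` reflects the comparability problem described earlier: if a segment contains only treated or only untreated subjects, there is nothing to compare against.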

