# When to use staggered experiments

Are you wondering when you should use staggered experiments (also called stepped wedge experiments)? Or maybe you are interested in hearing more about the advantages and disadvantages of staggered experiments? Well either way, you are in the right place! In this article, we tell you everything you need to know to determine whether staggered experiments are the right choice for you.

We start out by talking about how staggered experiments are designed. We follow that up with a discussion of how staggered experiments are analyzed. After that, we talk about some of the main advantages and disadvantages of staggered experiments. Finally, we describe some situations where it would be a good idea to run a staggered experiment, and some situations where another design would serve you better.

## How to design staggered experiments?

Staggered experiments, or stepped wedge experiments, are a type of experiment where subjects are separated into groups. Treatments are then randomized at the group level, meaning that all subjects in the same group receive the same randomly selected treatment. There are some implementations of staggered designs where randomization happens at the subject level rather than the group level, but this pattern is much less common. In this article, we will assume that all randomization happens at the group level.

The method that is used to separate subjects into groups will vary depending on the constraints of the experiments. In some cases, subjects will be allocated into the same groups because they are already part of some sort of natural grouping. For example, if the experiment is being run at a high school then students might be separated into groups for the experiment based on what math class they are in.

In other cases, subjects that have complex dependencies on one another, such that the actions of one subject may affect the actions of another, are placed in the same group. For example, if Uber was running experiments on riders then they may separate riders into groups based on geographic regions. This way riders in the same area, who are competing for and interacting with the same drivers, end up in the same group.

Once subjects are separated into groups, it is time to determine which treatment will be applied to each group. In staggered experiments, the experimental time frame is broken up into multiple evenly spaced time intervals. In the first time interval, all groups are shown the control treatment. This serves as a baseline period where we can observe how all of the different groups behave when the control treatment is applied to them.

After the first time interval, or the first few time intervals depending on how long you want the baseline period to last, you start to expose groups of subjects to the experimental treatment. At the start of each time interval, one or more groups that have not yet been exposed to the experimental treatment are randomly selected to be exposed to the treatment. Once a group has been exposed to the experimental treatment, they continue to receive the experimental treatment until the end of the experiment. This process continues until all groups have been exposed to the treatment for at least one time interval.
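The rollout described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names `stepped_wedge_schedule` and `treatment_matrix`, the 0-indexed time intervals, and the choice to spread groups as evenly as possible across the rollout intervals are all assumptions made for the example.

```python
import random


def stepped_wedge_schedule(groups, n_periods, baseline_periods=1, seed=None):
    """Return {group: first 0-indexed period in which the group is treated}.

    Hypothetical helper: groups are shuffled, then assigned to rollout
    periods as evenly as possible so that every group is treated for at
    least the final period.
    """
    rollout_periods = n_periods - baseline_periods
    if rollout_periods < 1:
        raise ValueError("need at least one period after the baseline")

    rng = random.Random(seed)
    order = list(groups)
    rng.shuffle(order)  # random order decides which groups cross over first

    schedule = {}
    for i, group in enumerate(order):
        step = i * rollout_periods // len(order)
        schedule[group] = baseline_periods + step
    return schedule


def treatment_matrix(schedule, n_periods):
    """Return {group: [0/1 per period]}, where 1 means experimental treatment."""
    return {
        group: [int(period >= start) for period in range(n_periods)]
        for group, start in schedule.items()
    }


if __name__ == "__main__":
    schedule = stepped_wedge_schedule(["A", "B", "C", "D"], n_periods=5, seed=42)
    for group, row in sorted(treatment_matrix(schedule, 5).items()):
        print(group, row)
```

Each printed row starts with 0 (the baseline interval), switches to 1 exactly once, and stays at 1 until the end, which is the "staircase" pattern that gives the stepped wedge design its name. Because the whole schedule is generated up front, you can inspect it before the experiment launches.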

## How to analyze staggered experiments?

While treatments are randomized at the group level, the analysis that determines the results of the experiment is generally conducted at the individual subject level rather than the group level. That means care has to be taken to account for two things. First, subjects in the same group are related to one another, so their outcomes cannot be treated as independent observations. Second, time and treatment are confounded: because more groups receive the experimental treatment as the experiment goes on, later time intervals contain more treated observations than earlier ones, and any change in the metric over time can be mistaken for a treatment effect.
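To make the confounding between time and treatment concrete, here is a small simulation sketch. Everything in it is an assumption chosen for illustration: the function names, the true effect of 2.0, the upward trend of 1.0 per time interval, and the simplification to one observation per group per interval. The within-interval comparison shown is only one simple way to adjust for time. In practice you would more likely fit a regression with time-interval effects and some form of group-level adjustment.

```python
import random
from statistics import mean


def simulate(schedule, n_periods, effect=2.0, trend=1.0, noise_sd=0.0, seed=0):
    """Return rows of (group, period, treated, outcome) for a stepped wedge.

    schedule maps each group to the first period in which it is treated.
    The outcome drifts upward by `trend` per period regardless of treatment.
    """
    rng = random.Random(seed)
    rows = []
    for group, start in schedule.items():
        for period in range(n_periods):
            treated = period >= start
            outcome = 10.0 + trend * period + effect * treated + rng.gauss(0, noise_sd)
            rows.append((group, period, treated, outcome))
    return rows


def naive_estimate(rows):
    """Pooled treated mean minus pooled control mean, ignoring time."""
    treated = [y for _, _, t, y in rows if t]
    control = [y for _, _, t, y in rows if not t]
    return mean(treated) - mean(control)


def within_period_estimate(rows):
    """Average the treated-minus-control difference within each period.

    Only periods that contain both treated and control groups contribute.
    """
    diffs = []
    for period in sorted({p for _, p, _, _ in rows}):
        treated = [y for _, p, t, y in rows if p == period and t]
        control = [y for _, p, t, y in rows if p == period and not t]
        if treated and control:
            diffs.append(mean(treated) - mean(control))
    return mean(diffs)


if __name__ == "__main__":
    schedule = {"A": 1, "B": 2, "C": 3, "D": 4}
    rows = simulate(schedule, n_periods=5)
    print("naive:", naive_estimate(rows))
    print("within-period:", within_period_estimate(rows))
```

With no noise, the naive comparison attributes part of the time trend to the treatment and overstates the effect, while the within-interval comparison recovers the true value of 2.0, because it only ever compares groups observed during the same time interval.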

## Advantages and disadvantages of staggered experiments

In this section, we will discuss some of the main advantages and disadvantages of staggered experiments.

### Advantages of staggered experiments

Here are some of the main advantages of staggered experiments.

• Can be used to account for dependencies between subjects. Since treatments in staggered experiments are randomized and applied at the group level, that generally means that it is a more appropriate design for situations where there are complex dependencies between subjects that are in the same group. This can be useful in situations where simple AB tests would not suffice because the behaviors of subjects that are treated with the experimental treatment would influence the behavior of subjects that were not. That being said, you should take care to ensure that you are analyzing your results appropriately if you are in a situation where there are heavy dependencies between subjects in the same group.
• Can be used when treatments have long lasting effects on subjects. Since groups that have been chosen to receive the experimental treatment generally continue to receive the experimental treatment until the end of the experiment, staggered experiment designs can be used in situations where treatments have long lasting effects on subjects’ behavior. This is not true of experiment designs like switchback experiments where groups alternate back and forth between different treatments.
• Can specify design ahead of time. Additionally, most staggered experiment designs enable you to specify the full experiment design ahead of time. This enables you to spot any issues with the way that groups are randomized into treatments and modify the experiment design before the experiment is rolled out. We will note that there may be some cases where this is not true, but it is true in most cases.
• More consistent user experience. Another advantage of staggered experiments is that since each group only switches from the control treatment to the experimental treatment once, it provides a consistent user experience that is not confusing. This cannot be said about trial designs that alternate groups between multiple different treatments like switchback experiments.

### Disadvantages of staggered experiments

• Design is not broadly used. One disadvantage of a staggered experiment is that this experimental design is not as common or broadly used as a simple AB test. Stakeholders who are skeptical of methods they do not understand may be more likely to question your decision to use this design. Colleagues who are not familiar with the design will also have more trouble reviewing your work.
• Design is more difficult to explain. Another disadvantage of staggered experiments is that this experimental design is more complex and a little more difficult to explain than some other designs. That means that you may need to spend even more time explaining the design to stakeholders and colleagues who are not familiar with it.
• There are many design choices to be made. Additionally, there are many design choices that need to be made before you launch a staggered experiment. That means that a lot of time and effort needs to be put into evaluating and validating these choices.
• Some ambiguity in how to analyze results. Another downside of staggered experiments is that the exact details of how you should analyze your experiment can vary a lot based on the circumstances and design of your experiment. That means that you need to put more thought and effort into choices around how to analyze your results. This is thought and effort that you would not have to spend if you used a simpler design that had a more standardized path for analyzing results.
• Disruptions from anomalous events. Due to the staggered rollout of the experimental treatment, staggered experiment designs can be affected by disruptions from anomalous events that occur over short to medium intervals of time. For example, if an anomalous event that improves metrics across the population occurs towards the beginning of the experiment, it can make the control treatment look better than it really is, since most groups will still be receiving the control treatment. If the same thing happens towards the end of the experiment, it can make the experimental treatment look better than it really is, because most groups will be receiving the experimental treatment by then.
• Complicated to run simultaneous experiments. Similarly, if other staggered rollout experiments are run on the same groups during the same time period, the staggered rollout can make it difficult to separate the effects of one experiment from the effects of the others.

## When to use staggered experiments

Are you wondering when you should use a staggered experiment with a stepped wedge design? Here are some examples of use cases where staggered experiments are a good fit.

• When you only have the resources to apply a treatment to part of the population at once. Staggered experiments are a great option to reach for when you do not have the resources available to apply your experimental treatment to a large portion of your population at once. If your experimental treatment takes a large amount of time or money to apply and that means you need to gradually apply the treatment over a longer interval of time then this is a good experiment design.
• When you have naturally grouped or clustered subjects and your treatment has long term effects. Staggered experiments are also a good choice when you have naturally grouped or clustered subjects that you want to apply the same treatment to at the same time and your treatment has long term effects. If your treatment has long term effects, it can be difficult to apply other grouped or clustered experiment designs that alternately expose groups to different treatments, because the effects of one treatment carry over into the intervals where the other treatment is applied.
• When you have naturally grouped or clustered subjects and you need consistent user experience. Similarly, staggered experiments are also a good choice when you have naturally grouped or clustered subjects and alternating between treatments multiple times would provide a confusing or inconsistent user experience.

## When not to use staggered experiments

• When a simple AB test would serve your purposes. If you think that you would be able to accurately measure and analyze your experiment using a simple AB test, then you are generally better off taking that route. Simple AB tests are better understood, easier to design, and easier to analyze.
• When you have naturally grouped or clustered subjects but your treatment only has short term effects. If you have naturally grouped or clustered subjects but your treatment only has short term effects on your subjects, you may be able to use a simpler experiment design like a switchback experiment design. These designs are easier to understand and easier to analyze. This is especially true if alternating between treatments does not provide a bad or inconsistent user experience.