Are you wondering when you should run switchback experiments (also called time split experiments)? Or maybe you want to learn more about the advantages and disadvantages of switchback experiments? Well then you are in the right place! In this article, we tell you everything you need to know to understand when to use switchback experiments.
We start out by providing a little detail on how switchback experiments are designed. After that, we talk a little more about how switchback experiments are analyzed. Next, we discuss some of the advantages and disadvantages of using switchback experiments. After that, we provide a few examples of scenarios where it is a good idea to run switchback experiments. Finally, we provide a few examples of scenarios where it is not a good idea to use switchback experiments.
How to design switchback experiments?
How are switchback experiments designed? Switchback experimentation is one example of an experimentation technique where you separate individual subjects into groups, then apply the same treatment to everyone who is in the same group. This means that the treatment is not randomized at the subject level, but rather at the group level.
Groups are carefully chosen so that subjects that have meaningful dependencies on one another, such that the actions of one subject may affect the actions of another, are placed in the same group. This may happen if multiple subjects access a shared pool of resources or if there are complex network effects that govern a subject’s behavior. For example, if you worked at Uber and wanted to run a switchback experiment for a treatment that was applied to riders, you might separate riders into groups based on geographical area. This is because riders who are in the same geographic area access the same pool of drivers, which is a limited resource.
Once subjects are separated into groups, the control treatment and the experimental treatment are sequentially applied to the entire group of subjects for time intervals of equal length. To initialize a switchback experiment, you start out by randomly choosing between the control treatment and the experimental treatment, then applying that treatment to the entire group for a given time interval. Once that interval of time is over, you randomly choose between the control treatment and the experimental treatment again and apply that treatment to the entire group for the next interval of time. This process repeats until both the control treatment and the experimental treatment have been applied to the group for multiple intervals of time.
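The randomization procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the function name `make_switchback_schedule`, the group names, the one-hour interval, and the start date are all hypothetical, and each interval is assigned by a simple coin flip, which does not guarantee that both treatments show up equally often.

```python
import random
from datetime import datetime, timedelta

def make_switchback_schedule(groups, n_intervals, interval_length, start, seed=0):
    """Randomly assign an arm to every (group, time interval) pair."""
    rng = random.Random(seed)  # fixed seed so the schedule is reproducible
    schedule = []
    for group in groups:
        for k in range(n_intervals):
            schedule.append({
                "group": group,
                "start": start + k * interval_length,
                "end": start + (k + 1) * interval_length,
                # Independent coin flip per interval (an assumption of this sketch).
                "arm": rng.choice(["control", "treatment"]),
            })
    return schedule

# Hypothetical example: two geographic groups, six one-hour intervals each.
schedule = make_switchback_schedule(
    groups=["downtown", "airport"],
    n_intervals=6,
    interval_length=timedelta(hours=1),
    start=datetime(2024, 1, 1, 0, 0),
    seed=42,
)
for row in schedule:
    print(row["group"], row["start"].strftime("%H:%M"), row["arm"])
```

Because the whole schedule is generated up front, it can be reviewed before launch. If a group ends up with only one treatment across all of its intervals, you can redraw with a different seed or use a balanced assignment instead.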
How to analyze switchback experiments?
How do you analyze the results of a switchback experiment? In general, switchback experiments are analyzed by looking at the overall difference in metrics between the control treatment and the experimental treatment across all time intervals. That being said, there are a few different ways to analyze the results of switchback experiments and different methods may be used depending on the circumstances of the experiment.
Generally speaking, there are two different schools of thought on how to analyze the results of switchback experiments. These schools of thought differ based on what is considered a single observation for the purpose of the analysis. Here is more information on these different schools of thought.
- Treat each time interval as one observation. Some of these analysis methods recommend treating each time interval as a single observation and only looking at the aggregate metrics across all subjects for each time interval. In this paradigm, you would have one aggregate metric that represented the behavior of all of the individual subjects that were active during a given time interval and you would treat that aggregate metric as a single observation. The main issue with this paradigm is that it drastically reduces the sample size that you have to work with.
- Treat each subject as one observation. Other methods recommend using individual subjects as the unit of analysis and looking at the metrics for each individual subject (during each time interval). In this paradigm, you would have a much larger sample size because each subject counts as a different observation. The main issue with this paradigm is that you need to carefully consider the complex interdependencies between subjects and you cannot use simple inference techniques that assume independence between observations. This is generally the more popular path to take, and cluster-robust standard errors are often used to account for the correlation between observations.
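To make the two paradigms concrete, here is a small sketch that computes both estimates on the same toy data. The data and function names are hypothetical. The subject-level version clusters by time interval and uses the basic cluster-robust variance formula for a difference in means with no small-sample correction, which is one of several possible choices. In practice you would more likely rely on a statistics library than on hand-written formulas.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance

# Hypothetical toy data: (interval_id, arm, metric value for one subject).
observations = [
    (0, "treatment", 12.0), (0, "treatment", 14.0),
    (1, "control", 10.0), (1, "control", 10.0),
    (2, "treatment", 13.0), (2, "treatment", 15.0), (2, "treatment", 14.0),
    (3, "control", 9.0), (3, "control", 11.0),
]

def interval_level_estimate(obs):
    """Paradigm 1: one aggregate (the mean) per interval, then compare arms."""
    by_interval = defaultdict(list)
    arm_of = {}
    for interval, arm, value in obs:
        by_interval[interval].append(value)
        arm_of[interval] = arm
    means = {"treatment": [], "control": []}
    for interval, values in by_interval.items():
        means[arm_of[interval]].append(mean(values))
    t, c = means["treatment"], means["control"]
    effect = mean(t) - mean(c)
    # Welch-style standard error; needs at least two intervals per arm.
    se = sqrt(variance(t) / len(t) + variance(c) / len(c))
    return effect, se

def subject_level_estimate(obs):
    """Paradigm 2: one row per subject, standard error clustered by interval."""
    values = {"treatment": [], "control": []}
    for _, arm, value in obs:
        values[arm].append(value)
    arm_mean = {arm: mean(v) for arm, v in values.items()}
    effect = arm_mean["treatment"] - arm_mean["control"]
    # Sum the residuals within each interval (the cluster).
    resid_sum = defaultdict(float)
    arm_of = {}
    for interval, arm, value in obs:
        resid_sum[interval] += value - arm_mean[arm]
        arm_of[interval] = arm
    # Cluster-robust variance of a difference in means (no small-sample correction).
    var = sum(e ** 2 / len(values[arm_of[g]]) ** 2 for g, e in resid_sum.items())
    return effect, sqrt(var)

print(interval_level_estimate(observations))
print(subject_level_estimate(observations))
```

The two point estimates differ slightly because the interval-level version weights every interval equally, while the subject-level version weights every subject equally, so busier intervals count for more.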
Advantages and disadvantages of switchback experiments
What are some of the main advantages and disadvantages of switchback experiments? In this section, we will describe some of the main advantages and disadvantages of switchback experiments. This will give you more context so that you can better understand when to use switchback experiments and when to opt for a different experimentation technique.
Advantages of switchback experiments
We will start out by describing some of the main advantages of switchback experiments. Specifically, we will discuss advantages that switchback experiments have compared to other experimentation techniques.
- Can account for situations where there are strong dependencies between subjects. One of the main advantages of switchback experiments is that they can properly account for situations where there are strong interdependencies between subjects in your population, and specifically when the actions of one subject in your population have the potential to impact the actions of other subjects in your population. Simple AB tests produce biased results in these scenarios. The reason for this is that the actions of the subjects that saw the experimental treatment have the potential to impact the actions of the subjects that saw the control treatment. That means that the control group is also impacted by the experimental treatment, so the control group is not truly reflective of subjects that were not exposed to the experimental treatment.
- Design is simple to explain. Another advantage of switchback experiments is that the overall design of switchback experiments is simple and intuitive. That means that it is easy to justify the experimental design to stakeholders who do not have a strong technical background in experimentation.
- Can specify design ahead of time. Another advantage of switchback experiments is that you are able to fully design the experiment and determine which treatment will be shown for which time period ahead of time. This means that you can spot any issues with the design and tweak the details of the experiment design before the experiment launches.
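The first advantage above can be illustrated with a small deterministic model. This is a toy sketch under made-up assumptions, not a description of any real marketplace: 100 riders share a pool of 60 drivers, control riders request a ride with probability 0.5, treated riders request with probability 0.8, and when requests exceed drivers the available drivers are split proportionally. All names and numbers are hypothetical.

```python
def completion_rates(n_treated, n_control, drivers,
                     p_request_treated=0.8, p_request_control=0.5):
    """Expected completed rides per rider in each arm with one shared driver pool."""
    req_t = n_treated * p_request_treated
    req_c = n_control * p_request_control
    total = req_t + req_c
    # Fraction of requests that can be served by the limited driver pool.
    fill = min(1.0, drivers / total) if total > 0 else 0.0
    rate_t = req_t * fill / n_treated if n_treated else None
    rate_c = req_c * fill / n_control if n_control else None
    return rate_t, rate_c

RIDERS, DRIVERS = 100, 60

# Simple AB test: half of the riders in each arm, competing for the same drivers.
ab_t, ab_c = completion_rates(RIDERS // 2, RIDERS // 2, DRIVERS)
ab_estimate = ab_t - ab_c

# Switchback: everyone is treated in one interval, everyone is control in another.
sb_t, _ = completion_rates(RIDERS, 0, DRIVERS)
_, sb_c = completion_rates(0, RIDERS, DRIVERS)
switchback_estimate = sb_t - sb_c

print(f"AB test estimate:    {ab_estimate:.3f}")
print(f"Switchback estimate: {switchback_estimate:.3f}")
```

In this model the AB test reports an effect of about 0.277 completed rides per rider, while comparing an all-treatment interval against an all-control interval gives 0.100, which is the effect of rolling the treatment out to everyone. The AB estimate is inflated because treated riders take drivers away from control riders, which pushes the control group's completion rate below what it would be with no treatment at all.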
Disadvantages of switchback experiments
Now we will talk about some of the disadvantages of switchback experiments. These are factors that can make it a little more difficult to run switchback experiments in certain situations.
- Not as broadly understood as simple AB tests. One disadvantage of switchback experiments is that they are not as broadly understood and used as AB tests. That means that you may have to justify the design of the experiment to stakeholders who are only familiar with basic AB tests.
- Susceptible to disruptions from large events. Another disadvantage of switchback experiments is that they are susceptible to disruptions from anomalous events that take place during short time periods. Since it is often the case that only one treatment is shown to subjects at a time, an anomalous event that impacts metrics of interest across the whole population will only show up in the data for whichever treatment was active at that time, which can skew the comparison between treatments. We will note that this is less of a concern if there are multiple groups involved in the experiment that are being shown different sequences of randomized treatments.
- Choice of time interval can be subjective. Another disadvantage of switchback experiments is that the choice of time interval that is used in your experiment design can be somewhat subjective, but your experiment can be negatively impacted if you choose the wrong interval. If you choose a time interval that is too short, you might end up switching to a new treatment before the original treatment even had time to take effect and show impact.
- Not robust to treatment effects that last a long time. Another disadvantage of switchback experiments is that they can be difficult to run in situations where your treatment has long lasting effects. If your experiment treatment has long lasting effects on the subjects that saw it and those subjects are present during multiple time intervals, you may see lingering effects of the experimental treatment during a later time interval when the control treatment is being displayed.
- There is some ambiguity in how to analyze results. As we mentioned above in the analysis section, there is some ambiguity around how to analyze the results of switchback experiments. This means that tradeoffs need to be considered and decisions need to be made. This takes both time and energy.
- Not easy to account for overlapping tests. Another disadvantage of switchback testing is that it is not as straightforward to run multiple switchback tests on the same group of users at the same time. This is because one test can create an “anomalous” event during another experiment, especially if the tests only overlap during one or two time intervals. While it is technically possible to run multiple switchback tests at the same time, you have to carefully consider how the tests overlap. Many organizations choose to run simultaneous tests on non-overlapping groups of users instead.
- Inconsistent user experience. Depending on what type of treatment is being tested, switchback experiments also have the potential to create an inconsistent and confusing user experience. For example, if you are using a switchback experiment to test a change to a user-facing module in a popular app, then it may confuse users to repeatedly see different versions of the module.
When to use switchback experiments
When should you use switchback experiments rather than another experimentation technique? Here are some examples of situations where you should use switchback experiments.
- When you are operating in a two-sided marketplace. The first example of a situation where switchback experiments are commonly used is when you are running an experiment in a two-sided marketplace. If subjects on one side of the marketplace are interacting with the same set of subjects on the other side of the marketplace, then there is likely to be spillover between subjects, such that the behaviors of subjects that are exposed to the experimental treatment may affect the behaviors of subjects that were not exposed to the experimental treatment. For example, if Uber runs a test where they provide handy tips to some riders in the app, those tips may impact the way those riders interact with drivers. That may in turn impact the way that drivers behave towards other riders who were not exposed to the experimental treatment.
- When individual subjects are competing for a limited resource. Generally speaking, switchback experiments are a good candidate to use in any situation where subjects are competing for a limited pool of resources. This introduces complex dependencies between subjects because if a subject is influenced by an experimental treatment to take more resources from the shared pool, it will limit the amount of resources available to subjects that did not see the experimental treatment, which in turn changes their behavior. This type of situation can occur in two-sided marketplaces, such as the Uber marketplace where riders in the same area are competing for a limited pool of drivers, but it can also happen in other situations.
- When there are strong network effects at play. Switchback experiments are also able to handle situations where there are strong network effects at play, especially in cases where the actions of users who see an experimental treatment have the potential to affect the actions of users who see a control treatment. For example, if you were running an experiment on Twitter that increased the number of tweets sent out by users who saw the experimental treatment by 1000%, there would be a lot more content for users who did not see the experimental treatment to respond to. Even though they did not see the experimental treatment, their engagement rate might increase just because there is more content to engage with.
When not to use switchback experiments
When should you avoid using switchback experiments? Here are some examples of situations where you should avoid using switchback experiments.
- When there are no complex dependencies between users. The main situation where it does not necessarily make sense to use switchback experiments is when there are no complex dependencies between users. If the actions of one user do not affect the actions of another user, then you may be better off using a simpler AB testing approach to run your experiment. Simpler testing designs have fewer pitfalls that you need to worry about.
- When the experimental treatment has long lasting effects on a subject. In general, switchback experiments are more difficult to use when the experimental treatment that is being applied has long lasting effects on the subjects that are exposed to it. This is because subjects that are exposed to the experimental treatment will continue to experience the effects of the treatment even when the control treatment is applied to them later on. In this situation, staggered experiments may be a better choice.
Other experimentation techniques
- How to choose an experimental design
- When to use staggered experiments
- When to use simple AB tests
- When to use multivariate experiments