Are you wondering what blocking is in experimental design? Then you are in the right place! In this article we tell you everything you need to know about blocking in experimental design. First we discuss what blocking is and what its main benefits are. After that, we discuss when you should use blocking in your experimental design. Finally, we walk through the steps that you need to take in order to implement blocking in your own experimental design.
What is blocking in experimental design?
What is blocking in experimental design? Blocking is one of those concepts that can be difficult to grasp even if you have already been exposed to it once or twice. Why is that? Because the specific details of how blocking is implemented can vary a lot from one experiment to another. For that reason, we will start off our discussion of blocking by focusing on the main goal of blocking and leave the specific implementation details for later.
At a high level, blocking is used when you are designing a randomized experiment to determine how one or more treatments affect a given outcome. More specifically, blocking is used when you have one or more key variables that you need to ensure are similarly distributed within your different treatment groups. If you find yourself in this situation, blocking is a method you can use to determine how to allocate your observational units (or the individual subjects in your experiment) into your different treatment groups in a way that ensures that the distribution of these key variables is the same across all of your treatment groups.
So what types of variables might you need to balance across your treatment groups? Blocking is most commonly used when you have at least one nuisance variable. A nuisance variable is an extraneous variable that is known to affect your outcome variable that you cannot otherwise control for in your experiment design. If nuisance variables are not evenly balanced across your treatment groups then it can be difficult to determine whether a difference in the outcome variable across treatment groups is due to the treatment or the nuisance variable.
So how is blocking performed at a high level? It is a two-step process. First, the individual observational units are split into blocks of units that have similar values for the key variables that you want to balance over. After that, the observational units within each block are randomly allocated to treatment groups so that each treatment group receives a similar number of observational units from each block.
When should you use blocking?
When should you use blocking in your experimental design? In general you should use blocking if you are designing an experiment that fits the following two criteria.
- There are key variable(s) you need to balance across treatment groups. The first criterion that needs to be met in order for blocking to make sense for your experimental design is that you have at least one variable that needs to be equally distributed across your different treatment groups. If you are not in this situation, then you generally do not need to perform blocking.
- You have a relatively small sample size. The second criterion is that you are working with a relatively small sample size. So how small is small? That can vary depending on the type of experiment you are performing. As a general rule, you should use blocking when your sample size is small enough that you are not confident that simply randomizing your observations into treatment groups, without any blocking, will result in treatment groups that are balanced across the key variables called out in the previous criterion.
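To get a feel for how sample size affects balance, here is a minimal simulation sketch. It assumes a binary nuisance variable that half of the units have, two equal-sized treatment groups, and an arbitrary definition of "imbalanced" (the share of units with the nuisance variable differs between groups by more than 20 percentage points). The function name `imbalance_rate` and all of its parameters are made up for this illustration.

```python
import random

def imbalance_rate(n_units, n_trials=2000, threshold=0.2, seed=0):
    """Estimate how often simple randomization leaves two groups imbalanced.

    Half of the units carry a binary nuisance variable (coded 1), and the
    units are shuffled into two groups of (nearly) equal size. A trial counts
    as imbalanced when the share of 1s differs between the two groups by more
    than `threshold`.
    """
    rng = random.Random(seed)
    half = n_units // 2
    units = [1] * half + [0] * (n_units - half)
    imbalanced = 0
    for _ in range(n_trials):
        shuffled = units[:]
        rng.shuffle(shuffled)
        group_a, group_b = shuffled[:half], shuffled[half:]
        share_a = sum(group_a) / len(group_a)
        share_b = sum(group_b) / len(group_b)
        if abs(share_a - share_b) > threshold:
            imbalanced += 1
    return imbalanced / n_trials

# Compare a small, a medium, and a large experiment.
for n in (10, 40, 400):
    print(n, imbalance_rate(n))
```

Under these assumptions, the estimated rate of imbalance should shrink as the sample size grows, which is why blocking matters most for small experiments.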
A simple example where blocking may be useful
As an example, imagine you were running a study to test two different brands of soccer cleats to determine whether soccer players run faster in one type of cleats or the other. Further, imagine that some of the soccer players you are testing your cleats on only have grass fields available to them and others only have artificial grass or turf fields available to them. Now, say you have reason to believe that athletes tend to run 10% faster on turf fields than grass fields.
In this case, an observational unit is a soccer player and your treatment is the type of soccer cleats that a soccer player wears. The main outcome of your study is how fast an athlete can run. You also have a nuisance variable, which is the type of field a soccer player is running on when their time is recorded. In this experimental design, you need to ensure that the proportion of players running on turf fields is similar for each treatment group.
Why is it important to make sure that the number of soccer players running on turf fields and grass fields is similar across different treatment groups? Because the type of field is another variable that is known to impact how fast a player runs. If this variable is not balanced across treatment groups, then you will not know whether any difference in your outcome between treatment groups is due to the type of soccer cleat or the type of field.
Imagine an extreme scenario where all of the athletes that are running on turf fields get allocated into one group and all of the athletes that are running on grass fields are allocated into the other group. In this case it would be near impossible to separate the impact that the type of cleats has on the run times from the impact that the type of field has.
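The arithmetic behind this extreme scenario can be sketched in a few lines. The numbers below are hypothetical: every player runs at 7.0 m/s on grass, turf makes everyone 10% faster (as in the example above), and the cleats have no effect at all.

```python
from statistics import mean

GRASS_SPEED = 7.0   # m/s, hypothetical baseline speed on grass
TURF_BOOST = 1.10   # players run 10% faster on turf

def speed(field):
    """Speed of a player on the given field; the cleat brand has no effect."""
    return GRASS_SPEED * TURF_BOOST if field == "turf" else GRASS_SPEED

def mean_speed_gap(fields_a, fields_b):
    """Difference in mean speed between cleat group A and cleat group B."""
    return mean(speed(f) for f in fields_a) - mean(speed(f) for f in fields_b)

# Extreme allocation: all turf players in group A, all grass players in group B.
confounded = mean_speed_gap(["turf"] * 4, ["grass"] * 4)

# Balanced allocation: each group gets two turf players and two grass players.
mixed = ["turf", "turf", "grass", "grass"]
balanced = mean_speed_gap(mixed, mixed)

print(confounded, balanced)
```

In the extreme allocation, group A looks roughly 0.7 m/s faster even though the cleats do nothing, so the whole gap comes from the field type. In the balanced allocation, the gap disappears.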
How does blocking work in experimental design?
So how does blocking work in experimental design? Here are the main steps you need to take in order to implement blocking in your experimental design.
1. Choose your blocking factor(s)
The first step of implementing blocking is deciding what variables you need to balance across your treatment groups. We will call these blocking factors. Here are some examples of what your blocking factor might look like.
- Nuisance variable(s). It is most common for your blocking factors to be nuisance variables that affect your outcome. It is important to ensure that these variables are balanced across your treatment groups so that you can feel assured that the changes you see in your outcome across treatment groups are a result of your treatments and not differences in a nuisance variable.
- A baseline measurement of the outcome. In some scenarios, you might also want to block on your outcome variable as measured before any treatment is applied, since the post-treatment outcome is not yet known when observations are assigned to groups. For example, if there is a large skew in this baseline measurement and 10% of observations have much higher values than the rest of the observations, then it might make sense to ensure that these outlying observations with high values are equally distributed across groups.
2. Allocate your observations into blocks
The next thing you need to do after you determine your blocking factors is allocate your observations into blocks. To simplify things, we will assume that you have one main blocking factor that you want to balance over.
- One block for each level of a variable. If your main blocking factor is a categorical variable that only has a few levels, then one common choice is to have one block per level of that variable. For example, in the previous example where the main blocking factor was a categorical variable with two levels that represented different types of soccer fields, a common choice would be to have two blocks. One block would contain soccer players that ran on turf and the other would contain soccer players that ran on grass.
- A few blocks based on standard cutoffs. But what if your main blocking factor is a continuous variable? If your blocking factor is a continuous variable and there are any standard cutoffs that are used to group observations into levels for other purposes then you should feel free to use those cutoffs to create blocks. For example, if your main blocking factor was blood pressure then you could use standard cutoffs for classifying low, average, and high blood pressure to classify your observations into three blocks.
- A few blocks based on quantiles. But what if your blocking factor is continuous and there are no obvious cutoffs to use? Then you can also create blocks based on quantiles of your blocking factor. For example, you can create one block with the observations whose values for your blocking factor are above the median (the top 50%) and another with the observations whose values are at or below the median (the bottom 50%).
- Many small blocks that contain one observation per treatment group. A fourth option is to create many small blocks that contain one observation per treatment group. This is a somewhat non-traditional setup, but it might be useful if you have a continuous blocking factor that has a highly skewed distribution and it has some values that are much higher or lower than the average value. One way to handle this is to sort your observations by the blocking factor then go down the list and assign small blocks with one observation per treatment group. For example, if you had two treatment groups then you would assign the observations with the two highest values for the blocking factor to one block, the observations with the third and fourth highest values to another, and so on. This will ensure that the distribution of your blocking factor is balanced across treatment groups.
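The last two options can be sketched in a few lines of Python. This is only an illustration: the function names `median_split_blocks` and `matched_blocks` and the `factor_values` data are made up, and ties or leftover observations would need more care in a real design.

```python
from statistics import median

def median_split_blocks(values):
    """Quantile blocking with two blocks: label each observation by whether
    its blocking-factor value is above the median ("high") or not ("low")."""
    cutoff = median(values)
    return ["high" if v > cutoff else "low" for v in values]

def matched_blocks(values, n_treatments):
    """Many small blocks: sort observations by the blocking factor (highest
    first) and cut the sorted list into blocks of `n_treatments` observations.

    Returns a list of blocks, each a list of observation indices.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return [order[i:i + n_treatments] for i in range(0, len(order), n_treatments)]

# Hypothetical, highly skewed blocking factor for eight observations.
factor_values = [12.1, 30.5, 11.8, 12.4, 95.0, 13.0, 12.9, 88.2]

print(median_split_blocks(factor_values))
# ['low', 'high', 'low', 'low', 'high', 'high', 'low', 'high']

print(matched_blocks(factor_values, n_treatments=2))
# [[4, 7], [1, 5], [6, 3], [0, 2]]
```

Note how the small-block version puts the two extreme values (95.0 and 88.2, at indices 4 and 7) in the same block, so that they end up in different treatment groups once each block is split across treatments.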
3. Allocate your observations into treatments
The final step in the blocking process is allocating your observations into different treatment groups. In most blocking designs, this is relatively straightforward. All you have to do is go through your blocks one by one and randomly assign observations from each block to treatment groups in a way such that each treatment group gets a similar number of observations from each block.
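As a rough sketch of this step, the snippet below shuffles the observations within each block and then deals them out to the treatments in turn. The function name `assign_within_blocks`, the player ids, and the treatment labels are all hypothetical. The random starting offset is one possible way to avoid always giving leftover observations in uneven blocks to the same treatment.

```python
import random
from collections import defaultdict

def assign_within_blocks(block_of, treatments, seed=None):
    """Randomly assign units to treatments separately within each block.

    block_of:   dict mapping unit id -> block label
    treatments: list of treatment names
    Returns a dict mapping unit id -> treatment name.
    """
    rng = random.Random(seed)

    # Group the unit ids by block.
    members = defaultdict(list)
    for unit, block in block_of.items():
        members[block].append(unit)

    assignment = {}
    for units in members.values():
        rng.shuffle(units)                       # random order within the block
        offset = rng.randrange(len(treatments))  # random starting treatment
        for i, unit in enumerate(units):
            assignment[unit] = treatments[(i + offset) % len(treatments)]
    return assignment

# Soccer example: four players on turf, four on grass, two brands of cleats.
players = {"p1": "turf", "p2": "turf", "p3": "turf", "p4": "turf",
           "p5": "grass", "p6": "grass", "p7": "grass", "p8": "grass"}
assignment = assign_within_blocks(players, ["cleat_A", "cleat_B"], seed=42)

for unit in sorted(assignment):
    print(unit, players[unit], assignment[unit])
```

Because each block here has an even number of players, every cleat brand ends up with exactly two turf players and two grass players, whichever seed is used.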