A gentle introduction to propensity score matching

Are you looking for a gentle introduction to propensity score matching that is suitable for beginners without much experience with causal inference? Well then you are in the right place! In this article we discuss everything you need to know to understand the basic idea behind what propensity score matching is and when you should use it.

We start this article off by discussing what kind of scenarios propensity score matching is used in and what its main goal is. After that, we go into more detail on how to calculate propensity scores and how to use them to match observations. Finally, we provide a simple example of propensity score matching for those who learn best by example.

In this article we focus on using propensity score matching in the simple case where you have data with one control group and one treatment group. This is the simplest case to understand, so it is the best case to stick to for a beginner-level introduction.

When is propensity score matching used?

Propensity score matching is used when you want to examine the effect that a given treatment has on an outcome but you cannot run a randomized experiment. If you cannot randomly assign your observational units to different treatment groups, all you will have to go on is observational data, collected without any intervention or intentional treatment assignment.

Here are some examples of scenarios where you might not be able to run a randomized experiment and therefore might need to fall back on propensity score matching.

  • A treatment is harmful. One of the most common scenarios where you might need to use propensity score matching is if one of the treatments you want to study is harmful and you cannot ethically assign that treatment in an experiment. For example, say you work at a hospital and you want to study the effect that smoking has on lung cancer. You know that smoking is harmful to a person’s health, so you cannot ethically force a randomized group of people to smoke for the sake of your experiment. In this case, you will have to rely on existing data from people who have made their own decision on whether or not to smoke.
  • A treatment is prohibitively costly. Similarly, you might need to use propensity score matching if your treatment is prohibitively costly. For example, if you wanted to examine the effect that increasing your wealth by 10 million dollars has on happiness, you would likely need to use observational data from people who have won the lottery or inherited large sums of money. It would simply be too expensive to give every person in your treatment group 10 million dollars.
  • A treatment is inherently self-selected. There are also cases where the treatment you want to study is inherently self-selected and you cannot simply assign an observational unit to a treatment group for the sake of an experiment. For example, imagine you worked at a bank and you wanted to study the effect that accepting a large business loan had on a business’s growth. You do not have any way to force a business to accept your loan, so you cannot randomly assign a treatment group here.

What is the goal of propensity score matching?

So what is the goal of propensity score matching? And how does propensity score matching help you analyze the effect that a treatment has on a given outcome using observational data? In short, propensity score matching helps you to select samples of observations from your control and treatment groups that are highly comparable to use in your analysis. Specifically, propensity score matching helps you to select samples of observations that are well balanced across confounding variables that affect both treatment assignment and your outcome variable.

Let’s dive into a quick example to demonstrate why we must make sure that our samples are comparable. Pretend you wanted to study the relationship between smoking and lung cancer using only observational data from people who have made their own decision on whether they want to smoke or not. This might seem like a simple problem at first – just compare the rate of lung cancer in the sample of smokers and the sample of non-smokers.

Now imagine that older people are both more likely to smoke and more likely to develop lung cancer. That makes age a confounding variable. If you simply compared the lung cancer rate in the whole non-smoker group to the rate in the whole smoker group, it would be difficult to determine whether any elevated cancer rate in the smoker group was a result of the smoking or of the higher average age in that group.

If you wanted to isolate the impact that smoking has on lung cancer rates, you would need to compare samples of non-smokers and smokers that have similar distributions for age and for any other variables that might affect both lung cancer rates and smoking. This is where propensity score matching comes into play.

A diagram of the relationship between confounding variables and the treatment and outcome variables.
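
To make the confounding problem concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration: the age range, the probabilities, and above all the deliberately artificial setup in which smoking has no effect on cancer at all. The point is only to show that a naive comparison can report a gap that is produced entirely by age.

```python
import random

random.seed(0)

def simulate_person():
    """Return (age, smokes, has_cancer) for one hypothetical person."""
    age = random.uniform(20, 80)
    u = (age - 20) / 60  # age rescaled to the range 0..1
    # Assumption: older people are more likely to smoke.
    smokes = random.random() < 0.1 + 0.6 * u
    # Assumption: cancer risk depends on age ONLY. Smoking has no effect
    # in this toy world, so any gap we measure is pure confounding.
    has_cancer = random.random() < 0.01 + 0.2 * u
    return age, smokes, has_cancer

def mean(values):
    values = list(values)
    return sum(values) / len(values)

people = [simulate_person() for _ in range(20000)]
smokers = [p for p in people if p[1]]
non_smokers = [p for p in people if not p[1]]

age_gap = mean(p[0] for p in smokers) - mean(p[0] for p in non_smokers)
naive_gap = mean(p[2] for p in smokers) - mean(p[2] for p in non_smokers)

print(f"Mean age gap (smokers - non-smokers): {age_gap:.1f} years")
print(f"Naive cancer-rate gap (smokers - non-smokers): {naive_gap:.3f}")
```

Even though smoking does nothing in this simulated world, the smoker group is older on average and therefore shows a higher cancer rate. A naive comparison would wrongly attribute that gap to smoking.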

What is a propensity score?

So we know that propensity score matching is a method that helps us select comparable samples of observations across multiple treatment groups, but we haven’t defined exactly what a propensity score is. Now we will take a step back and do that. A propensity score is the probability that an individual observation is assigned to a given treatment, given that observation’s characteristics. Returning to the previous example with smoking and lung cancer, the propensity score for each person would be the probability that a person with those characteristics is in the smoker group.

Why is it useful to know the probability that a person is in the treatment group? Let’s gain some intuition by thinking about the simplest case, where there is one main confounding variable that affects the likelihood that a person will be in a given treatment group. If there is one variable that affects the likelihood of treatment, then that variable will have a large impact on an observation’s propensity score. That means that if you choose samples of observations with similar propensity scores from your different treatment groups, those samples are also likely to have similar distributions across that variable.

Returning to the previous example with smoking and lung cancer, let’s say that age was the main factor that affected whether a person was in the smoker group or not. That means that age would have a major impact on the propensity scores and people who were of a similar age would likely have similar propensity scores. If we selected samples of people from the smoker and non-smoker group that had similar distributions of propensity scores, those samples would also be likely to have similar distributions over age.
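
The intuition above can be sketched in a few lines of code. The function names, the logistic shape, and the intercept and slope values are all hypothetical choices for illustration. The sketch assumes that age is the only thing that drives the propensity score, in which case each score corresponds to exactly one age.

```python
import math

def propensity_from_age(age, intercept=-4.0, slope=0.08):
    """Hypothetical true propensity: a logistic curve that depends only on age."""
    return 1 / (1 + math.exp(-(intercept + slope * age)))

def age_from_propensity(score, intercept=-4.0, slope=0.08):
    """Invert the curve: recover the age that produces a given score."""
    return (math.log(score / (1 - score)) - intercept) / slope

ages = [25, 40, 55, 70]
scores = [propensity_from_age(a) for a in ages]
for a, s in zip(ages, scores):
    print(f"age {a}: propensity {s:.2f}")

# Two people whose scores are close must also be close in age.
gap_in_years = age_from_propensity(0.52) - age_from_propensity(0.50)
print(f"scores 0.50 vs 0.52 -> ages differ by {gap_in_years:.1f} years")
```

With several confounding variables the picture is less tidy, because two people can reach a similar score through different combinations of characteristics. That is why matching on the score aims for similar distributions across the samples rather than identical individuals.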

How do you calculate propensity scores?

So how do you calculate the propensity score for a given observation? Let’s start out by making a distinction between the true propensity score, or the true underlying probability that an observation would be in a given treatment group, and an estimated propensity score.

In the real world, we will never know the true propensity score because we will never have all of the data on every possible factor that might affect the probability of treatment. However, we can get a good estimate of the true propensity score by selecting the key confounding variables that we think are most likely to impact the propensity score and basing our analysis on those variables.

There are many ways that you can estimate a propensity score, but perhaps the most common way is to build a simple binary classifier, such as a logistic regression, to predict whether an observation is in the treatment group or the control group. These predictions should be made using all of the key covariates you believe might affect both the treatment group assignment and the outcome. You can then use the probability of being in the treatment group that the classifier outputs as an estimate of the propensity score.
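
Here is a minimal sketch of that idea using only the Python standard library. The helpers `sigmoid`, `predict`, and `fit_logistic` are hypothetical names written for this illustration, the data is simulated, and age is the only covariate. In practice you would more likely reach for an existing implementation, such as scikit-learn's `LogisticRegression` and its `predict_proba` method, rather than hand-rolled gradient descent.

```python
import math
import random
import statistics

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(weights, x):
    """Predicted probability of treatment for one row of covariates."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, steps=500, learning_rate=0.5):
    """Fit a logistic regression with plain gradient descent (illustrative only)."""
    weights = [0.0] * (len(rows[0]) + 1)  # intercept plus one weight per covariate
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = predict(weights, x) - y
            grad[0] += error
            for j, v in enumerate(x, start=1):
                grad[j] += error * v
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Simulated data (an assumption for this sketch): older people smoke more often.
random.seed(1)
ages = [random.uniform(20, 80) for _ in range(400)]
mu, sd = statistics.mean(ages), statistics.pstdev(ages)
rows = [[(a - mu) / sd] for a in ages]  # standardized age as the only covariate
is_smoker = [1 if random.random() < sigmoid(1.5 * x[0]) else 0 for x in rows]

weights = fit_logistic(rows, is_smoker)
scores = [predict(weights, x) for x in rows]  # estimated propensity scores

print(f"intercept={weights[0]:.2f}, age coefficient={weights[1]:.2f}")
print(f"first five propensity scores: {[round(s, 2) for s in scores[:5]]}")
```

Each entry in `scores` is the model's estimated probability that the corresponding person is in the smoker group, which is exactly what the matching step needs as input.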

How do you match propensity scores?

How do you use propensity scores after you calculate them? As we stated before, the main goal of propensity score matching is to create samples of your different treatment groups that have similar distributions across the key variables that affect treatment assignment. And the way we do this is by making sure that the distribution of the propensity scores is similar across the different samples.

There are many ways that you can create these matched samples with similar distributions of propensity scores, but for now we will discuss what we think is the simplest method. This involves matching each observation from the smaller of your treatment groups 1:1 with the available observation in the larger treatment group that has the most similar propensity score. Once you have finished the matching, you can select only the observations that have been paired up to be included in the sample of data you use for your analysis.

The exact details of how this is implemented might look different depending on whether your control group or treatment group is larger, but let’s say you had a treatment group with 10 people and a control group with 50 people. One simple way to implement propensity score matching would be to go through the observations in the treatment group one by one and pair each of them up with the observation in the control group that has the most similar propensity score. Make sure that each observation in the treatment group gets assigned to a unique observation from the control group that has not yet been paired up with an observation in the treatment group.

After that, take only the observations from the control group that have been paired up with an observation in the treatment group and use only those observations in your analysis. Discard any observations from the control group that were not paired up with a similar observation in the treatment group. Your result should be a neat subsample of your control group that has similar characteristics to the observations in your treatment group.

A diagram that shows how to implement propensity score matching given that you have propensity scores for each of your observations.
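
The 1:1 matching procedure described above can be sketched as follows. The function name `greedy_match`, the unit ids, and the propensity scores are all hypothetical, and the sketch assumes the first group is no larger than the second.

```python
def greedy_match(small_group, large_group):
    """Pair each unit in the smaller group with the closest unused unit in the larger group.

    Both arguments map a unit id to its estimated propensity score.
    Returns a list of (small_id, large_id) pairs; each large_id is used at most once.
    """
    available = dict(large_group)  # copy, so the caller's dict is left untouched
    pairs = []
    for small_id, score in small_group.items():
        best_id = min(available, key=lambda uid: abs(available[uid] - score))
        pairs.append((small_id, best_id))
        del available[best_id]  # matching without replacement
    return pairs

# Made-up propensity scores for three treated and six control units.
treated = {"t1": 0.30, "t2": 0.62, "t3": 0.65}
control = {"c1": 0.10, "c2": 0.28, "c3": 0.45, "c4": 0.60, "c5": 0.66, "c6": 0.90}

pairs = greedy_match(treated, control)
print(pairs)  # [('t1', 'c2'), ('t2', 'c4'), ('t3', 'c5')]
```

The controls `c1`, `c3`, and `c6` are never paired up, so they would be discarded. Note that this greedy approach depends on the order in which the smaller group is processed, because an early observation can claim a control that would have been an even better match for a later one.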

Propensity score matching example

Now we will walk through a simple example of how to implement propensity score matching. We will stick with the lung cancer example we have talked about throughout the article. As a refresher, the main goal of this analysis is to assess the effect that smoking has on lung cancer without running a randomized experiment where people are randomly assigned to the smoker and non-smoker groups.

In this example, we will consider smokers the treatment group and non-smokers the control group. Let’s imagine that we have 200 smokers and 50 non-smokers in our study.

  1. Identify confounding variables that affect both treatment assignment and the outcome. The first step of this analysis would be to determine what variables affect both the probability of smoking and the probability of getting lung cancer. For this analysis, let’s say we chose gender, age, and income level as the main confounding variables.
  2. Estimate the propensity scores. Next we need to estimate the propensity score, or the probability of being in the smoker group, for all of the observations in our sample. We can do this by training a logistic regression model on the confounding variables we identified in the previous step to predict whether a person is in the smoker group or the non-smoker group. We can use the probability of being a smoker that the model outputs as an estimate of the propensity score.
  3. Create matched samples using the propensity scores. After we calculate the propensity scores, we need to create matched samples based on them. We will first look to see which of the two groups is smaller, which here is the non-smoker group. After that, we will run down the list of observations in the non-smoker group and pair each one with the available observation in the smoker group that has the closest propensity score. At the end of this, we should have 50 matched pairs, each with one observation from the non-smoker group and one observation from the smoker group. We will then discard all of the other observations in the smoker group and only use the observations that were part of a matched pair in our analysis.
  4. Run the analysis on only the matched samples. After we have created these matched samples of smokers and non-smokers, we can proceed with our analysis. You can run many of the same kinds of analyses on this data as you would on data from a randomized experiment, keeping in mind that matching only balances the confounding variables you included in the propensity score model.
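
As a rough sketch of the final step, the snippet below takes a handful of matched pairs, checks that the two matched groups look alike on age, and then compares their lung cancer rates. The five pairs and all of their values are made up purely for illustration and are not results of any real study.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Hypothetical matched pairs: (non-smoker record, smoker record).
# Each record is (age, has_lung_cancer). All values are invented for this sketch.
matched_pairs = [
    ((34, 0), (35, 0)),
    ((47, 0), (46, 1)),
    ((52, 0), (53, 0)),
    ((61, 1), (60, 1)),
    ((68, 0), (69, 1)),
]

non_smokers = [pair[0] for pair in matched_pairs]
smokers = [pair[1] for pair in matched_pairs]

# Balance check: after matching, the groups should be similar on the confounder.
mean_age_gap = mean(s[0] for s in smokers) - mean(n[0] for n in non_smokers)

# Outcome comparison on the matched samples only.
cancer_rate_gap = mean(s[1] for s in smokers) - mean(n[1] for n in non_smokers)

print(f"Mean age gap after matching: {mean_age_gap:.1f} years")
print(f"Difference in lung cancer rates: {cancer_rate_gap:.2f}")
```

If the age gap after matching were still large, that would be a sign that the matched samples are not yet comparable and that the propensity score model or the matching step needs another look before interpreting the difference in outcomes.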

