Are you wondering when you should use ordinal logistic regression? Well then you are in the right place! In this article, we tell you everything you need to know to decide whether ordinal logistic regression is the best choice for your next data science project.
We start this article off by discussing what types of outcome variables can be used with ordinal logistic regression. After that, we discuss some of the main advantages and disadvantages you should consider when deciding whether to use ordinal logistic regression. Finally, we discuss specific examples of cases where you should and should not use ordinal logistic regression.
What outcomes can ordinal logistic regression handle?
What types of outcome variables can ordinal logistic regression be used for? Ordinal logistic regression is generally used when you have a categorical outcome variable that has more than two levels. Specifically, it is used when the levels of your outcome variable have a natural ordering.
As an example of a multiclass outcome variable that has a natural order to it, you can think of a survey question that asks you to rate your trust in a given politician as very low, low, medium, high, or very high. In this scenario, there is a natural ordering where low is higher than very low, medium is higher than low, and so on. This is exactly the type of outcome variable you should be using with ordinal logistic regression.
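The survey example above can be sketched in code. This is a minimal illustration, not a required preprocessing step: the names `TRUST_LEVELS`, `TRUST_RANK`, and `encode_trust`, as well as the sample responses, are made up for this example. It shows one way to turn ordered labels into integer ranks so that the ordering is explicit.

```python
# The five trust levels from the survey example, listed from lowest to highest.
TRUST_LEVELS = ["very low", "low", "medium", "high", "very high"]

# Map each label to its rank (0 = lowest). Only the order of the ranks is
# meaningful; the distance between neighboring ranks is not.
TRUST_RANK = {label: rank for rank, label in enumerate(TRUST_LEVELS)}


def encode_trust(responses):
    """Convert survey labels to integer ranks, rejecting unknown labels."""
    try:
        return [TRUST_RANK[r.strip().lower()] for r in responses]
    except KeyError as exc:
        raise ValueError(f"unknown trust level: {exc.args[0]!r}") from None


# Hypothetical survey answers, used only for illustration.
responses = ["High", "very low", "medium", "low"]
codes = encode_trust(responses)
print(codes)  # [3, 0, 2, 1]
```

The ranks record that high sits above medium, but they do not claim that the gap between high and medium equals the gap between low and very low. That distinction is what separates an ordinal outcome from a numeric one.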
Advantages and disadvantages of ordinal logistic regression
So what are the main advantages and disadvantages of ordinal logistic regression? Here are some of the main advantages and disadvantages you should keep in mind when deciding whether to use ordinal logistic regression.
Advantages of ordinal logistic regression
- Handles ordered outcomes. Ordinal logistic regression is one of the few common machine learning models that were specifically developed to handle multiclass outcomes that have a natural order to them. That means it can make use of the ordering information that general-purpose multiclass models ignore.
- Fewer parameters than other multiclass regression models. The ordinal logistic regression model is a simple model that has fewer parameters to estimate than other regression models that can handle multiclass data, because it uses a single set of coefficients for all categories rather than a separate set for each one. When two models have relatively similar performance, it is almost always better to go with the simpler model.
- Interpretable coefficients. As with many other regression models, ordinal logistic regression models provide highly interpretable coefficients that explain the relationship between your features and your outcome variable. These coefficients often come along with confidence intervals and statistical tests for even better interpretability.
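To make the point about parameter counts concrete, here is a small sketch. It assumes the standard proportional odds form of ordinal logistic regression (one coefficient per feature plus one threshold between each pair of adjacent categories) and a standard multinomial logistic regression with one reference category. The function names are made up for this example.

```python
def ordinal_param_count(n_categories, n_features):
    """Proportional odds model: one shared slope per feature,
    plus one threshold between each pair of adjacent categories."""
    return n_features + (n_categories - 1)


def multinomial_param_count(n_categories, n_features):
    """Multinomial logistic regression: a separate intercept and set of
    slopes for every category except the reference category."""
    return (n_categories - 1) * (n_features + 1)


# Five outcome categories (as in the trust survey) and ten features.
print(ordinal_param_count(5, 10))      # 14
print(multinomial_param_count(5, 10))  # 44
```

The gap widens as you add categories or features, which is why the ordinal model is the more economical choice when its assumptions are reasonable.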
Disadvantages of ordinal logistic regression
- Proportional odds assumption. One of the main disadvantages of ordinal logistic regression is that it makes a fairly strong assumption that is not necessarily valid in all cases. This assumption, called the proportional odds assumption, essentially implies that the effect of each feature on the odds of falling above a cutoff, rather than at or below it, is the same for every cutoff between categories of the outcome variable. There are many examples of situations where this is not true, so you should consider the domain of the problem and assess your data to determine whether this assumption holds.
- Not available in all common libraries. Another downside of ordinal logistic regression is that it is a relatively niche model that is not available in all common machine learning libraries. Ordinal logistic regression, and regression models in general, tend to be more commonly used in fields where inference and classical statistics are king. That means that ordinal logistic regression models are more likely to be implemented in languages and programs that favor classical statistics such as SAS and Stata.
- General regression downsides. Ordinal logistic regression is subject to many of the same pitfalls as other regression models like linear regression and logistic regression. This means that ordinal logistic regression models are also easily thrown off by things like outliers, correlated features, unspecified interactions, and missing data.
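The proportional odds assumption described in the list above is easier to see with numbers. The sketch below uses one common parameterization, in which the probability of falling at or below category j is a logistic function of a threshold minus the linear predictor. The coefficient and threshold values are invented for illustration and do not come from any fitted model.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_predictor(x, beta):
    return sum(xi * bi for xi, bi in zip(x, beta))


def cumulative_log_odds(x, beta, thresholds):
    """Log odds of P(Y <= j | x) for each threshold j."""
    eta = linear_predictor(x, beta)
    return [t - eta for t in thresholds]


def category_probabilities(x, beta, thresholds):
    """P(Y = k | x) for every category, from differences of cumulative
    probabilities. Thresholds must be in increasing order."""
    cumulative = [sigmoid(lo) for lo in cumulative_log_odds(x, beta, thresholds)]
    cumulative.append(1.0)  # P(Y <= highest category) is always 1
    probs = [cumulative[0]]
    probs += [cumulative[j] - cumulative[j - 1] for j in range(1, len(cumulative))]
    return probs


# Hypothetical values: one feature, five ordered categories (four thresholds).
beta = [0.8]
thresholds = [-1.0, 0.0, 1.5, 3.0]

probs = category_probabilities([1.0], beta, thresholds)

# Raising the feature from 0 to 1 lowers every cumulative log odds by the
# same amount (beta[0]), whichever threshold you look at.
shifts = [
    a - b
    for a, b in zip(
        cumulative_log_odds([0.0], beta, thresholds),
        cumulative_log_odds([1.0], beta, thresholds),
    )
]
print(shifts)
```

Every entry of `shifts` equals the single coefficient, up to floating point rounding. If your data suggest that a feature matters much more at one cutoff than at another, that is a sign the assumption may not hold.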
When to use ordinal logistic regression
So when should you use ordinal logistic regression over other machine learning models? Here are some examples of situations where you should reach for ordinal logistic regression.
- You have an ordinal outcome and inference is your primary goal. In general, you should reach for regression models that have highly interpretable coefficients when inference is your primary goal. For a multiclass outcome, that means choosing a model like ordinal logistic regression or multinomial regression.
- Proportional odds assumption holds. You should specifically use ordinal logistic regression over a similar model like multinomial regression when your data has a natural ordering to it and you believe the proportional odds assumption holds. In these scenarios, the ordinal logistic regression model is the simpler model with fewer parameters that need to be estimated.
When not to use ordinal logistic regression
When should you avoid using ordinal logistic regression models? Here are some examples of situations where you should not use ordinal logistic regression.
- Multi-level outcome that is not naturally ordered. If you have a multi-level outcome that does not have a natural order to it, you should not use an ordinal logistic regression model. Instead, you should use a more flexible model like a multinomial regression model that does not make any assumptions about the order of different levels of your outcome variable.
- Proportional odds assumption does not hold. Even if you have a multiclass outcome variable that has a natural ordering to it, you should avoid using an ordinal logistic regression model if you do not believe that the proportional odds assumption holds. Instead, you can use a more flexible model like a multinomial regression model. This type of model will not take into account the ordered nature of your outcome variable, but that is okay in some situations. If you are unsure whether you should use an ordinal logistic regression model or a multinomial regression model, you can always try running both and comparing model fit.
- Outcome variable has many categories. If you have an ordered outcome variable that has many categories, you are sometimes better off treating that outcome variable as a numeric variable rather than as an ordered multiclass variable.
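One simple way to compare the fit of an ordinal model and a multinomial model, as suggested in the list above, is the Akaike information criterion (AIC), which rewards a higher log-likelihood and penalizes extra parameters. The sketch below is only an illustration: the log-likelihoods are placeholder numbers, not results from real data, and the parameter counts assume five categories and ten features.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values indicate a better
    trade-off between fit and model complexity."""
    return 2 * n_params - 2 * log_likelihood


# Placeholder values standing in for the output of two fitted models.
ordinal_fit = {"log_likelihood": -512.3, "n_params": 14}
multinomial_fit = {"log_likelihood": -498.7, "n_params": 44}

ordinal_aic = aic(**ordinal_fit)
multinomial_aic = aic(**multinomial_fit)

preferred = "ordinal" if ordinal_aic < multinomial_aic else "multinomial"
print(preferred)  # ordinal
```

In this made-up case the multinomial model fits slightly better, but not by enough to justify its 30 additional parameters. AIC is only one of several reasonable criteria, so treat it as a starting point rather than a final verdict.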
Related articles
- When to use multinomial regression
- When to use linear regression
- When to use logistic regression
- When to use Poisson regression
- When to use Bayesian regression
- When to use ridge regression
- When to use LASSO
- When to use mixed models
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.