Are you wondering when you should use ridge regression rather than LASSO? Or maybe you are wondering whether you should use a ridge regression model over a standard regression model. Well, either way, you are in luck!
In this article, we tell you everything you need to know to understand when ridge regression should be used. We start out by discussing what kind of outcome variables ridge regression models can be used for. We follow that up with a discussion of some of the main advantages and disadvantages of ridge regression. At the end, we provide specific examples of scenarios where you should and should not use ridge regression.
What outcomes can ridge regression handle?
One important distinction to keep in mind when discussing ridge regression models is that the term “ridge regression” does not necessarily refer to one unique model. Instead, it refers to the family of models that arises when you introduce an L2 penalty into a family of regression models. That means that there are different ridge regression models out there that can handle many different types of outcomes.
Can you use ridge regression with a continuous outcome?
Can you use ridge regression with a continuous outcome? Yes! If you introduce an L2 penalty to a standard linear regression model, then you will have a ridge regression model that can be used with a continuous outcome. The continuous-outcome ridge regression model is perhaps the most common type of ridge regression model around. If someone uses the term “ridge regression” to refer to one specific model, they are almost certainly referring to this model.
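Here is a minimal sketch of what this looks like in practice, using scikit-learn's Ridge estimator on some made-up continuous data (the data, coefficients, and alpha value below are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Illustrative data: 100 observations, 5 features, continuous outcome
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 3.0]) + rng.normal(scale=0.5, size=100)

# alpha controls the strength of the L2 penalty
# (alpha=0 would reduce to ordinary least squares)
model = Ridge(alpha=1.0)
model.fit(X, y)
print(model.coef_)  # shrunken coefficient estimates
```

Because of the L2 penalty, the fitted coefficients will be pulled slightly toward zero relative to what an ordinary least squares fit would produce on the same data.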
Can you use ridge regression with a binary outcome?
Can you use ridge regression with a binary outcome? Yes, you can also create a ridge regression model with a binary outcome. In this case, you would take a standard logistic regression model and incorporate an L2 penalty into the model. This type of model is sometimes referred to as “logistic ridge regression”.
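As a sketch, scikit-learn's LogisticRegression applies an L2 penalty by default; the data below is made up for illustration. Note that scikit-learn parameterizes the penalty through C, the inverse of the penalty strength, so a smaller C means stronger shrinkage:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative binary outcome driven by the first two features
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# penalty="l2" with C = 1 / (penalty strength)
model = LogisticRegression(penalty="l2", C=1.0)
model.fit(X, y)
print(model.coef_)  # shrunken log-odds coefficients
```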
Advantages and disadvantages of ridge regression
What are some of the main advantages and disadvantages of ridge regression? Here are the most important ones to keep in mind.
Advantages of ridge regression
- Handles correlated features. The main advantage of ridge regression is that ridge regression models can be used on datasets that have many correlated features. Usually correlated features are a big problem for regression models, but when you introduce the L2 penalty into a regression model, the negative impact of correlated features is minimized.
- More features than observations. Another advantage of ridge regression is that it can be used in cases where you have more features than observations. This setup generally causes problems for standard regression models, but it is not as much of a problem for ridge regression models.
- Reduced overfitting. Even if you have fewer features than observations, the L2 penalty that is introduced for ridge regression models will still work to reduce overfitting. This is because the penalty shrinks some coefficients close to zero, which effectively reduces the complexity of the model.
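The first advantage above is easy to see in a small sketch. Below, two nearly identical (highly correlated) features are fed to both ordinary least squares and ridge regression; the data is synthetic and the alpha value is arbitrary:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Two nearly identical columns: a classic collinearity problem
rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([x, x + rng.normal(scale=0.01, size=100)])
y = 2 * x + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS coefficients on collinear features are unstable and can blow up
# in opposite directions; ridge keeps them small and splits the effect
print("OLS:  ", ols.coef_)
print("Ridge:", ridge.coef_)
```

The ridge coefficients end up with a much smaller overall magnitude, which is exactly the stabilizing effect the L2 penalty is designed to provide.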
Disadvantages of ridge regression
- Biased coefficients. The main disadvantage of ridge regression is that the coefficient estimates that are produced by ridge regression models are biased. The L2 penalty that is added to a ridge regression model has the effect of shrinking the regression coefficients closer to zero. That means that the coefficients that the model outputs do not actually represent the magnitude of the relationship between a feature and the outcome variable, but rather a shrunken version of that magnitude.
- Hard to get accurate standard errors. In addition to the fact that ridge regression coefficients are biased, it is also difficult to estimate standard errors for ridge regression coefficients. This makes it difficult to construct confidence intervals and perform statistical tests on the coefficients.
- An additional hyperparameter to tune. Another disadvantage of ridge regression is that it introduces a hyperparameter that needs to be tuned. This hyperparameter controls the magnitude of the L2 penalty that is used in the model. Since most standard regression models do not have hyperparameters that need to be tuned, some consider the introduction of this hyperparameter to be a small disadvantage.
- Other issues that plague standard regression models. For the most part, the types of issues that plague standard regression models also impact ridge regression models. Concerns related to interactions, outliers, and model assumptions still apply here.
When to use ridge regression
So when should you use ridge regression models? Here are some examples of use cases where you should consider using a ridge regression model.
- Many correlated features. The main case that ridge regression models are useful for is the case where you have many correlated features and you want to include all of them in your model. Some models handle this scenario better than others, but there are not many models that are uniquely suited to this scenario in the way ridge regression models are.
- More features than observations. Another possible use case where you may want to use ridge regression is if you want to run a regression model but you have many more features than you have observations. If you are okay with using a model other than a regression model, you have some other options. If you are dedicated to using a regression model, ridge regression may be the way to go.
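The second use case above, more features than observations, is a situation where ordinary least squares has no unique solution but ridge regression still fits cleanly. A minimal sketch with made-up data:

```python
import numpy as np
from sklearn.linear_model import Ridge

# 20 observations but 50 features: more features than rows
rng = np.random.default_rng(3)
X = rng.normal(size=(20, 50))
beta = np.zeros(50)
beta[:5] = [2.0, -1.0, 1.5, 0.5, -2.0]  # only 5 features truly matter
y = X @ beta + rng.normal(scale=0.1, size=20)

# The L2 penalty makes the problem well-posed even when n < p
model = Ridge(alpha=1.0).fit(X, y)
print(model.coef_.shape)  # one coefficient per feature
```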
When not to use ridge regression
When should you avoid using ridge regression? Here are some examples of situations where you should avoid using ridge regression.
- You are primarily interested in inference. Since ridge regression coefficients are biased and it is difficult to estimate standard errors for them, it is often best to avoid using ridge regression when inference is your primary goal. This is especially true if you want to use statistical tests to determine whether the relationships between your features and the outcome variable are statistically significant. If you are primarily interested in inference but you have a lot of correlated features, you might be best off applying dimensionality reduction or feature selection techniques to your features and then using a standard regression model on the reduced feature set.
Related articles
- When to use LASSO
- When to use random forests
- When to use ordinal logistic regression
- When to use multinomial regression
- When to use logistic regression
- When to use support vector machines
- When to use gradient boosted trees
- When to use linear regression
- When to use Bayesian regression
- When to use neural networks
- When to use mixed models
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.