Are you wondering when you should use gradient boosted trees over other machine learning algorithms? Well then you are in the right place! In this article we tell you everything you need to know to understand when to use gradient boosted trees for a machine learning project.
We start out by talking about what kinds of outcomes can be predicted with gradient boosted trees. After that, we go over some of the main advantages and disadvantages of gradient boosted trees. This provides some context to the final portion of the article where we discuss situations where you should and should not use gradient boosted trees.
Types of outcomes for gradient boosted trees
What types of outcome variables can you use gradient boosted trees to predict? The main types of outcome variables that are supported by gradient boosted trees are binary outcomes and numeric outcomes. Both of these outcome types can be predicted with a single model.
Some implementations of gradient boosted trees also support multiclass outcomes, but it is important to note that these implementations are often built by combining the output of multiple binary classification models.
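To make this concrete, here is a minimal sketch of fitting a gradient boosted tree model to a binary outcome and to a numeric outcome. It assumes you have scikit-learn installed, and the toy arrays are placeholders for your own data.

```python
# Minimal sketch: gradient boosted trees for a binary outcome and a numeric outcome.
# The toy data below is a stand-in for your own dataset.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # 200 rows, 5 numeric features
y_binary = (X[:, 0] + X[:, 1] > 0).astype(int)   # binary outcome (0/1)
y_numeric = X[:, 0] * 2.0 + X[:, 2]              # numeric outcome

# Binary outcome -> use a classifier
clf = GradientBoostingClassifier().fit(X, y_binary)
print(clf.predict_proba(X[:3]))

# Numeric outcome -> use a regressor
reg = GradientBoostingRegressor().fit(X, y_numeric)
print(reg.predict(X[:3]))
```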
Advantages & disadvantages of gradient boosted trees
What are the advantages and disadvantages of gradient boosted trees? Here are some of the main advantages and disadvantages you should keep in mind when deciding whether to use gradient boosted trees.
Advantages of gradient boosted trees
- Excellent performance on tabular data. The main advantage of gradient boosted trees is that they tend to make more accurate predictions on tabular data than other common machine learning models. If you are working on a problem where predictive performance is important, you should seriously consider using gradient boosted trees.
- Handle interactions. Another advantage of gradient boosted trees is that they can detect interactions between features even if those interactions are not explicitly specified. This is useful if you have a large number of features or if you do not fully understand the relationships between some of your features. This is a common feature of most models that are based on decision trees.
- Handle missing data. Another advantage of gradient boosted trees is that they can handle missing data natively. That means that for many implementations of gradient boosted trees, you do not need to preprocess your data to remove or fill in missing values (see the short sketch after this list). This is another advantage that is common to many tree-based models.
- Handle non-linearity. As with other tree-based models, gradient boosted trees work well in situations where the relationships between your outcome variable and your features are not perfectly linear.
- Handle outliers. Like other models that use decision trees, gradient boosted trees are not heavily affected by outliers. That means that you do not have to spend as much time preprocessing your data to remove or replace outlying data points.
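As an illustration of the missing data point above, here is a minimal sketch using scikit-learn's histogram-based gradient boosting estimator, which accepts NaN values directly in recent versions of the library. The toy data is an assumption standing in for your own.

```python
# Minimal sketch: native missing-value handling with a histogram-based
# gradient boosted tree model. HistGradientBoostingClassifier accepts NaN
# values directly, so no imputation step is needed here.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

# Introduce some missing values; the model can still be trained on the data as-is.
mask = rng.random(X.shape) < 0.1
X[mask] = np.nan

model = HistGradientBoostingClassifier().fit(X, y)
print(model.predict(X[:5]))
```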
Disadvantages of gradient boosted trees
- Not natively multiclass. One disadvantage of gradient boosted trees is that they do not handle multiclass outcomes natively. Many implementations of gradient boosted tree algorithms tackle multiclass problems by training multiple classifiers then aggregating the results of those classifiers. There is usually at least one model trained for each class in your outcome variable, which can increase training times.
- No interpretable coefficients. Another disadvantage of gradient boosted trees is that they do not provide easily interpretable coefficients out of the box. There are a few different methods that can be used to help determine which features contribute the most to these models, but many of these methods involve additional steps that need to be taken after the model is trained. They also do not provide precise information about the magnitude of the relationship between the features and the outcome variable.
- Not as easily parallelizable. Unlike random forest models where multiple decision trees can be built in parallel, the trees in gradient boosted tree models are generally built sequentially. This is because each tree is built to make up for the shortcomings of the previous trees. Since the trees are built sequentially, it is not as easy to parallelize these models to speed up training times.
- Somewhat sensitive to hyperparameter choice. Another small disadvantage of gradient boosted trees is that they are somewhat sensitive to the choice of hyperparameters that you use to build your model. Gradient boosted trees are not as sensitive as some models like support vector machines, but they are more sensitive than other tree-based models like random forest models. That means that you have to take some time to tune the hyperparameters of your model (see the tuning sketch after this list).
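Here is a minimal sketch of tuning a few common gradient boosted tree hyperparameters with cross-validated grid search. The grid values are illustrative placeholders, not recommended settings, and the toy data stands in for your own.

```python
# Minimal sketch: tuning a few common gradient boosted tree hyperparameters
# with cross-validated grid search. The grid values are illustrative only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
y = (X[:, 0] + X[:, 3] > 0).astype(int)

param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# 3-fold cross-validation over the small grid above.
search = GridSearchCV(GradientBoostingClassifier(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```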
When should you use gradient boosted trees?
When should you use gradient boosted trees? Here are some examples of scenarios where you should reach for gradient boosted trees.
- Predictive performance is important. Whenever you are working on a data science project where predictive performance is important, you should consider using gradient boosted trees. Gradient boosted trees tend to have better predictive performance on tabular data than most other machine learning algorithms, especially if you take the time to tune model hyperparameters.
When not to use gradient boosted trees
What are some cases where you should avoid using gradient boosted trees? Here are a few examples.
- Multiclass outcome with many classes. One situation where you should avoid using gradient boosted trees is if you have a multiclass outcome variable with many classes. Gradient boosted trees do not handle multiclass outcomes natively, so at least one classifier will be trained for each outcome class. This gets to be computationally inefficient and time consuming as the number of classes grows. In this case, you may be better off using an implementation of a random forest model that handles multiclass outcomes natively with a single model.
- Inference is your primary goal. If you are more interested in inference than prediction, you may be better off using a model that provides a more direct interpretation such as a linear or logistic regression model.
- Quick baseline. If you are just looking to put together a quick baseline model or proof of concept model, then you may be better off using a model like a random forest that is less sensitive to the choice of hyperparameters. You can even reach for a model like a linear or logistic regression that does not have hyperparameters that need to be tuned (a minimal baseline sketch follows this list).
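As a rough illustration of the quick baseline point, here is a minimal sketch that scores two low-effort baselines with mostly default settings before you invest time in tuning a gradient boosted tree model. The toy data is a placeholder for your own.

```python
# Minimal sketch: quick baselines with default settings, scored with
# cross-validation, as a point of comparison before tuning gradient boosted trees.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

# Default random forest and logistic regression as low-effort baselines.
for name, model in [("random forest", RandomForestClassifier()),
                    ("logistic regression", LogisticRegression())]:
    scores = cross_val_score(model, X, y, cv=3)
    print(name, scores.mean())
```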
Related posts
- When to use logistic regression
- When to use random forests
- When to use ridge regression
- When to use LASSO
- When to use support vector machines
- When to use linear regression
- When to use Bayesian regression
- When to use neural networks
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.