Hyperparameter tuning in random forests


Are you wondering which hyperparameters you should tune when training a random forest model? Or maybe you are more interested in learning what values you should try for each of these parameters? Well, either way, you are in the right place!

In this article, we tell you everything you need to know about tuning hyperparameters for random forest models. First, we discuss whether random forest models are highly sensitive to the choice of hyperparameters used. After that, we talk about all of the different hyperparameters that random forest models have. We follow that up with a discussion of which parameters are most important to tune and what ranges of values you should try for each of these hyperparameters.

Are random forests sensitive to hyperparameters?

Are random forest models sensitive to the choice of hyperparameters used to train them? In general, random forest models are less sensitive to the choice of hyperparameters used than many other machine learning models.

Note that just because random forest models are less sensitive to the choice of hyperparameters used, that does not mean that you should not tune the hyperparameters of a random forest model. It is still likely that you will see a modest increase in predictive performance after tuning the hyperparameters of a random forest model.
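The modest-but-real gains from tuning can be checked directly with cross-validation. The sketch below compares an untuned random forest against one with lightly adjusted hyperparameters, using scikit-learn; the dataset is synthetic and the "tuned" values are illustrative assumptions, not recommendations.

```python
# Compare a default random forest against a lightly tuned one via
# cross-validation. Dataset and parameter values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Default hyperparameters
default_rf = RandomForestClassifier(random_state=0)
default_score = cross_val_score(default_rf, X, y, cv=5).mean()

# A lightly tuned alternative (values chosen for illustration)
tuned_rf = RandomForestClassifier(
    n_estimators=300, max_features="sqrt", min_samples_leaf=2, random_state=0
)
tuned_score = cross_val_score(tuned_rf, X, y, cv=5).mean()

print(f"default: {default_score:.3f}, tuned: {tuned_score:.3f}")
```

On many datasets the two scores will be close, which is exactly the point: tuning a random forest usually buys a modest improvement rather than a dramatic one.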

What hyperparameters do random forests have?

What hyperparameters do random forest models have? In this section, we will discuss the most common hyperparameters that appear across different implementations of random forest models.

  • Number of trees. The number of trees parameter controls the number of individual decision trees that are used in a given random forest model. The main benefit of using more trees is that you are likely to achieve better predictive performance when you incorporate more trees into your model (though you will start to see diminishing returns after a certain point). The main downside of using more trees is that computational performance suffers.
  • Number of features to consider for each split (mtry). Each time a new split is introduced into a decision tree in a random forest, a sample of features is taken from the total feature pool and only the features in that sample are considered to be split on. This helps introduce randomness and ensure that all of the decision trees created do not look exactly the same. Mtry is a parameter that controls how many features are available to be chosen from at each split. The higher the value is, the more likely there are to be important features that are meaningfully related to the outcome variable available for each split. The lower the value is, the more randomness will be introduced and the more difference there will be between trees.
  • Number of observations used for each tree. Each decision tree in a random forest is trained on a different resampled dataset to ensure that all of the trees do not come out looking exactly the same. The number of observations used for each tree controls the size of the resampled dataset that is used to train each decision tree. If this parameter has a large value, there will be more data points available to train each tree. If this parameter has a smaller value, there will be fewer.
  • Maximum tree depth. The maximum tree depth controls the number of levels deep each tree in the forest can grow. The higher this number is, the more complexity can be encoded in each tree and the better the predictive performance will be. That being said, you are likely to hit diminishing returns after adding a certain number of levels to a tree. It is also possible that the model will overfit to the training data if it tries to encode too much complexity. Models with a smaller value for max depth are more computationally efficient and take less time for both training and inference.
  • Minimum number of samples required to create a new split. The minimum number of samples required to create a new split determines whether a node is allowed to be split further when growing a tree. If the number of samples in the current node is at least as large as this minimum, the node is eligible to be split. Otherwise, no further splits will be made at that node. This is another way to control how deep a tree can grow and presents similar tradeoffs to the maximum tree depth parameter.
  • Minimum number of samples required to be in a leaf node. Similarly, the minimum number of samples required in a leaf node prevents splits that would result in leaf nodes containing fewer samples than the minimum. This is another way to control how deep a tree can grow and presents similar tradeoffs to the maximum tree depth parameter.
  • Maximum number of leaf nodes. The maximum number of leaf nodes parameter sets an upper bound on the number of leaf nodes that can be included in a tree. Splits are made successively until the maximum number of leaf nodes is reached, at which point no further splits are made. This is another way to control how deep a tree can grow and presents similar tradeoffs to the maximum tree depth parameter.
  • Minimum improvement required to create a new split. Each time a new split is added to a decision tree, a numeric criterion is used to evaluate how good that split is (and compare it to other possible splits). If you put a constraint on the minimum improvement required to create a new split, the improvement in this criterion for the best possible split is compared against a threshold. If the improvement is larger than the threshold, the split will be included. Otherwise, no further splits will be added to that node. This is another way to control how deep a tree can grow and presents similar tradeoffs to the maximum tree depth parameter.
  • Criteria to determine the quality of a split. Each time a new split is added to a decision tree in a random forest, many candidate splits are evaluated to determine which split is best. There are a few different numeric criteria that can be used to evaluate how good each split is, and some implementations of random forest models allow you to choose among them.
A simple example of a decision tree.
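To make the list above concrete, here is how each hyperparameter maps onto the arguments of scikit-learn's RandomForestClassifier, one common implementation. Other libraries use different names (for example, ranger in R calls the number of features mtry and the number of trees num.trees). The specific values shown are just the scikit-learn defaults or illustrative placeholders, not recommendations.

```python
# Mapping the hyperparameters discussed above onto scikit-learn's
# RandomForestClassifier arguments. Values shown are defaults or
# illustrative placeholders.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,           # number of trees
    max_features="sqrt",        # features considered at each split (mtry)
    max_samples=0.8,            # fraction of observations used per tree
    max_depth=None,             # maximum tree depth (None = unlimited)
    min_samples_split=2,        # min samples required to create a new split
    min_samples_leaf=1,         # min samples required in a leaf node
    max_leaf_nodes=None,        # maximum number of leaf nodes
    min_impurity_decrease=0.0,  # minimum improvement required to split
    criterion="gini",           # split-quality criterion
    bootstrap=True,             # resample observations for each tree
    random_state=0,
)
```

Note that max_samples only takes effect when bootstrap=True, since it controls the size of the bootstrap sample drawn for each tree.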

How to tune hyperparameters for random forests

While random forests have many possible hyperparameters that can be tuned, some hyperparameters are more important to tune than others. In this section, we will discuss which hyperparameters are most important to tune and what ranges of values should be investigated for each of those parameters.

Which hyperparameters are most important to tune for random forests?

Which hyperparameters are most important to tune when you are building a random forest model? Here are the hyperparameters that are most important to tune for most models.

  • Number of trees. The first parameter that you should tune when building a random forest model is the number of trees. In general, values in the range of 50 to 400 trees tend to produce good predictive performance.
  • Number of features considered at each split (mtry). The number of features considered at each split is another parameter that should be tuned when building a random forest model. There are a few common heuristics that can help you select values to try for this parameter. Two common examples are the square root of the total number of features and one third of the total number of features.
  • A parameter that controls tree depth. You should also tune exactly one parameter that controls the depth that trees can grow to. This could be the maximum tree depth, the maximum number of leaf nodes, the minimum number of samples required to create a new split, or the minimum number of samples required to be in a leaf node. Make sure to include at least one or two values that will produce small trees with no more than a few splits. You will often see good results from using a large collection of shallow trees with only 1 or 2 splits. It is common to see diminishing returns after a tree has grown 10 or 20 layers deep.
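The two mtry heuristics mentioned above are easy to compute for your own dataset. The snippet below does so for a hypothetical dataset with 36 features; the feature count is an assumption for illustration.

```python
# The two common mtry heuristics: square root of the number of
# features, and one third of the number of features. The feature
# count of 36 is a hypothetical example.
import math

n_features = 36
mtry_sqrt = round(math.sqrt(n_features))  # square-root heuristic -> 6
mtry_third = max(1, n_features // 3)      # one-third heuristic -> 12
print(mtry_sqrt, mtry_third)
```

The square-root heuristic is the more common default for classification, while one third of the features is the traditional default for regression.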

How many combinations of hyperparameters should you try?

How many different combinations of hyperparameters should you try when training a random forest model? Random forest models are less sensitive to the choice of hyperparameters used than many other machine learning models, which means that you generally do not need to try as many different combinations of hyperparameters (unless very small increases in predictive performance will result in large increases in business value).

In general, we recommend trying at least 3 – 4 values for each hyperparameter you are tuning. You should include at least one value that is on the low end of the recommended range, one that is on the high end of the recommended range, and one that is towards the middle of the recommended range.
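Putting the advice above together, here is a minimal sketch of a small grid search over the three key hyperparameters, with roughly three values each spanning the low, middle, and high ends of the recommended ranges. It uses scikit-learn's GridSearchCV on a synthetic dataset; the dataset and the exact candidate values are illustrative assumptions.

```python
# Sketch: a small grid search following the "3-4 values per
# hyperparameter, spanning low / middle / high" advice.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=12, random_state=0)

param_grid = {
    "n_estimators": [50, 200, 400],  # low / middle / high of 50-400
    "max_features": ["sqrt", 0.33],  # the two common mtry heuristics
    "max_depth": [2, 10, None],      # shallow, moderate, unlimited
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

With 18 combinations this search is small enough to run quickly, which reflects the point above: random forests rarely justify an exhaustive search over a large grid.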
