Are you wondering what mtry is in a random forest? Or maybe you are more interested in hearing about what range of values you should consider for mtry? Well either way, you are in the right place! In this article, we tell you everything you need to know about the mtry parameter in a random forest.
We start out by explaining what mtry is and whether mtry is an important parameter to tune in a random forest model. After that, we talk about the range of values that you should try for mtry. Finally, we discuss other parameters that are closely related to mtry.
What is mtry in random forests?
What does the mtry parameter control in a random forest model? Before we talk about what mtry controls, we will first talk about how random forest models are created.
Random forest models are created by combining a bunch of simple models called decision trees. Many different decision trees are trained independently of one another, then the predictions of all of the decision trees are combined together to create a final prediction. In order to ensure that all of the decision trees that you train do not look exactly the same, you need to add some randomness to the decision tree creation process.
The mtry parameter does exactly this – it controls how much randomness is added to the decision tree creation process. Specifically, the mtry parameter controls how many of the input features a decision tree has available to consider at any given point in time. Since different sets of features will be available to different decision trees at different points, it will be (nearly) impossible for all of your trees to look exactly the same.
In order to explain exactly how mtry works, we first have to explain a little more about how decision trees are built. Decision trees are created by successively splitting data into two parts based on a selected feature (see the diagram below for a simple example). For each new split that is added to a tree, a new random set of features is chosen to be considered. Mtry controls how many features are available to be considered for each new split.
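The per-split feature sampling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular random forest library implements it. The helper name `sample_split_candidates` is hypothetical, and drawing features uniformly without replacement is an assumption of the sketch.

```python
import random


def sample_split_candidates(n_features, mtry, rng):
    """Return the indices of the features a single split may consider.

    Hypothetical helper: draws `mtry` distinct feature indices
    uniformly at random, without replacement.
    """
    if not 1 <= mtry <= n_features:
        raise ValueError("mtry must be between 1 and n_features")
    return sorted(rng.sample(range(n_features), mtry))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n_features = 10

# Each new split gets its own fresh draw of candidate features.
for split in range(3):
    candidates = sample_split_candidates(n_features, mtry=3, rng=rng)
    print(f"split {split}: candidate features {candidates}")

# With mtry equal to the number of features, every split sees all features.
all_candidates = sample_split_candidates(n_features, mtry=n_features, rng=rng)
print(f"mtry = n_features: {all_candidates}")
```

In a real tree, the best split would then be searched for only among the sampled candidate features.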
Different ways to represent mtry
It is important to note that there are two different ways to represent mtry. Different representations are used in different implementations of random forests, so you should make sure to pay attention to which representation is being used in the random forest implementation you are using. Here are the two ways to represent mtry.
- As a proportion. Sometimes mtry is restricted to being a proportion that falls between 0 and 1. In this case, mtry represents the proportion of features that are available to be selected from at each split.
- As a whole number. Other times, mtry is represented by a number that ranges from 1 to the total number of features in the model. In this case, mtry represents the number of features that are available to be considered at each split.
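To make the two representations concrete, here is a minimal sketch of how one might convert between them. The function names are hypothetical. Treating a float as a proportion and an integer as a count, and rounding down with a floor of 1 feature, are assumptions of this sketch. Real implementations may round differently, so check the documentation of the one you are using.

```python
def mtry_to_count(mtry, n_features):
    """Convert mtry to a whole number of features.

    Assumption of this sketch: a float is read as a proportion in (0, 1],
    and an int is read as a feature count in [1, n_features].
    """
    if isinstance(mtry, float):
        if not 0.0 < mtry <= 1.0:
            raise ValueError("a proportion must be in (0, 1]")
        # Round down, but always keep at least one feature.
        return max(1, int(mtry * n_features))
    if not 1 <= mtry <= n_features:
        raise ValueError("a count must be between 1 and n_features")
    return mtry


def mtry_to_proportion(mtry_count, n_features):
    """Convert a whole-number mtry to a proportion of the features."""
    return mtry_count / n_features


count_from_proportion = mtry_to_count(0.25, n_features=20)
count_from_count = mtry_to_count(5, n_features=20)
proportion = mtry_to_proportion(5, n_features=20)
print(count_from_proportion, count_from_count, proportion)
```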
Is it important to try different values for mtry?
Is it important to try different values for mtry when you are building a random forest model? In general, it is important to tune mtry when you are building a random forest. The amount of randomness that is injected into a random forest model is an important lever that can impact model performance.
That being said, it is not as important to find the perfect value for mtry as it is to find the perfect value for max depth or number of trees. If you are trying to limit the number of hyperparameter combinations that you have to test out, it is okay to only try a few values of mtry so that you can try a larger range of values for max depth and number of trees.
What values should you consider for mtry?
What range of values should you consider for mtry? In this section we will tell you everything you need to know to answer that question. First we will talk about the main advantages and disadvantages of using a high value for mtry. After that, we will give recommendations for the range of values you should try for mtry.
Advantages of using a large value for mtry
What are the advantages of using a large mtry value? The advantage of using a large mtry value is that it will be more likely that your decision trees are able to select important features that are actually related to the outcome variable for most of the splits that are made.
If you use a low value for mtry and some of the features in your model are not meaningfully related to the outcome variable, then you will often run into cases where none of the features that were chosen for a given split are meaningful. This means you will have to add an uninformative split on a feature that does not contribute anything to the model.
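A rough calculation shows how often this happens. Suppose, as a simplifying assumption, that exactly k of your p features are informative and that the mtry candidates are drawn uniformly without replacement. Then the chance that a split sees no informative feature at all is C(p - k, mtry) / C(p, mtry). The function name below is hypothetical, and the numbers are only an illustration.

```python
from math import comb


def prob_no_informative(n_features, n_informative, mtry):
    """Probability that none of the mtry candidates is informative.

    Assumes candidates are drawn uniformly without replacement and that
    features are cleanly divided into informative and uninformative.
    """
    uninformative = n_features - n_informative
    # comb returns 0 when mtry > uninformative, so the probability is 0 then.
    return comb(uninformative, mtry) / comb(n_features, mtry)


# Illustrative setting: 100 features, of which 5 are informative.
p_low = prob_no_informative(100, 5, mtry=1)
p_mid = prob_no_informative(100, 5, mtry=10)
p_high = prob_no_informative(100, 5, mtry=33)

for mtry, p in [(1, p_low), (10, p_mid), (33, p_high)]:
    print(f"mtry={mtry:>2}: P(no informative candidate) = {p:.3f}")
```

Under these assumptions, the probability of an uninformative split falls steadily as mtry grows, which is the advantage described above.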
Disadvantages of using a large value for mtry
What are the disadvantages of using a large value for mtry when building a random forest model? The main disadvantage of using a large value for mtry is that there will not be as much randomness introduced into your model. This means that the decision trees in your model will all look very similar to one another, which reduces the benefits of building multiple independent decision trees.
If each tree has access to all of the features at every split, then feature selection adds no randomness: given the same training data, the same feature will be selected as the best feature to split on every time. If all of the decision trees in your random forest model are exactly the same, then you will not benefit from having multiple trees. You will be in the same situation you would be in if you only built a single decision tree.
What range of values should you use for mtry?
What range of values should you use for mtry? The good thing about mtry is that there is a hard upper and lower bound on the range of values that mtry can take on. This is because you have to have at least 1 feature available for each split and you cannot have more features available at a split than the total number of features in the model.
There are a few common heuristics for choosing a value for mtry. These heuristics are a good place to start when determining what value to use for mtry.
- Square root of the total number of features
- One third of the total number of features
- Log base 2 of the total number of features
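As a quick worked example, the three heuristics above can be computed for a given number of features. The function name `mtry_heuristics` is hypothetical. Rounding down and clamping the result to the valid range of 1 to the total number of features are assumptions of this sketch, since implementations differ in how they round.

```python
import math


def mtry_heuristics(n_features):
    """Return starting values for mtry from three common heuristics.

    Assumption of this sketch: values are rounded down and clamped to
    the valid range [1, n_features].
    """
    raw = {
        "sqrt": math.sqrt(n_features),
        "one_third": n_features / 3,
        "log2": math.log2(n_features),
    }
    return {
        name: min(n_features, max(1, int(value)))
        for name, value in raw.items()
    }


starting_values = mtry_heuristics(100)
print(starting_values)

# A small tuning grid could combine the heuristics, without duplicates.
grid = sorted(set(starting_values.values()))
print(grid)
```

These values are starting points for tuning rather than guaranteed best choices.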
Parameters that are similar to mtry
Other names for mtry
Are there other names that are used to refer to the same concept as mtry? Indeed, there are a few different terms that are used to refer to mtry. Different terms are used in different implementations of random forests. Here are some examples of terms that refer to the same concept (or a closely related concept).
- Max features. The term max features is often used to reference the maximum number of features that can be considered at each split. This is the same as mtry.