Are you wondering whether you should use generalized additive models (GAMs) for your next data science project? Or maybe you are more interested in hearing about the differences between generalized additive models and generalized linear models (GLMs)? Well, either way, you are in the right place!
In this article, we tell you everything you need to know to determine when to use generalized additive models. We start out by discussing what kinds of datasets generalized additive models can be used on. After that, we discuss the advantages and disadvantages of generalized additive models. Finally, we provide specific examples of cases where you should and should not use generalized additive models.
What data is required to train generalized additive models?
What types of datasets should generalized additive models be used on? In general, generalized additive models are suited to datasets that contain both an outcome variable you want to predict and features that are associated with that outcome. Use them in situations where you want to model the relationship between the features and the outcome variable.
What types of outcome variables can be modeled using generalized additive models? Much like generalized linear models, generalized additive models can be used with multiple types of outcome variables by pairing the model with an appropriate distribution and link function. Generalized additive models can be used to model numeric outcomes, binary outcomes, count variables, and more.
Advantages and disadvantages of generalized additive models
Advantages of generalized additive models
What are the main advantages of generalized additive models? Here are some of the main advantages that generalized additive models have over other similar models.
- Can model highly nonlinear relationships. One of the main advantages of generalized additive models is that they can capture highly nonlinear relationships between your features and your outcome variable. That makes them a great fit for situations where those relationships cannot easily be made linear by applying common transformations.
- Provide some interpretability. Another advantage of generalized additive models is that they are more interpretable than many other models that are used to model nonlinear relationships. They are a good option to turn to if you want to be able to examine the relationship between your features and the outcome variable.
- Include regularization to avoid overfitting. Another advantage of generalized additive models is that regularization can be incorporated to avoid overfitting. This is particularly important for models that are highly flexible like generalized additive models.
- Less global sensitivity to outliers. While it is possible for outliers to have a strong local influence on the fit of a generalized additive model, outliers do not have as strong a global influence. This is especially true when you compare generalized additive models to linear models like linear regression models.
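The first and third advantages above can be illustrated without any specialized library: a single smooth term is just a spline basis plus a ridge-style penalty. Below is a minimal numpy-only sketch (the knot placement, penalty strength `lam`, and simulated data are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, 200)  # highly nonlinear signal

# Truncated-cubic spline basis: global cubic terms plus one term per knot.
knots = np.linspace(0.1, 0.9, 9)
B = np.column_stack([np.ones_like(x), x, x ** 2, x ** 3] +
                    [np.clip(x - k, 0, None) ** 3 for k in knots])

# Ridge-style penalty on the knot coefficients only; lam controls smoothness
# and is the regularization that protects against overfitting.
lam = 1e-3
D = np.diag([0.0] * 4 + [1.0] * len(knots))
beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
fit = B @ beta

# Compare against an ordinary linear fit on the same data.
lin = np.polyval(np.polyfit(x, y, 1), x)
truth = np.sin(2 * np.pi * x)
rmse_spline = np.sqrt(np.mean((fit - truth) ** 2))
rmse_linear = np.sqrt(np.mean((lin - truth) ** 2))
```

The penalized spline tracks the sine curve closely, while the straight line cannot, which is exactly the situation where a generalized additive model earns its keep.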
Disadvantages of generalized additive models
What are some of the main disadvantages of generalized additive models? Here are some disadvantages of generalized additive models.
- Cannot natively model non-additive relationships. One disadvantage of generalized additive models is that they struggle in situations where the relationships between the features and the outcome variable are not additive. They do not natively capture interactions between features, where the value of one feature affects the relationship between another feature and the outcome variable.
- Not well known. Another disadvantage of generalized additive models is that they are not as commonly used or well known as many other supervised learning models. That means that collaborators may need to set aside some time to learn about the model before they can contribute meaningfully.
- Wider confidence intervals and less powerful tests. As a tradeoff for their high flexibility, generalized additive models generally have wider confidence intervals than simpler models. If you care about having narrow confidence intervals and your data can be modeled using a generalized linear model, you may be better off using that approach.
- Require some validation. Generalized additive models are highly flexible and can sometimes converge to solutions that do not make sense. It is generally recommended that you spend some time inspecting the results of your models to make sure that they make sense. Generalized additive models might not be the best option if you do not have bandwidth for this.
- Not guaranteed to converge. One disadvantage of GAMs is that the models are not guaranteed to converge, and you may need to experiment with different model training routines before you arrive at a model that converges appropriately.
- Can take longer to converge. Even if the models do converge eventually, generalized additive models tend to take longer to converge than simpler models. For example, they generally take a little longer to converge than generalized linear models.
- Sensitive to predictors that are associated with one another. Much like generalized linear models are sensitive to multiple predictors being linearly correlated with one another, generalized additive models are sensitive to multiple predictors displaying similar patterns. Concurvity is the generalization of multicollinearity that is used to describe this situation.
- Do not natively handle missing values. Another disadvantage of generalized additive models is that they do not natively handle missing values. Most implementations of generalized additive models require the user to select a method that will be used to handle missing data. While there are some more advanced options, the default option is usually to drop rows with missing values.
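As a small illustration of the last point, the usual default amounts to complete-case analysis: rows with any missing feature value are dropped before the model ever sees the data. A minimal numpy sketch (the toy arrays are ours):

```python
import numpy as np

X = np.array([[0.2, 1.0],
              [0.5, np.nan],
              [0.9, 3.0],
              [np.nan, 2.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Complete-case analysis: keep only rows with no missing feature values.
keep = ~np.isnan(X).any(axis=1)
X_clean, y_clean = X[keep], y[keep]  # rows 0 and 2 survive
```

If dropping rows would discard too much data, an explicit imputation step before fitting is the usual alternative.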
When to use generalized additive models
So when should you use generalized additive models over other types of statistical models? Here are some examples of situations where you should use generalized additive models.
- When your data is highly nonlinear but you still need interpretability. Generalized additive models are generally a good option to reach for when you have data that is highly nonlinear but you still need your results to be interpretable. When data is highly nonlinear, it is common for data professionals to reach for nonparametric machine learning models. That being said, these nonparametric models generally are not as interpretable as classical statistical models. Generalized additive models strike a nice balance here.
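The interpretability comes from the additive structure: each feature gets its own one-dimensional smooth function that can be inspected or plotted on its own. The classic way to fit that structure is backfitting, sketched below with numpy and a simple kernel smoother (the smoother choice, bandwidth, and simulated data are illustrative):

```python
import numpy as np

def kernel_smooth(x, r, width=0.1):
    # Nadaraya-Watson kernel smoother used as the per-feature smoother.
    w = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * width ** 2))
    return (w @ r) / w.sum(axis=1)

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, (300, 2))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 300)

# Backfitting: cycle through features, smoothing the partial residuals
# that remain after subtracting the other feature's current estimate.
f = np.zeros_like(X)
alpha = y.mean()
for _ in range(20):
    for j in range(2):
        partial = y - alpha - f[:, 1 - j]
        f[:, j] = kernel_smooth(X[:, j], partial)
        f[:, j] -= f[:, j].mean()  # center each component for identifiability
```

Each column of `f` can be plotted against its feature to read off the fitted effect, which is the interpretability advantage in action.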
When not to use generalized additive models
When should you avoid using generalized additive models? Here are some examples of situations where you should consider using other statistical models or machine learning models.
- When the relationships between your features and your outcome are linear. If the relationships between your features and your outcome variable are linear, or if they can reasonably be made to be linear by applying a common transformation, then you may want to consider using a more simple model such as a linear regression model or another generalized linear model. These models are simpler to use, more widely understood, and generally have tighter confidence intervals.
- When you are working on a project with many contributors. Generalized additive models are relatively niche and are not understood by many data professionals. If you are collaborating on a project with many data professionals and you want many different people to be able to contribute, you may be better off using a more common model that other data professionals are more likely to be familiar with. This is especially true if you want contributors to be able to contribute quickly without requiring a lot of time to get up to speed.
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.