Are you wondering whether you should use an ARIMA model for your data science project? Or maybe you want to hear more about situations where ARIMA models perform better than other time series models? Either way, you are in the right place!
In this article we tell you everything you need to know to determine when to use ARIMA models. First, we talk about what types of datasets ARIMA models should be used for. Next, we talk about a few different types of ARIMA models. After that, we discuss the main advantages and disadvantages of ARIMA models. Finally, we provide specific examples of situations where you should and should not use ARIMA models.
What kind of data should you use ARIMA models for?
ARIMA models are typically used to model time series data. Time series data consists of repeated measurements of the same quantity, usually taken at regular intervals over a long period of time. One example of time series data is the number of units of a given product that were sold each day.
One thing that is important to know about time series models like ARIMA models is that they can be trained without any data on features that are associated with the outcome variable. Rather than relying on feature values to predict the value of the outcome variable, time series models use previous measurements of the outcome variable to predict future values of the outcome variable. Some time series models can accommodate feature information, but it is generally not required.
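The idea of predicting the outcome from its own past values can be sketched in a few lines of Python. This is a minimal illustration and not a full ARIMA implementation: it fits only a first-order autoregressive term by ordinary least squares, and the function names and sales figures are made up for the example.

```python
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of y[t] = intercept + phi * y[t-1]."""
    previous, current = series[:-1], series[1:]
    prev_mean, curr_mean = mean(previous), mean(current)
    covariance = sum(
        (p - prev_mean) * (c - curr_mean) for p, c in zip(previous, current)
    )
    variance = sum((p - prev_mean) ** 2 for p in previous)
    phi = covariance / variance
    intercept = curr_mean - phi * prev_mean
    return intercept, phi


def forecast_ar1(series, steps, intercept, phi):
    """Roll the fitted equation forward from the last observed value."""
    last_value = series[-1]
    forecasts = []
    for _ in range(steps):
        last_value = intercept + phi * last_value
        forecasts.append(last_value)
    return forecasts


# Hypothetical daily unit sales; no feature columns are involved.
daily_units_sold = [100.0, 90.0, 85.0, 82.5, 81.25, 80.625]

intercept, phi = fit_ar1(daily_units_sold)
next_two_days = forecast_ar1(daily_units_sold, 2, intercept, phi)
```

The only input is the history of the outcome variable itself. A real ARIMA model adds differencing and moving-average terms on top of this autoregressive piece.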
Types of ARIMA models
In addition to the basic ARIMA model, there are a few other models that are commonly referred to as ARIMA models. Most of these are models that extend the basic ARIMA model to add new functionality. Here are the main types of models that are referred to as ARIMA models.
- ARIMA. This is a basic formulation of an ARIMA model that does not have any bells and whistles attached. This simple model cannot account for things like seasonality or features that are associated with the outcome variable.
- SARIMA. Basic ARIMA models do not include a seasonal component, so they are not ideal for representing data that contains seasonal trends. A SARIMA model is an extension of a basic ARIMA model that is designed to handle data with seasonal trends. SARIMA models are sometimes referred to as seasonal ARIMA models.
- ARIMAX. Basic ARIMA models do not enable you to incorporate information on features that are associated with the outcome variable. An ARIMAX model is an extension of a basic ARIMA model that is designed to include covariates or features that can be used to help predict the outcome variable.
- SARIMAX. A SARIMAX model is a combination of a SARIMA model and ARIMAX model. That means that this model can be used to model time series data that has both seasonal components and features that are associated with the outcome variable.
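One of the ways a seasonal ARIMA model deals with seasonality is seasonal differencing, where each value is compared with the value one full season earlier. The sketch below shows only that single step, under the assumption of a perfectly repeating quarterly pattern. The helper name and the sales numbers are hypothetical.

```python
def seasonal_difference(series, period):
    """Return y[t] - y[t - period] for every t that has a value one season back."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Hypothetical quarterly sales: a repeating within-year pattern
# plus growth of 4 units per year.
quarterly_pattern = [10, 20, 15, 30]
quarterly_sales = [quarterly_pattern[t % 4] + 4 * (t // 4) for t in range(12)]

deseasonalized = seasonal_difference(quarterly_sales, period=4)
print(deseasonalized)  # [4, 4, 4, 4, 4, 4, 4, 4]
```

After the seasonal difference, the repeating quarterly pattern is gone and only the year-over-year change remains. A full SARIMA model also adds seasonal autoregressive and moving-average terms, which are not shown here.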
Advantages and disadvantages of ARIMA models
Advantages of ARIMA models
What are some of the main advantages of ARIMA models? Here are some of the main benefits of using ARIMA models.
- Well understood. The first advantage of using ARIMA models is that ARIMA models are fairly well studied and well understood. All else being equal, it is beneficial to use well studied models. When you use common and well studied models, it is easier for teammates to provide advice and help to troubleshoot issues. Less technical stakeholders also tend to trust common models that they have heard of more than complex models they have not heard of.
- Relatively explainable. In addition to being well understood, ARIMA models are also somewhat explainable. This is a large benefit if you work with stakeholders who do not trust complex models that they do not understand. The more explainable a model is, the easier it is to get a skeptical stakeholder to accept the results of that model.
- Can handle covariates. Another advantage of ARIMA models is that there are formulations of ARIMA models that can handle covariate information. This makes it possible to incorporate external information to enhance your predictions. Not all types of time series models can be formulated to handle covariates.
- Flexible model specification. Another advantage of ARIMA family models is that they are highly flexible. This means that they can be adapted to model many different types of time series. This is useful if you want to set up a process that models many different time series using a single type of model.
- Suitable for small datasets. One advantage that ARIMA models have over neural networks and deep learning models that are trained on time series data is that they can be trained on relatively small datasets. This is because they have far fewer parameters to estimate than a neural network does.
- Reliable performance. ARIMA models tend to have solid performance that is on par with other common statistical time series techniques. Even if they are not the absolute best model out there, you can generally expect decent performance from them. That means they are a good option for situations where you do not have a lot of time to experiment and play around with different time series models.
- Can handle missing data. Another benefit of ARIMA models is that they can generally handle data that has missing values. This is important if you are using data that has occasional missing values, such as data from a sensor that occasionally breaks or fails to take a proper reading.
Disadvantages of ARIMA models
What are some of the main disadvantages of ARIMA models? Here are some examples of disadvantages of ARIMA models.
- Cannot handle multiple seasonality natively. One disadvantage of ARIMA models is that ARIMA models generally can’t natively handle situations where there is multiple seasonality. For example, if your data has both daily and yearly trends then it may be difficult to model with basic ARIMA models. There are ways to adapt ARIMA models for these situations, but they are not handled natively in most time series libraries.
- Can struggle with mean shifts. Another disadvantage of ARIMA models is that they can be thrown off by sudden shifts that change the level of the data. They are not ideal for handling situations where there was a sudden increase or drop in the average value of the outcome. This can make it difficult to model time series that were disrupted by events.
- Only intended for univariate time series. Another disadvantage of ARIMA models is that they are generally designed to model univariate time series that are independent of one another. That means that they may not be ideal for handling complex situations where you have multiple interconnected time series that you want to model simultaneously.
- Cannot model nonlinear dependencies over time. Another disadvantage of ARIMA models is that they are not designed for situations where there are nonlinear dependencies between the current value of the outcome variable and previous values of the outcome variable. If the dependencies between the current value of a variable and a previous value of the variable are quadratic, for example, then an ARIMA model will not be able to model these dependencies well.
- Sensitive to outliers. Another disadvantage of ARIMA models is that they are somewhat sensitive to outliers and extreme values. If you have a dataset that contains a lot of outliers, you may need to preprocess your data before training an ARIMA model.
- Specifying parameters is more of an art than a science. Another disadvantage of ARIMA models is that they have multiple parameters that need to be specified. Specifying these parameters is more of an art than a science, which means that the quality of the model produced depends more heavily on the skill and experience of the person building the model. Some forecasting libraries provide automated parameter selection utilities, but the quality of these automated choices can vary from one dataset to another.
- Can have high time complexity. Another disadvantage of ARIMA models is that they can be slow and computationally intensive to train depending on the set of parameters that is used. This downside can be reduced by limiting the range of parameter values that are considered, but this may mean sacrificing some predictive power.
- Requires that data can be made stationary by differencing. Finally, ARIMA models rely on the assumption that the data that is being modeled is stationary or can be made stationary by differencing. If this assumption is not met, the performance of the models will suffer.
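The differencing requirement in the last item can be made concrete. The "d" in an ARIMA(p, d, q) specification is the number of times the series is differenced before the rest of the model is fit. The sketch below is a simplified illustration with noise-free, made-up series, and the helper name is hypothetical. It shows a linear trend flattening after one difference and a quadratic trend needing two.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y[t] - y[t-1]."""
    result = list(series)
    for _ in range(order):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Made-up, noise-free series used only to show the effect of differencing.
linear_trend = [3 * t + 5 for t in range(8)]
quadratic_trend = [t * t for t in range(8)]

linear_once = difference(linear_trend, order=1)
quadratic_once = difference(quadratic_trend, order=1)
quadratic_twice = difference(quadratic_trend, order=2)

print(linear_once)      # [3, 3, 3, 3, 3, 3, 3]
print(quadratic_once)   # [1, 3, 5, 7, 9, 11, 13]
print(quadratic_twice)  # [2, 2, 2, 2, 2, 2]
```

If no reasonable amount of differencing removes the trend or stabilizes the level of a real series, that is a sign that the stationarity assumption is not being met.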
When to use ARIMA models
So when should you use an ARIMA model to model time series data? Here are some examples of situations where you should consider using ARIMA models.
- When you are collaborating on a project with multiple contributors. If you are working on a large project that is going to have multiple contributors, it is often best to use a relatively common model that all of the contributors are going to be aware of. This is a great use case for ARIMA models, which might be the most well known time series models around.
- When you need a solid baseline to benchmark a more complex model. If you are looking to create a solid baseline model that you can use to benchmark the performance of more complicated models, such as deep learning time series models, then an ARIMA model is a great option. ARIMA models are trusted and well understood models that generally have solid performance. If a model can create better time series forecasts than an ARIMA model, it demonstrates that the model is effective.
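Benchmarking against a baseline usually comes down to scoring both sets of forecasts on the same held-out data. The sketch below assumes mean absolute error as the metric, which is one common choice and not the only one. All of the numbers are invented placeholders standing in for real forecasts.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Placeholder values: in practice these come from a held-out test period,
# an ARIMA baseline, and the more complex model being evaluated.
holdout_actuals = [102.0, 98.0, 105.0, 110.0]
baseline_forecast = [100.0, 100.0, 104.0, 107.0]
candidate_forecast = [101.0, 99.0, 103.0, 111.0]

baseline_mae = mean_absolute_error(holdout_actuals, baseline_forecast)
candidate_mae = mean_absolute_error(holdout_actuals, candidate_forecast)
candidate_beats_baseline = candidate_mae < baseline_mae

print(baseline_mae, candidate_mae, candidate_beats_baseline)  # 2.0 1.25 True
```

The comparison is only meaningful if both models are scored on the same held-out observations and neither one saw those observations during training.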
When not to use ARIMA models
And when should you avoid using ARIMA models to create time series forecasts? Here are some examples of situations where you should not use ARIMA models.
- When there is a large mean shift or disruption in your data. If you are modeling a time series dataset that has a large mean shift or disruption in it, then you might be better off using a time series model that can account for mean shifts and data disruptions. Facebook’s Prophet model, which fits a trend with changepoints, is an example of a model that is designed to accommodate this kind of shift.
- When you need to jointly forecast multiple time series. ARIMA models are designed to be used in situations where you have one or more univariate time series that you want to forecast independently. They generally cannot handle situations where you have multiple time series that you want to forecast jointly. In these situations, you are better off looking for a multivariate time series model that can forecast multiple time series jointly.
- When your data has multiple seasonality. If your data has multiple different seasonal trends, then you may be better off using a model that can handle multiple seasonality. A TBATS model is a great example of a model that can handle this type of data natively. Fourier ARIMA models are another option.
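The Fourier approach mentioned in the last item is commonly set up by encoding each seasonal cycle as pairs of sine and cosine columns and passing those columns to an ARIMA model as covariates. The sketch below builds only the covariate columns, assuming hourly data with a daily and a weekly cycle. The function name, the periods, and the number of harmonics are illustrative choices and not recommendations.

```python
import math


def fourier_terms(n_obs, period, n_harmonics):
    """Build sine/cosine pairs for one seasonal period, one row per time step."""
    rows = []
    for t in range(n_obs):
        row = []
        for k in range(1, n_harmonics + 1):
            angle = 2 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows


# Assumed setup: two weeks of hourly observations.
n_hours = 336
daily_terms = fourier_terms(n_hours, period=24, n_harmonics=2)    # 4 columns
weekly_terms = fourier_terms(n_hours, period=168, n_harmonics=1)  # 2 columns

# One covariate row per observation, covering both seasonal cycles.
covariates = [d + w for d, w in zip(daily_terms, weekly_terms)]
```

Each extra harmonic lets the fitted seasonal shape be less smooth at the cost of two more covariate columns, so the number of harmonics is normally tuned per dataset.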