Are you wondering when you should use factor analysis? Or maybe you want to learn more about when to choose factor analysis over another dimension reduction technique like principal component analysis? Well, either way, you are in the right place!
In this article, we tell you everything you need to know to understand whether factor analysis is right for you. We start out by talking about what types of datasets factor analysis can be applied to. After that, we discuss some of the main advantages and disadvantages of factor analysis. Finally, we provide specific examples of situations where you should and should not use factor analysis.
Data for factor analysis
What kind of datasets can you use factor analysis on? Factor analysis is an unsupervised algorithm, which means that it can be used on datasets that do not have a clear outcome variable. Factor analysis is used when you want to reduce the dimension of your dataset, or compress as much of the information from your input features as possible into a smaller collection of transformed features.
Factor analysis is specifically intended to be used in cases where all of the input features in the dataset are numeric. If you have categorical features in your dataset, you may be better off using a different dimension reduction technique.
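As a concrete illustration, here is a minimal sketch that applies factor analysis to an all-numeric dataset using scikit-learn's FactorAnalysis. The data is synthetic and the choice of two factors is arbitrary; it only shows the mechanics:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical all-numeric dataset: 100 samples, 6 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))

# Compress the 6 input features into 2 transformed features (factors)
fa = FactorAnalysis(n_components=2, random_state=0)
X_transformed = fa.fit_transform(X)

print(X_transformed.shape)  # (100, 2)
```

If your dataset includes categorical columns, they would need to be dropped or encoded before a step like this, which is part of why a different technique is often a better fit for mixed data.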
Advantages and disadvantages of factor analysis
What are the advantages and disadvantages of factor analysis? Here are some of the main advantages and disadvantages you should keep in mind when deciding whether to use factor analysis.
Advantages of factor analysis
- More interpretable than other dimension reduction techniques. Many dimension reduction techniques produce transformed features that are not easily interpretable. Factor analysis, however, provides results that are more easily interpretable. This is because factor analysis provides coefficients that show how much each input feature contributes to each transformed feature, or factor. This contextual information can help you understand what each factor represents.
- Can help identify variables with the same information. Since factor analysis provides contextual information on what input features contribute to each transformed feature, it can be used to help identify input features that contain a lot of the same information. Input features that share the same information will likely contribute to the same transformed features.
- Some implementations can guarantee uncorrelated features. Some implementations of factor analysis are guaranteed to create transformed features that are uncorrelated. This is a desirable property if you want to feed the results of your factor analysis into another algorithm that cannot handle correlated features.
- Not as sensitive to the scale of variables. Many implementations of factor analysis can provide meaningful results even in situations where the input variables are on different scales. This means that you do not have to take an additional step to rescale your data before applying factor analysis.
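To make the interpretability point concrete, the sketch below builds synthetic data in which two groups of features share information, then inspects the coefficients (loadings) that factor analysis reports. The data and its group structure are invented for illustration:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)

# Two hypothetical latent variables drive four observed features:
# features 0-1 follow the first latent variable, features 2-3 the second
latent = rng.normal(size=(200, 2))
true_loadings = np.array([[1.0, 0.0],
                          [0.9, 0.0],
                          [0.0, 1.0],
                          [0.0, 0.8]])
X = latent @ true_loadings.T + 0.1 * rng.normal(size=(200, 4))

fa = FactorAnalysis(n_components=2, random_state=0).fit(X)

# fa.components_ holds the coefficients: one row per factor, one column
# per input feature, showing how much each feature contributes to each factor
print(fa.components_.shape)  # (2, 4)
```

Features whose loadings concentrate on the same factor are exactly the ones that carry overlapping information, which is how factor analysis helps you spot redundant variables.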
Disadvantages of factor analysis
- Assumes linear relationships between input variables. One of the main disadvantages of factor analysis is that it makes the assumption that the input features are linearly related to one another. That means it may not perform well on sets of features that are not linearly related.
- Assumes bivariate normality between each pair of variables. In addition to assuming that the input features are linearly related, many implementations of factor analysis also assume that each pair of features follows a bivariate normal distribution. This can be problematic in situations where the input features are not normally distributed.
- Does not handle categorical data well. Another disadvantage of factor analysis is that it does not generally perform well in situations where there are categorical variables in the input data. In these situations, you are often better off using a different dimension reduction technique.
- Does not handle outliers well. Factor analysis does not generally perform well in situations where there are large outliers in the input data. Outliers can have an outsized effect on the fitted model and pull it away from the broader trends that hold across the majority of the dataset.
- Does not handle missing data. Another disadvantage of factor analysis is that it does not handle missing data natively. That means that you need to pre-process your data to handle missing data before you apply factor analysis to it.
- Cannot produce meaningful output if variables are not correlated. Factor analysis generally cannot produce meaningful results in situations where none of the input features are correlated. Factor analysis works by condensing input features that contain similar information. This is not possible if there are no features that contain similar information.
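Two of these disadvantages are easy to guard against with a small amount of preprocessing. The sketch below, using hypothetical data, fills missing values before factor analysis and then checks whether any input features are correlated at all:

```python
import numpy as np
from sklearn.impute import SimpleImputer

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X[0, 1] = np.nan  # a missing entry that factor analysis cannot handle natively

# Fill missing entries (here with column means) before fitting
X_filled = SimpleImputer(strategy="mean").fit_transform(X)

# Sanity check: if no pair of features is meaningfully correlated,
# factor analysis has nothing to condense
corr = np.corrcoef(X_filled, rowvar=False)
off_diagonal = corr[~np.eye(4, dtype=bool)]
print(np.abs(off_diagonal).max())
```

Mean imputation is only one option; the right missing-data strategy depends on your dataset, but some strategy is required before factor analysis can run.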
When to use factor analysis
When should you use factor analysis over another dimension reduction technique? Here are some examples of situations where you would be better off using factor analysis.
- Latent variables. Factor analysis is a great tool to turn to when you have latent variables in your data that you want to quantify. For example, imagine you have input features that represent the number of steps a person takes in a day, the number of days per week a person goes to the gym, and the number of sports that a person plays recreationally. All of these features might contribute to a more abstract latent variable that represents how active a person is. Factor analysis is the perfect tool to create transformed features that represent this latent variable and show which of the input features contribute the most to the latent variable.
- Interpretation is important. More generally, factor analysis is a good dimension reduction technique to turn to in situations where interpretation is important. Most dimension reduction techniques do not provide much information about how the transformed features were created or what they represent. The transformed features produced by factor analysis come along with contextual information about which input features contributed the most to each factor. By looking at this contextual information, you can better understand what each of the transformed features represents.
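The activity example above can be sketched directly. In the code below, the latent "activity" variable and the three observed features are simulated (all of the numbers are invented for illustration), and a single-factor model recovers a score that tracks the latent variable:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 500

# Simulated latent variable: how active each person is
activity = rng.normal(size=n)

# Three observed features, each driven by the latent variable plus noise
daily_steps = 8000 + 2000 * activity + rng.normal(scale=500, size=n)
gym_days = 3 + 1.5 * activity + rng.normal(scale=0.5, size=n)
sports_played = 1 + 0.8 * activity + rng.normal(scale=0.3, size=n)
X = np.column_stack([daily_steps, gym_days, sports_played])

# One factor should be enough to capture the shared "activity" signal
fa = FactorAnalysis(n_components=1, random_state=0).fit(X)
activity_score = fa.transform(X)[:, 0]

# The recovered factor scores closely track the true latent variable
r = abs(np.corrcoef(activity_score, activity)[0, 1])
print(round(r, 2))
```

In a real project you never observe the latent variable directly, of course; the simulation is only there to show that the factor score recovers it.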
When not to use factor analysis
When should you avoid using factor analysis? Here are some examples of situations where you would be better off using a different dimension reduction technique.
- Non-normal data or data that is not linearly related. Two of the main assumptions of factor analysis are that the input features are linearly related and that each pair of features can be represented using a bivariate normal distribution. If you have features that are not linearly related, or if your features are not roughly normally distributed, you may be better off using a dimension reduction technique that does not make as many assumptions about the structure of the data.
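If you are unsure whether your features are close enough to normal, a quick per-feature screen such as the Shapiro-Wilk test can flag obvious problems. The data below is synthetic, with one deliberately non-normal feature:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(4)
X = np.column_stack([
    rng.normal(size=200),       # roughly normal feature
    rng.exponential(size=200),  # heavily skewed, non-normal feature
])

# Flag features whose distribution is clearly non-normal
for i in range(X.shape[1]):
    stat, p_value = shapiro(X[:, i])
    if p_value < 0.05:
        print(f"feature {i} fails the normality check")
```

A transformation such as a log or Box-Cox can sometimes bring a skewed feature close enough to normal for factor analysis to remain a reasonable choice.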
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.