Are you wondering when you should use t-sne? Or maybe you are more interested in hearing how t-sne compares to similar dimension reduction techniques like PCA? Well either way, you are in the right place!
In this article, we tell you everything you need to know to understand when to use t-sne. We start out by talking about what kinds of datasets t-sne should be applied to. After that, we discuss the main advantages and disadvantages of t-sne. Finally, we provide specific examples of situations where you should and should not apply t-sne to your data.
Data for t-sne
What kind of datasets should you apply t-sne to? T-sne is an unsupervised machine learning algorithm, which means that it can be applied to datasets that do not have a specific outcome variable to predict. Instead, t-sne is applied to a set of features to reduce the dimensionality of that feature set. This just means that t-sne tries to compress the information contained in the original feature set into a smaller set of transformed features.
In general, it is easiest to use t-sne when your features are numeric. T-sne may also be applied to other data types, but you need to be careful to make sure you are using an appropriate distance metric in these situations.
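To make this concrete, here is a minimal sketch of applying t-sne to a numeric feature set. The article does not prescribe a library, so we assume scikit-learn's TSNE class here. The synthetic data, the perplexity value, and the seed are purely illustrative choices.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic numeric feature set (an assumption for illustration):
# 100 observations described by 10 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Compress the 10 original features into 2 transformed features.
# `metric` controls how distances between observations are measured;
# "euclidean" is a common choice for numeric data, and other metrics
# (for example "cosine") can be passed for other kinds of data.
tsne = TSNE(n_components=2, metric="euclidean", perplexity=30, random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # (100, 2)
```

Note that there is no outcome variable anywhere in this sketch. Only the feature matrix is passed in, which is what makes the technique unsupervised.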
Advantages and disadvantages of t-sne
What are the main advantages and disadvantages of t-sne? Here are some of the points you should keep in mind when deciding whether to use t-sne.
Advantages of t-sne
- Preserves local neighborhoods. One of the main advantages of t-sne is that it preserves local neighborhoods in your data. That means that observations that are close together in the input feature space should also be close together in the transformed feature space. This is why t-sne is a great tool for tasks like visualizing high dimensional data in a lower dimensional space.
- Does not assume linear relationships between features. Another advantage of t-sne is that it does not make many assumptions about the way the features in your data are related. That means that it is appropriate to use in situations where the features in your data are not linearly related.
- Not thrown off by outliers as easily. In general, t-sne results are not distorted by outliers as easily as the results of some other dimension reduction techniques.
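One way to check the local neighborhood claim on your own data is to score an embedding with a neighborhood-preservation measure. The sketch below assumes scikit-learn and uses its trustworthiness function, which returns a value between 0 and 1, where higher values mean local neighborhoods were better retained. The digits dataset, the 300-row subset, and the choice of 5 neighbors are our assumptions, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, trustworthiness

# Small example dataset: 8x8 digit images flattened to 64 features.
X, _ = load_digits(return_X_y=True)
X = X[:300]  # keep the example fast

# Reduce to 2 dimensions with t-sne and, for comparison, with PCA.
emb_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
emb_pca = PCA(n_components=2).fit_transform(X)

# Score how well each embedding keeps each point's nearest neighbors.
score_tsne = trustworthiness(X, emb_tsne, n_neighbors=5)
score_pca = trustworthiness(X, emb_pca, n_neighbors=5)

print(f"t-sne trustworthiness: {score_tsne:.3f}")
print(f"PCA trustworthiness:   {score_pca:.3f}")
```

The exact scores depend on the dataset and settings, so treat the comparison as a diagnostic for your own data rather than a general result.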
Disadvantages of t-sne
- Relatively slow. One of the main disadvantages of t-sne is that it is computationally intensive and therefore relatively slow. Part of the reason for this is that pairwise distances need to be calculated between the points in the dataset, so the amount of work grows quickly as the number of observations increases. This means that it is often not a practical choice for very large datasets with many observations.
- Not great at pre-processing features for prediction. T-sne is generally not as good as other techniques like PCA at pre-processing features that are going to be fed into a predictive model. When you train a t-sne model on a dataset, it focuses only on the specific details of the observations in that particular dataset. It does not learn a general function that can be effectively applied to unseen data when, for example, you get new observations coming in that you want to make predictions on.
- Has hyperparameters. Another disadvantage of t-sne is that it has hyperparameters that need to be tuned correctly in order for the model to perform well. This means that you have to go through some additional steps to make sure these parameters are set appropriately.
- Sensitive to scale. Like many other dimension reduction techniques, t-sne is sensitive to scale and can provide misleading results if the input features are on very different scales. This is because features that are on larger scales may have undue influence on the results. This means that you may need to re-scale your data before applying t-sne.
- Cannot handle missing data. Like many other dimension reduction techniques, t-sne cannot handle missing data natively. This means that you need to preprocess your data to remove or impute the missing values before you apply t-sne.
- Sensitive to initialization conditions. T-sne is sensitive to the choice of seed and initialization conditions used, which means that you may get different results if you run the model on the same dataset multiple times.
- Need to be careful when incorporating categorical variables. As part of the calculations that are performed during t-sne, pairwise distances are calculated between all of the points. You have to be careful about incorporating categorical variables into your data whenever distances are being calculated. Standard distance metrics do not play well with categorical variables. That being said, there are distance metrics that are more appropriate for mixed data types. They may or may not be available to use in a given implementation of t-sne though.
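Several of these disadvantages can be softened with a little preprocessing. The following is a rough sketch, assuming scikit-learn, that imputes missing values, re-scales the features, and fixes the seed so that repeated runs start from the same conditions. The synthetic data, the median imputation strategy, and the perplexity value are illustrative assumptions that you would want to revisit for your own data.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.manifold import TSNE
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): 150 rows, 6 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 6))
X[:, 0] *= 1000.0                        # one feature on a much larger scale
X[rng.random(X.shape) < 0.05] = np.nan   # roughly 5% of values missing

# Fill in missing values, then put every feature on a comparable scale.
preprocess = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
X_clean = preprocess.fit_transform(X)

# perplexity is one of the hyperparameters worth trying at several values;
# random_state pins the seed used for initialization.
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
embedding = tsne.fit_transform(X_clean)

print(embedding.shape)  # (150, 2)
```

Imputation is only one option. If only a few rows have missing values, dropping those rows before scaling may be simpler.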
When to use t-sne
So when should you use t-sne rather than other dimension reduction techniques? Here are some examples of situations where you should use t-sne rather than another dimension reduction technique.
- Visualizing high dimensional data. T-sne is a great option to reach for when you have a high dimensional dataset that you want to be able to visualize in a lower dimension. This is because t-sne preserves the local structure of the data and tries to make sure that observations that are close to one another in the input feature space are also close together in the transformed feature space.
- Features are not linearly related. T-sne is a non-parametric algorithm, which means that it does not make many assumptions about the data or the way that features are related. If you are working with a dataset that has many features that are not linearly related, you may be better off using t-sne than another algorithm that makes stronger assumptions about the structure of the input data.
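As a sketch of the visualization use case, the example below embeds a 64-feature dataset into two dimensions and provides a helper for plotting the result. We assume scikit-learn for t-sne and matplotlib for plotting. The function names embed_digits and plot_embedding, the 500-row subset, and the output file name are hypothetical choices made for this example.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE


def embed_digits(n_samples=500, seed=0):
    """Embed the first n_samples digit images (64 features) into 2 dimensions."""
    X, y = load_digits(return_X_y=True)
    X, y = X[:n_samples], y[:n_samples]
    embedding = TSNE(n_components=2, random_state=seed).fit_transform(X)
    return embedding, y


def plot_embedding(embedding, labels, path="tsne_digits.png"):
    """Save a scatter plot of the 2D embedding, colored by label."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    points = ax.scatter(embedding[:, 0], embedding[:, 1],
                        c=labels, cmap="tab10", s=8)
    fig.colorbar(points, ax=ax, label="digit")
    ax.set_title("t-sne embedding of the digits dataset")
    fig.savefig(path, dpi=150)
    plt.close(fig)


embedding, labels = embed_digits()
print(embedding.shape)  # (500, 2)
```

Calling plot_embedding(embedding, labels) then writes the scatter plot to disk. The labels are used only to color the points. They play no part in computing the embedding.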
When not to use t-sne
When should you avoid using t-sne? Here are some examples of situations where you should avoid using t-sne and consider using other dimension reduction techniques instead.
- Large dataset. T-sne is a relatively slow algorithm that only gets slower as the number of observations in your dataset increases. That means that it does not scale well to datasets that have many observations. If you have a very large dataset with many observations, you should generally avoid t-sne and try to use a faster alternative like PCA.
- Pre-processing features for a predictive model. If you are pre-processing your data to feed it into a predictive model, you may see better performance with PCA than t-sne. Of course this assumes that your dataset is appropriate for PCA and the assumptions of PCA are met. If the assumptions of PCA are not met, there are situations where it might make sense to use t-sne.
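The difference for predictive pipelines is easy to see in code. In the sketch below, which assumes scikit-learn, a fitted PCA model can project new observations with transform, while scikit-learn's TSNE class only offers fit and fit_transform. Other t-sne implementations may behave differently, and the synthetic data is illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic training data and a few "new" observations (assumptions).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 10))
X_new = rng.normal(size=(5, 10))

# PCA learns a projection that can be reused on unseen data.
pca = PCA(n_components=2).fit(X_train)
X_new_pca = pca.transform(X_new)
print(X_new_pca.shape)  # (5, 2)

# t-sne produces an embedding only for the data it was fitted on.
tsne = TSNE(n_components=2, random_state=0)
X_train_tsne = tsne.fit_transform(X_train)

# scikit-learn's TSNE has no transform method for new observations.
has_transform = hasattr(tsne, "transform")
print(has_transform)  # False
```

In practice, this means that embedding new observations with this implementation requires re-running t-sne on the combined data, which changes the coordinates of the original points as well.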
Related articles
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.