Are you wondering when to use spectral clustering? Or maybe you are interested in hearing more about the practical differences between spectral clustering and other clustering algorithms. Well either way, you are in the right place!
In this article, we tell you everything you need to know to determine whether spectral clustering is right for you. We start out by discussing the types of datasets that spectral clustering should be used on. After that, we provide more context on the advantages and disadvantages of spectral clustering. Finally, we provide specific examples of scenarios where you should and should not use spectral clustering.
Data for spectral clustering
What kind of datasets can you use spectral clustering on? Spectral clustering is a good option to turn to if you have a dataset without an obvious outcome variable to predict. Like other clustering algorithms, it is an unsupervised method, so rather than predicting a label it identifies patterns and similarities that exist across different observations.
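For concreteness, here is a minimal sketch of this kind of workflow using scikit-learn's SpectralClustering on an unlabeled feature matrix. The synthetic data, number of clusters, and affinity are illustrative assumptions, not a recommendation for your dataset:

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs

# Unlabeled dataset: a feature matrix only, no outcome variable to predict
# (synthetic blobs stand in for real data here)
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [8, 0], [0, 8]],
                  cluster_std=1.0, random_state=42)

# Group observations by pairwise similarity rather than predicting a label
model = SpectralClustering(n_clusters=3, affinity="rbf", random_state=42)
labels = model.fit_predict(X)

print(len(labels), len(set(labels)))  # one label per observation
```

The output of `fit_predict` is simply a cluster label per observation, which you can then inspect to understand the groups in your data.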
Advantages and disadvantages of spectral clustering
Are you wondering what the main advantages and disadvantages of spectral clustering are? Here are some of the main advantages and disadvantages you should keep in mind when determining whether to use spectral clustering.
Advantages of spectral clustering
- Applicable for high dimensional datasets. One of the main advantages that spectral clustering has over other clustering algorithms is that it can be used on high-dimensional datasets with many features. This is a relatively rare quality: because spectral clustering operates on a graph of pairwise similarities rather than directly on the raw feature space, it is shared primarily by other graph-based clustering methods.
- No strong assumptions about cluster shape. Spectral clustering does not make strong assumptions about the shape of the clusters in the data. That means that it is appropriate to use spectral clustering even when you suspect the clusters in your data may be irregularly shaped.
- Can sometimes handle categorical variables. Some implementations of spectral clustering can handle cases where you have mixed data types, such as cases where you have categorical variables in your data. This is in part because spectral clustering uses similarity metrics rather than distance metrics to determine which points have more in common.
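To illustrate the cluster-shape advantage, here is a short sketch (assuming scikit-learn is available) comparing k-means and spectral clustering on the classic "two moons" dataset, whose clusters are non-convex. The noise level and neighbor count are illustrative choices:

```python
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-circles: irregularly shaped, non-convex clusters
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# K-means assumes roughly spherical clusters and splits each moon in half
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# Spectral clustering on a nearest-neighbors graph follows the moon shapes
spectral = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                              n_neighbors=10, random_state=42)
spectral_labels = spectral.fit_predict(X)

print(adjusted_rand_score(y_true, kmeans_labels))    # well below 1.0
print(adjusted_rand_score(y_true, spectral_labels))  # close to 1.0
```

The adjusted Rand score compares each clustering to the true moon membership, so the gap between the two scores shows how much the shape assumption costs k-means here.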
Disadvantages of spectral clustering
- Relatively slow. One disadvantage of spectral clustering is that it is relatively slow compared to other clustering algorithms like k-means clustering. The standard algorithm requires an eigendecomposition of an n x n similarity matrix, which scales poorly as the number of data points n grows. If you have a dataset with many, many data points then you may be better off using a faster algorithm.
- Less common. Another disadvantage of spectral clustering is that it is less popular and less well studied than other clustering algorithms like k-means and hierarchical clustering. That means that it will not be as easy for collaborators to provide advice on or assist with projects that use spectral clustering.
- Not intuitive or easy to explain. Many clustering algorithms are intuitive and easy to explain to relatively non-technical coworkers. This is not the case with spectral clustering, which can be difficult to fully understand if you do not have a strong math background.
- Sensitive to seed. Traditional spectral clustering algorithms include a step where the k-means algorithm is applied. That means that like k-means clustering, spectral clustering is sensitive to the choice of seed and initialization conditions that are used. The clustering results may change when the algorithm is run multiple times.
- Need to select the number of clusters. As with many other clustering algorithms, spectral clustering requires you to select the number of clusters that should be used for your dataset. This can be difficult to do if you do not have strong intuition about the true number of clusters in the data.
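On the last two points: fixing the seed (for example via a random_state argument) makes the k-means step reproducible, and the eigengap heuristic is one common way to estimate the number of clusters. Here is a rough sketch of the eigengap idea using NumPy, SciPy, and scikit-learn on synthetic data; the RBF kernel and gamma value are illustrative assumptions:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_kernels

# Synthetic data with three well-separated clusters
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [6, 0], [0, 6]],
                  cluster_std=0.6, random_state=0)

# RBF similarity matrix (gamma chosen for illustration), no self-loops
A = pairwise_kernels(X, metric="rbf", gamma=1.0)
np.fill_diagonal(A, 0)

# Eigenvalues of the symmetric normalized graph Laplacian, in ascending order
eigvals = np.linalg.eigvalsh(laplacian(A, normed=True))

# Eigengap heuristic: the largest gap between consecutive small eigenvalues
# suggests the number of clusters
gaps = np.diff(eigvals[:10])
k = int(np.argmax(gaps)) + 1
print(k)  # estimated number of clusters
```

The heuristic is a guide rather than a guarantee; it works best when the clusters are well separated, and it is worth sanity-checking the result against your domain knowledge.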
When to use spectral clustering
Are you wondering when you should use spectral clustering over another clustering algorithm? Here are some scenarios where you should consider using spectral clustering.
- Many features in dataset. The main situation where you should consider using spectral clustering, or other graph-based clustering techniques, is when you have many features in your dataset and you want to include them all in your model. Many other clustering algorithms require you to apply dimensionality reduction or feature selection techniques to reduce the dimensionality of your data, but spectral clustering can be directly applied to high dimensional datasets.
When not to use spectral clustering
Are you wondering when you should avoid using spectral clustering? Here are some examples of scenarios where you should not use spectral clustering.
- Large dataset. The main reason you should avoid using spectral clustering is if you are working with a very large dataset that has many observations. Spectral clustering algorithms are relatively slow, and the problem gets worse as the number of observations grows. If you have a very large dataset, k-means and mini-batch k-means are faster options. DBSCAN is another option that performs better on irregularly shaped clusters.
- Need an explainable algorithm. If you are working with coworkers that are skeptical of complicated algorithms they do not understand, you may be better off using another clustering algorithm that is more intuitive. Many other algorithms like k-means, DBSCAN, and hierarchical clustering have more straightforward explanations.
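If speed is the constraint, mini-batch k-means is one of the simplest swaps; here is a minimal sketch assuming scikit-learn, with an illustrative batch size and synthetic data standing in for a large dataset:

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs

# A dataset size where spectral clustering's eigendecomposition would be costly
X, _ = make_blobs(n_samples=100_000, centers=5, random_state=42)

# Mini-batch k-means fits on small random batches, so it scales to large n
model = MiniBatchKMeans(n_clusters=5, batch_size=1024, n_init=3,
                        random_state=42)
labels = model.fit_predict(X)

print(labels.shape)  # one label per observation
```

The trade-off is the usual k-means one: you get speed and simplicity, but you reintroduce the assumption of roughly spherical clusters.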
Related articles
- When to use hierarchical clustering
- When to use DBSCAN
- When to use Gaussian mixture models
- When to use k-means clustering
Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.