When to use hierarchical clustering

Are you wondering when to use hierarchical clustering? Or maybe you want to hear more about the differences between hierarchical clustering and other clustering algorithms like k-means clustering? Either way, you are in the right place! In this article we cover what you need to know to decide when to use hierarchical clustering.

We start out by discussing what kinds of datasets you can use hierarchical clustering on. After that, we go over some of the main advantages and disadvantages of hierarchical clustering. Finally, we discuss specific situations where you should and should not use hierarchical clustering.

Data for hierarchical clustering

What kinds of datasets should you use hierarchical clustering for? In general, you should use hierarchical clustering for datasets that do not have a clear outcome variable to predict. Hierarchical clustering can help you detect patterns in your data even when you do not have a designated outcome variable.

The most common implementations of hierarchical clustering assume that your features are all numeric, but you can also use hierarchical clustering on datasets with mixed feature types as long as you use an appropriate distance metric.
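As a rough illustration of what an appropriate distance metric can look like, here is a minimal sketch that computes a simplified Gower distance by hand and passes it to SciPy's hierarchical clustering. The toy data, the `gower_matrix` helper, and the choice of average linkage are all assumptions made for this example, not part of any particular library's Gower implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical toy data: one numeric feature (age) and one categorical feature (plan).
ages = np.array([22.0, 25.0, 24.0, 58.0, 61.0, 60.0])
plans = np.array(["basic", "basic", "basic", "premium", "premium", "premium"])

def gower_matrix(numeric, categorical):
    """Simplified Gower distance for one numeric and one categorical column."""
    # Numeric part: absolute difference divided by the feature's range.
    value_range = numeric.max() - numeric.min()
    numeric_part = np.abs(numeric[:, None] - numeric[None, :]) / value_range
    # Categorical part: 0 when the categories match, 1 when they differ.
    categorical_part = (categorical[:, None] != categorical[None, :]).astype(float)
    # Average the per-feature distances so each feature contributes equally.
    return (numeric_part + categorical_part) / 2.0

distances = gower_matrix(ages, plans)

# linkage expects a condensed distance vector, so convert the square matrix first.
condensed = squareform(distances)
tree = linkage(condensed, method="average")

# Cut the tree into two clusters.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)
```

Average linkage is used here because it works with any precomputed distance. Ward linkage assumes Euclidean distances, so it is not a natural fit for a Gower-style metric.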

Advantages and disadvantages of hierarchical clustering

Are you wondering what the main advantages and disadvantages of hierarchical clustering are? Here are some advantages and disadvantages you should keep in mind when deciding whether to use hierarchical clustering.

Advantages of hierarchical clustering

  • Get the most similar observations to any given observation. One of the main advantages of hierarchical clustering is that it provides detailed information about which observations are most similar to each other. This level of detail is not provided by many other algorithms, which generally just return the ID of the cluster a given observation belongs to. Hierarchical clustering is particularly useful when you have a few observations you care about and you want to identify the observations that are similar to them.
  • Not so sensitive to initialization conditions. Another advantage of hierarchical clustering is that it does not depend on random initialization such as a seed. Standard agglomerative implementations are deterministic, so you get the same result if you re-run your analysis on the same data. Reordering the dataset generally gives the same result as well, although ties between equal distances may be broken differently.
  • Can be adapted to incorporate categorical variables. Another advantage of hierarchical clustering is that it can be adapted relatively easily to situations where you have a mixture of numeric and categorical variables. To do this, you must use a distance metric that is appropriate for mixed data types, such as Gower's distance.
  • Well studied. After k-means clustering, hierarchical clustering is probably the second most popular and well-studied type of clustering algorithm. That means that more of your coworkers will be familiar with it, which will make it easier for them to understand and contribute to analyses that use hierarchical clustering.
  • Less sensitive to outliers. Hierarchical clustering is less sensitive to outliers than some other clustering algorithms. This means that the presence of a few outliers is not likely to affect the way the algorithm performs on the other data points. This is because outliers generally do not get added to a cluster until the end of the process when all of the other observations have already been handled (at least for agglomerative hierarchical clustering).
  • Less stringent assumptions about cluster shape. Hierarchical clustering algorithms do not make assumptions about the shape of your clusters that are as stringent as those made by algorithms like k-means clustering. Depending on the distance metric and linkage method you use, some cluster shapes may be detected more easily than others, but there is more flexibility.
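To make the point about initialization conditions concrete, here is a small sketch, assuming SciPy's agglomerative implementation with average linkage and a made-up dataset of two well-separated groups. It clusters the data once in its original order and once with the rows shuffled, then checks that both runs group the same observations together. The helper names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: two well-separated groups of ten points each.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(10, 2)),
])

def two_clusters(data):
    """Build the tree and cut it into two clusters."""
    tree = linkage(data, method="average")
    return fcluster(tree, t=2, criterion="maxclust")

def co_membership(labels):
    """Boolean matrix: True where two observations share a cluster."""
    return labels[:, None] == labels[None, :]

labels_original = two_clusters(X)

# Cluster the same rows in a shuffled order.
order = rng.permutation(len(X))
labels_shuffled = two_clusters(X[order])

# Undo the shuffle so both label vectors refer to the same rows.
labels_restored = np.empty_like(labels_shuffled)
labels_restored[order] = labels_shuffled

# Cluster IDs may be numbered differently, so compare groupings, not raw IDs.
same_partition = np.array_equal(
    co_membership(labels_original), co_membership(labels_restored)
)
print(same_partition)
```

On data with many exactly equal distances, the two runs could break ties differently, so this check is an illustration rather than a guarantee.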

Disadvantages of hierarchical clustering

  • Relatively slow. One disadvantage of hierarchical clustering is that it is relatively slow. Hierarchical clustering generally requires you to compute the pairwise distance between all of the observations in your dataset, so the number of computations required grows roughly with the square of the number of observations.
  • Have to specify the number of clusters (at some point). Like many other clustering algorithms, hierarchical clustering implementations generally require you to specify the number of clusters you want to have in your data (or provide information that can be used to help make that decision). One caveat here is that the decision about the number of clusters can be changed after the main part of the algorithm has run, so you can experiment with different numbers of clusters without having to run the algorithm from scratch. This gives hierarchical clustering a slight advantage over methods like k-means clustering that require the clustering algorithm to be rerun from scratch each time the number of clusters changes.
  • Sensitive to scale. Like many other clustering algorithms, many implementations of hierarchical clustering are very sensitive to the scale of your features. This means that you may need to rescale your data before running your clustering. The exact level of sensitivity will vary depending on what distance metric you are using to calculate the distance between points.
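The effect of scale can be seen in a small sketch like the one below. It assumes Euclidean distance, average linkage, and a made-up dataset with income in dollars and age in years. Without rescaling, the income column dominates the distances. After standardizing each column, age has a comparable influence and the grouping changes. Which grouping is more useful depends on your question, so treat this only as an illustration that scaling matters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: column 0 is income in dollars, column 1 is age in years.
X = np.array([
    [40_000.0, 25.0],
    [41_000.0, 27.0],
    [40_500.0, 65.0],
    [41_500.0, 63.0],
])

def two_clusters(data):
    """Average-linkage clustering on Euclidean distance, cut into two clusters."""
    tree = linkage(data, method="average", metric="euclidean")
    return fcluster(tree, t=2, criterion="maxclust")

# Raw features: differences in income swamp differences in age.
labels_raw = two_clusters(X)

# Standardize each column to zero mean and unit variance before clustering.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
labels_scaled = two_clusters(X_scaled)

print("raw:   ", labels_raw)
print("scaled:", labels_scaled)
```

In this toy example the raw run pairs observations with similar incomes, while the scaled run pairs the two younger observations and the two older ones.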

When to use hierarchical clustering

When should you use hierarchical clustering? Here are some examples of situations where you should use hierarchical clustering.

  • Need to identify most similar observations to a given data point. If you are operating in a scenario where you need to be able to identify the observations that are most similar to a given observation, hierarchical clustering is a great option. The dendrogram that is created for hierarchical clustering can identify exactly which observations are most similar to one another.
  • Want to see clusters at varying levels of granularity. With hierarchical clustering, you are able to change the number of clusters you want to use after the fact without having to re-run the full algorithm from scratch. This property is not shared by many other clustering algorithms. This means you can do things like separate your data into three main clusters to look at very broad trends, but also run a separate version of the analysis where you break those three main clusters into smaller sub-clusters.
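Both situations can be sketched with SciPy, assuming a small made-up dataset and average linkage. The tree is built once, then cut at two different levels of granularity without recomputing it. The same pairwise distances are also used to look up the observation closest to a given one. The variable names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

# Hypothetical 2-D data: three broad groups, one of which has two sub-groups.
X = np.array([
    [0.0, 0.0], [0.2, 0.1],      # group A
    [5.0, 5.0], [5.1, 5.2],      # group B
    [10.0, 0.0], [10.2, 0.1],    # group C, sub-group 1
    [11.5, 0.0], [11.6, 0.2],    # group C, sub-group 2
])

# Compute the pairwise distances and build the tree once.
condensed = pdist(X, metric="euclidean")
tree = linkage(condensed, method="average")

# Cut the same tree at two levels of granularity.
broad = fcluster(tree, t=3, criterion="maxclust")
fine = fcluster(tree, t=4, criterion="maxclust")
print("3 clusters:", broad)
print("4 clusters:", fine)

# Find the observation most similar to observation 0.
square = squareform(condensed)
np.fill_diagonal(square, np.inf)  # ignore each point's distance to itself
nearest_to_first = int(np.argmin(square[0]))
print("closest to observation 0:", nearest_to_first)
```

To view the full hierarchy, `scipy.cluster.hierarchy.dendrogram(tree)` can draw the tree, provided matplotlib is installed.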

When not to use hierarchical clustering

When should you avoid using hierarchical clustering? Here are some examples of situations where you should avoid using hierarchical clustering.

  • Large dataset with many observations. If you are working with a large dataset with many observations, you are generally better off avoiding hierarchical clustering if possible. In order to perform hierarchical clustering, you have to calculate the pairwise distances between all of the observations in your dataset. This becomes increasingly computationally intensive as the number of observations in your dataset grows. When you have many observations, you may be better off using a faster algorithm like k-means clustering.
  • Only interested in identifying broad patterns. If you are only interested in identifying broad patterns across your dataset and do not have much interest in looking at more granular information about specific observations, you may be better off using another clustering algorithm. There are many clustering algorithms out there that can segment your dataset into different clusters and many of them are more computationally efficient and more easily interpretable. DBSCAN is a great option because it is robust to outliers and can identify clusters of any shape and size.
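To get a feel for how quickly the pairwise distance computation grows, here is a back-of-the-envelope sketch. It assumes each distance is stored once as an 8-byte float, which is a simplification. The function name is hypothetical.

```python
def pairwise_distance_count(n_observations: int) -> int:
    """Number of distinct pairs among n observations: n * (n - 1) / 2."""
    return n_observations * (n_observations - 1) // 2

for n in (1_000, 10_000, 100_000):
    pairs = pairwise_distance_count(n)
    # Assumption: one 8-byte float per stored distance.
    gigabytes = pairs * 8 / 1e9
    print(f"{n:>7,} observations -> {pairs:>13,} distances (~{gigabytes:.2f} GB)")
```

Increasing the number of observations tenfold increases the number of distances roughly a hundredfold, which is why large datasets become impractical.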

Extensions of hierarchical clustering

What are some common extensions of hierarchical clustering? Here are some examples of extensions of hierarchical clustering that address shortcomings of the original algorithm.

  • BIRCH. BIRCH is an extension of hierarchical clustering that runs faster on large datasets. It also has lower memory requirements than standard hierarchical clustering.
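As a minimal sketch of what using BIRCH can look like, the example below runs scikit-learn's `Birch` estimator on made-up data with three well-separated groups. The data and the `threshold` and `n_clusters` values are assumptions chosen for illustration, and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import Birch

# Hypothetical data: three well-separated groups of 200 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([
    center + rng.normal(scale=0.5, size=(200, 2)) for center in centers
])

# threshold controls how tightly points are summarized into subclusters;
# n_clusters sets how many final clusters the subclusters are merged into.
model = Birch(threshold=1.0, n_clusters=3)
labels = model.fit_predict(X)

print(np.unique(labels))
```

BIRCH summarizes the data into compact subclusters before the final clustering step, so it avoids storing every pairwise distance.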

Related articles

Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.

