Density based clustering with nested clusters -- how to extract hierarchy

Question

I am trying to automatically extract density-based clusters from image embeddings for exploratory analysis. The idea is to find repeating patterns in my dataset, which can be very specific or more general: think images with the exact same background vs. images of streets during daytime.

OPTICS seems to work great, but I find I need to sweep over the minimum cluster size to extract clusters at these varying levels of specificity. The sweeps also do not necessarily yield a consistent hierarchy: a sample might be clustered with min_cluster_size=4, end up in no cluster for min_cluster_size=8, and then get clustered again for min_cluster_size=16.

Is there a better density-based method that would allow extracting the kind of hierarchical cluster labels I have in mind? With this rough visualization, for example, I would be able to infer that samples in the red and green clusters also belong to the blue cluster, without having to do a sweep with the minimum cluster size set larger than the red and green cluster sizes.

The scikit-learn implementation also provides a cluster_hierarchy_ output, but I cannot tell how to interpret it exactly or how to translate it into a dendrogram:

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html#:~:text=cluster_hierarchy_ndarray%20of%20shape%20(n_clusters%2C%202)
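For reference, the linked documentation describes each row of cluster_hierarchy_ as an inclusive [start, end] range of positions in ordering_, available only with cluster_method="xi". Under that reading, nesting can be recovered by finding, for each range, the smallest range that strictly contains it. The sketch below is one way to do that; the helper name hierarchy_parents and the demo parameters are illustrative assumptions, not part of scikit-learn.

```python
def hierarchy_parents(intervals):
    """Return, for each [start, end] range, the index of the smallest
    strictly larger range that contains it, or None for top-level ranges.

    `intervals` is a list of inclusive [start, end] pairs, the layout the
    scikit-learn docs give for OPTICS.cluster_hierarchy_.
    """
    parents = []
    for i, (s, e) in enumerate(intervals):
        best = None
        for j, (ps, pe) in enumerate(intervals):
            encloses = ps <= s and e <= pe and (pe - ps) > (e - s)
            if j == i or not encloses:
                continue
            if best is None or (pe - ps) < (intervals[best][1] - intervals[best][0]):
                best = j
        parents.append(best)
    return parents


def demo():
    # Hypothetical usage on toy data; requires scikit-learn to be installed.
    from sklearn.cluster import OPTICS
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
    optics = OPTICS(min_samples=5, cluster_method="xi").fit(X)
    intervals = optics.cluster_hierarchy_.tolist()
    parents = hierarchy_parents(intervals)
    # Samples belonging to cluster k are ordering_[start : end + 1].
    members = [optics.ordering_[s:e + 1] for s, e in intervals]
    return parents, members
```

The parent links are enough to draw a tree: each cluster index points at its enclosing cluster, and a None entry marks a root.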

Thanks! This looks promising, particularly the clusterer.condensed_tree_.to_pandas() bit to get a dendrogram.
– Layman, Commented Sep 30, 2025 at 23:30

MuhammedYunus · Accepted Answer · 2025-10-01 14:07:29Z

9

HDBSCAN builds a cluster hierarchy internally, and whether you can access that cluster tree depends on which implementation you use.

The original hdbscan package (part of scikit-learn-contrib, not scikit-learn itself) provides access to the cluster tree via the .condensed_tree_ attribute of a fitted clusterer. Its GitHub repo has installation instructions, including pip install hdbscan.

The hdbscan docs have an example of visualising the cluster hierarchy.
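As a rough sketch of how the condensed tree can be turned into nested labels: condensed_tree_.to_pandas() returns a table with parent, child, lambda_val and child_size columns. The code below assumes that rows with child_size == 1 are individual samples and all other rows are cluster nodes, and walks from each sample up to the root. The helper name ancestor_chains and the demo parameters are hypothetical.

```python
def ancestor_chains(edges):
    """Map each sample index to its nested cluster ids, innermost first.

    `edges` is an iterable of (parent, child, child_size) rows, i.e. three
    of the columns of condensed_tree_.to_pandas(). Assumption: rows with
    child_size == 1 describe single samples; other rows describe clusters.
    """
    cluster_parent = {}  # cluster node -> enclosing cluster node
    sample_parent = {}   # sample index -> cluster it dropped out of
    for parent, child, child_size in edges:
        if child_size == 1:
            sample_parent[int(child)] = int(parent)
        else:
            cluster_parent[int(child)] = int(parent)

    chains = {}
    for sample, node in sample_parent.items():
        chain = [node]
        # Follow parent links until the root, which has no parent entry.
        while node in cluster_parent:
            node = cluster_parent[node]
            chain.append(node)
        chains[sample] = chain
    return chains


def demo():
    # Hypothetical usage on toy data; requires hdbscan, pandas and scikit-learn.
    import hdbscan
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
    clusterer = hdbscan.HDBSCAN(min_cluster_size=5).fit(X)
    df = clusterer.condensed_tree_.to_pandas()
    rows = df[["parent", "child", "child_size"]].itertuples(index=False)
    return ancestor_chains(rows)
```

Each chain lists every condensed-tree node that contains the sample, not only the clusters selected in labels_, so a sample in a small, specific cluster also shows up under the broader clusters that enclose it.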

There is also a scikit-learn implementation sklearn.cluster.HDBSCAN, but it doesn't provide access to the cluster tree.

edited Oct 1, 2025 at 14:07

answered Oct 1, 2025 at 14:01

