
I am trying to automatically extract density-based clusters from image embeddings for exploratory analysis. The idea is to find repeating patterns in my dataset, which can be very specific or more general: think images with the exact same background vs. images of streets during daytime.

OPTICS seems to work great, but I find I need to sweep over different minimum cluster sizes to extract clusters at these different levels of specificity. It also does not necessarily let me construct a hierarchy: a sample might be clustered with min_cluster_size=4, end up in no cluster with min_cluster_size=8, and then be clustered again with min_cluster_size=16.

Is there a better density-based method that would let me extract the kind of hierarchical cluster labels I have in mind? In the rough visualization below, for example, I would like to be able to infer that samples in the red and green clusters also belong to the blue cluster, without having to run a sweep with the minimum cluster size set larger than the sizes of the red and green clusters.

[Image: rough sketch of red and green clusters nested inside a larger blue cluster]

The scikit-learn implementation also provides a cluster hierarchy output (cluster_hierarchy_), but I cannot tell exactly how to interpret it or how to translate it into a dendrogram:

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.OPTICS.html#:~:text=cluster_hierarchy_ndarray%20of%20shape%20(n_clusters%2C%202)
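For reference, scikit-learn documents cluster_hierarchy_ as one inclusive [start, end] row per cluster, where the indices point into ordering_ rather than into the data, and rows are sorted so that larger enclosing clusters come after the smaller clusters they contain. The sketch below reads it into sets of sample indices plus a parent pointer per cluster. The make_blobs data and the parameter values are placeholders standing in for real embeddings, and the "smallest enclosing interval" parent search is an illustrative choice written for this example, not a scikit-learn API.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

# Placeholder data standing in for image embeddings: two nearby blobs and one distant blob.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0.0, 0.0], [1.0, 0.0], [6.0, 6.0]],
    cluster_std=0.3,
    random_state=0,
)

# cluster_hierarchy_ is only populated for cluster_method="xi" (the default).
optics = OPTICS(min_samples=5, cluster_method="xi", xi=0.05).fit(X)
hierarchy = optics.cluster_hierarchy_  # one inclusive [start, end] row per cluster

# The indices refer to positions in optics.ordering_, not to rows of X.
members = [set(optics.ordering_[start:end + 1].tolist()) for start, end in hierarchy]

# Parent of cluster i: the smallest strictly larger interval that contains it (-1 if none).
parents = []
for i, (s_i, e_i) in enumerate(hierarchy):
    best, best_width = -1, None
    for j, (s_j, e_j) in enumerate(hierarchy):
        width = e_j - s_j
        if j != i and s_j <= s_i and e_i <= e_j and width > e_i - s_i:
            if best_width is None or width < best_width:
                best, best_width = j, width
    parents.append(best)

for i, (start, end) in enumerate(hierarchy):
    print(f"cluster {i}: ordering_[{start}:{end + 1}], "
          f"{len(members[i])} samples, parent={parents[i]}")
```

Following the parent pointers from a cluster up to -1 gives a chain of increasingly general clusters. Note that labels_ only reports one flat assignment per sample and does not reflect this nesting.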

Thanks! This looks promising, particularly the clusterer.condensed_tree_.to_pandas() bit for getting a dendrogram. (Commented Sep 30, 2025 at 23:30)

1 Answer


HDBSCAN is built on hierarchical clustering, and depending on which implementation you use, you can access the resulting cluster tree.

The official implementation provides access to the cluster tree via the .condensed_tree_ attribute. Its GitHub repo has installation instructions, including pip install hdbscan. This implementation is part of scikit-learn-contrib, not scikit-learn itself.

Their docs page has an example of visualising the cluster hierarchy; see here.


There is also a scikit-learn implementation sklearn.cluster.HDBSCAN, but it doesn't provide access to the cluster tree.
