Questions tagged [clustering]

Cluster analysis, or clustering, is the task of grouping a set of objects so that objects in the same group (called a cluster) are more similar to each other, in some sense, than to objects in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, and information retrieval.

1,376 questions

5 votes

1 answer

69 views

Re-Utilizing Clusters as Features for Boosting

I am doing a binary classification modelling project - and was wondering if running clustering on numeric features to create groupings as another categorical feature, would be beneficial, under a ...

user54565

asked Dec 18, 2025 at 12:43

3 votes

1 answer

45 views

Is there a fast method for sampling from document embeddings to maximize pairwise distances?

I have a large set of document embeddings, and I would like to sample a subset where the median or average pairwise distance is maximized. The idea here is to get a more balanced sample set where long ...

Layman

asked Nov 11, 2025 at 23:52

7 votes

1 answer

90 views

LDA linearly separates 2 out of 3 classes, what insight does it provide?

My dataset consists of board games data: each board game is rated with a categorical variable (low, medium, high). I've plotted the LDA projection to check whether classes are linearly separable. The ...

Giulio Lanza

asked Oct 26, 2025 at 10:36

1 vote

0 answers

48 views

Combining Embeddings and Ontology (DAG) in Visualisation

How can I visualise a hierarchical ontology of items in embedding space, combining text embeddings with the graphical structure? (Something similar to the example below) I have a hierarchical ...

baked goods

asked Oct 23, 2025 at 15:04

1 vote

0 answers

38 views

How to identify and quantify main tendencies across participants from cluster membership heatmaps?

I'd appreciate your thoughts on the following problem. I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η). Now, I'...

maria mystakidou

asked Oct 23, 2025 at 9:21

6 votes

1 answer

115 views

Density based clustering with nested clusters -- how to extract hierarchy

I am trying to automatically extract clusters by density for image embeddings for exploratory analysis. Idea is finding repeating patterns in my dataset, which can be very specific or more general; ...

Layman

asked Sep 30, 2025 at 21:01

2 votes

0 answers

67 views

Clustering from multi-sources with missing data

Problem description I have a dataset which is a combination of multiple sources gathering the same kind of data. I have retrieved those data to fit them into several columns of a pandas dataframe. All ...

patacoing

asked Sep 8, 2025 at 14:52

3 votes

1 answer

89 views

Calculating closest point on cluster boundary from point of interest

I am working on a cluster analysis. I have 4 clusters with about 35,000 datapoints. I got relatively strong clusters. I am in marketing and this is for segmentation. One of these clusters has a very ...

David Orndorf

asked Aug 16, 2025 at 18:52

6 votes

0 answers

63 views

Extract flat clusters from hierarchy: Which "criterion" makes most logical sense for single-linkage?

I am looking ahead to using SciPy's fcluster to hierarchically cluster according to the single-linkage. Clusters can be long and meandering. In extracting a flat ...

user2153235

asked Jul 16, 2025 at 18:38

3 votes

0 answers

69 views

Finding clusters in sales data and predicting future sales based on those

I have monthly sales data from a set of online merchants that sell on an online shop using a cloud-based software solution. The data look something like this: month merchant_id shop_id shop_country ...

Max

asked Jul 16, 2025 at 6:12

8 votes

0 answers

63 views

How does SciPy's linkage() calculate centroid from pairwise distances?

I am learning about hierarchical clustering from SciPy's linkage documentation (which is much more understandable than the Wikipedia page). Some of the cluster ...

user2153235

asked Jul 15, 2025 at 20:14

7 votes

1 answer

304 views

SciPy's dendrogram method depicts two cluster merges as one

I am following the example code in the linkage documentation: ...

user2153235

asked Jul 15, 2025 at 19:33

7 votes

1 answer

133 views

SciPy's linkage method should take 1D condensed distance matrix of length n choose 2

I am educating myself on hierarchical clustering and the relevant SciPy methods. The 1st argument of the linkage method is a 1D condensed distance matrix $X$ of ...

user2153235

asked Jul 15, 2025 at 17:59
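
For readers unfamiliar with the condensed form mentioned in this question, here is a minimal sketch, assuming SciPy and NumPy are installed. The random points are hypothetical illustration data, not taken from the question. It shows that `pdist` returns a 1D array of length n choose 2 and that `linkage` accepts that array directly.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

# Hypothetical data: 5 points in 2-D (not from the question).
rng = np.random.default_rng(0)
points = rng.random((5, 2))
n = points.shape[0]

# pdist returns the condensed form: one entry per unordered pair (i < j).
condensed = pdist(points)
expected_len = n * (n - 1) // 2  # n choose 2

# linkage accepts the condensed 1D array as its first argument.
Z = linkage(condensed, method="single")

print(condensed.shape, expected_len, Z.shape)
```

The linkage matrix `Z` has one row per merge, so it has n - 1 rows and 4 columns.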

7 votes

1 answer

148 views

If my data is all-pairs distances, can I still use SciPy's fcluster or fclusterdata?

SciPy's fclusterdata requires the coordinates of M points in N dimensional space (or M observations of N dimensions each). My data is in the form of pairwise ...

user2153235

asked Jul 14, 2025 at 22:02
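
One possible way to approach the situation described in this question, sketched under assumptions: if the pairwise distances are held in a square symmetric matrix, `squareform` can convert it to the condensed form that `linkage` expects, and `fcluster` can then cut the resulting hierarchy. The matrix `D`, the threshold `t=2.0`, and the choice of single linkage are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical symmetric distance matrix for 4 items (zero diagonal).
D = np.array([
    [0.0, 1.0, 5.0, 6.0],
    [1.0, 0.0, 5.5, 6.5],
    [5.0, 5.5, 0.0, 1.2],
    [6.0, 6.5, 1.2, 0.0],
])

# Convert the square matrix to the condensed 1D form.
condensed = squareform(D)

# Build the hierarchy from distances alone; no coordinates are needed.
Z = linkage(condensed, method="single")

# Cut the tree at an assumed distance threshold to get flat cluster labels.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)
```

With this toy matrix, items 0 and 1 end up in one cluster and items 2 and 3 in another, since the two within-pair distances are below the threshold and all cross-pair distances are above it.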

3 votes

0 answers

55 views

Scalable Clustering Strategies for 300M Address Variants: Validation and Deduplication

I need to cluster 300 million unstructured addresses for validation, ensuring variants (e.g., "55 Tower F. EST City" vs. "Tower F 55, EST City, SINGA ROAD") map to a group similar ...

IAIMT2024

asked Jun 23, 2025 at 6:41

