
Questions tagged [clustering]

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, and information retrieval.

5 votes
1 answer
69 views

I am doing a binary classification modelling project and was wondering whether running clustering on numeric features, to create groupings as an additional categorical feature, would be beneficial under a ...
user54565 (reputation 115)
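A minimal sketch of the idea in this question, assuming scikit-learn, synthetic data from `make_classification`, k-means with an arbitrary `n_clusters=4`, and logistic regression as the downstream classifier (all stand-ins, not the asker's setup). The scaler and clusterer are fit on the training split only, and the cluster id is one-hot encoded so it enters the model as a categorical feature; comparing held-out scores with and without it is one way to judge whether the extra feature helps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

N_CLUSTERS = 4  # arbitrary choice for the sketch

# Synthetic stand-in data: 1000 rows, 5 numeric features, binary target.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit scaler and k-means on the training split only to avoid leaking test data.
scaler = StandardScaler().fit(X_train)
kmeans = KMeans(n_clusters=N_CLUSTERS, n_init=10, random_state=0).fit(scaler.transform(X_train))

# One-hot encode cluster ids so the model treats them as unordered categories.
encoder = OneHotEncoder().fit(np.arange(N_CLUSTERS).reshape(-1, 1))

def add_cluster_feature(X_part):
    """Append one-hot cluster membership columns to the raw numeric features."""
    labels = kmeans.predict(scaler.transform(X_part)).reshape(-1, 1)
    return np.hstack([X_part, encoder.transform(labels).toarray()])

X_train_aug = add_cluster_feature(X_train)
X_test_aug = add_cluster_feature(X_test)

# Held-out accuracy without and with the cluster feature.
base = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)
aug = LogisticRegression(max_iter=1000).fit(X_train_aug, y_train).score(X_test_aug, y_test)
print(f"baseline: {base:.3f}  with cluster feature: {aug:.3f}")
```

With a flexible model such as gradient-boosted trees the cluster id may add little, since the model can learn similar partitions from the raw features; the held-out comparison is what settles it for a given dataset.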
3 votes
1 answer
45 views

I have a large set of document embeddings, and I would like to sample a subset where the median or average pairwise distance is maximized. The idea here is to get a more balanced sample set where long ...
Layman (reputation 291)
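Maximizing the average pairwise distance of a subset exactly is a hard combinatorial problem (max-sum dispersion), so a greedy heuristic is a common compromise. The sketch below assumes Euclidean distance and random Gaussian vectors standing in for the embeddings; `greedy_max_sum_subset` is a hypothetical helper, not a library function. Maximizing the mean tends to favor outlying points, so if "balanced" means evenly spread, greedy farthest-point sampling (maximizing the minimum distance instead) may be closer to the goal.

```python
import numpy as np
from scipy.spatial.distance import pdist

def greedy_max_sum_subset(emb, k, seed=0):
    """Hypothetical helper: greedily pick k row indices of `emb` so that the sum
    (and hence the mean) of pairwise Euclidean distances is large.
    A heuristic; it does not guarantee the optimal subset."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(emb.shape[0]))
    chosen = [first]
    # dist_sum[i] = total distance from point i to every point chosen so far
    dist_sum = np.linalg.norm(emb - emb[first], axis=1)
    for _ in range(k - 1):
        scores = dist_sum.copy()
        scores[chosen] = -np.inf  # never pick the same point twice
        nxt = int(np.argmax(scores))
        chosen.append(nxt)
        dist_sum += np.linalg.norm(emb - emb[nxt], axis=1)
    return np.array(chosen)

# Random vectors standing in for document embeddings (assumption).
emb = np.random.default_rng(1).normal(size=(2000, 64))
idx = greedy_max_sum_subset(emb, k=50)
rand_idx = np.random.default_rng(2).choice(len(emb), size=50, replace=False)

greedy_mean = pdist(emb[idx]).mean()
random_mean = pdist(emb[rand_idx]).mean()
print(f"mean pairwise distance, greedy: {greedy_mean:.2f}, random: {random_mean:.2f}")
```

To check the median instead, `np.median(pdist(emb[idx]))` gives the corresponding statistic for the chosen subset.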
7 votes
1 answer
90 views

My dataset consists of board games data: each board game is rated with a categorical variable (low, medium, high). I've plotted the LDA projection to check whether classes are linearly separable. The ...
Giulio Lanza
1 vote
0 answers
48 views

How can I visualise a hierarchical ontology of items in embedding space, combining text embeddings with the graphical structure? (Something similar to the example below) I have a hierarchical ...
baked goods
1 vote
0 answers
38 views

I'd appreciate your thoughts on the following problem. I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η). Now, I'...
maria mystakidou
6 votes
1 answer
115 views

I am trying to automatically extract clusters by density from image embeddings for exploratory analysis. The idea is to find repeating patterns in my dataset, which can be very specific or more general; ...
Layman (reputation 291)
2 votes
0 answers
67 views

Problem description: I have a dataset which is a combination of multiple sources gathering the same kind of data. I have retrieved those data to fit them into several columns of a pandas dataframe. All ...
patacoing
3 votes
1 answer
89 views

I am working on a cluster analysis. I have 4 clusters with about 35,000 data points. I got relatively strong clusters. I am in marketing and this is for segmentation. One of these clusters has a very ...
David Orndorf
6 votes
0 answers
63 views

I am looking ahead to using SciPy's fcluster to hierarchically cluster according to the single-linkage. Clusters can be long and meandering. In extracting a flat ...
user2153235
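A minimal sketch of single-linkage clustering with SciPy on two long, wavy chains of points (synthetic data assumed). Single linkage merges a chain link by link, so cutting the tree with `fcluster(..., criterion="distance")` at a threshold larger than the within-chain spacing but smaller than the gap between chains recovers each meandering chain as one flat cluster. The threshold `1.0` is chosen for this toy data only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Two long, wavy chains of points, the second shifted up by 3 (synthetic data).
s = np.linspace(0, 3 * np.pi, 200)
chain_a = np.column_stack([s, np.sin(s)])
chain_b = np.column_stack([s, np.sin(s) + 3.0])
X = np.vstack([chain_a, chain_b])

# Single linkage on Euclidean distances between observations.
Z = linkage(X, method="single")

# Cut at height 1.0: neighbours within a chain are at most about 0.07 apart,
# while the two chains are more than 2 apart, so each chain becomes one cluster.
labels = fcluster(Z, t=1.0, criterion="distance")
print(np.unique(labels))
```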
3 votes
0 answers
69 views

I have monthly sales data from a set of online merchants that sell on an online shop using a cloud-based software solution. The data look something like this: month merchant_id shop_id shop_country ...
Max (reputation 31)
8 votes
0 answers
63 views

I am learning about hierarchical clustering from SciPy's linkage documentation (which is much more understandable than the Wikipedia page). Some of the cluster ...
user2153235
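To see what `linkage` returns, a tiny 1-D example helps (the four points are made up for illustration). Each row of the linkage matrix `Z` records one merge: the two cluster ids being joined, the merge distance, and the number of original observations in the new cluster. Ids `0..n-1` are the original points, and the cluster created in row `i` gets id `n + i`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four made-up 1-D observations: two tight pairs far apart.
X = np.array([[0.0], [1.0], [5.0], [6.5]])
Z = linkage(X, method="single")

n = X.shape[0]
for i, (a, b, dist, count) in enumerate(Z):
    # Row i creates cluster id n + i by merging clusters a and b.
    print(f"cluster {n + i}: merge {int(a)} and {int(b)} at distance {dist} ({int(count)} points)")
```

Here row 0 joins points 0 and 1 at distance 1.0 (new id 4), row 1 joins points 2 and 3 at 1.5 (id 5), and row 2 joins clusters 4 and 5 at 4.0, which is the single-linkage distance between their closest members, 1.0 and 5.0.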
7 votes
1 answer
304 views

I am following the example code in the linkage documentation: ...
user2153235
7 votes
1 answer
133 views

I am educating myself on hierarchical clustering and the relevant SciPy methods. The first argument of the linkage method is a 1D condensed distance matrix $X$ of ...
user2153235
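A short sketch of how the condensed form relates to the square matrix, using `scipy.spatial.distance.pdist` and `squareform` on made-up points. The condensed vector stores the upper triangle row by row, so for $i < j$ among $n$ points the pair $(i, j)$ sits at index $n i - i(i+1)/2 + (j - i - 1)$; `condensed_index` is a hypothetical helper implementing that formula.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Four made-up 2-D points.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
n = X.shape[0]

d = pdist(X)        # condensed vector of length n * (n - 1) / 2 = 6
D = squareform(d)   # full symmetric 4x4 matrix with a zero diagonal

def condensed_index(i, j, n):
    """Hypothetical helper: position of pair (i, j) in the condensed vector."""
    if i == j:
        raise ValueError("the diagonal is not stored in condensed form")
    if i > j:
        i, j = j, i
    return n * i - i * (i + 1) // 2 + (j - i - 1)

# Every upper-triangle entry of D matches its slot in d.
for i in range(n):
    for j in range(i + 1, n):
        assert D[i, j] == d[condensed_index(i, j, n)]
```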
7 votes
1 answer
148 views

SciPy's fclusterdata requires the coordinates of M points in N dimensional space (or M observations of N dimensions each). My data is in the form of pairwise ...
user2153235
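When only pairwise distances are available, one route is to skip `fclusterdata` and call `linkage` on the condensed distance vector directly, then cut the tree with `fcluster`. The 4x4 matrix below is hypothetical. Passing the square matrix itself to `linkage` is a common mistake, because a 2-D array is interpreted as observation vectors; convert it with `squareform` first. Methods such as `"ward"`, `"centroid"` and `"median"` assume Euclidean distances, so `"average"` is used here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical pairwise distance matrix for 4 items (symmetric, zero diagonal).
D = np.array([
    [0.0, 1.0, 9.0, 10.0],
    [1.0, 0.0, 8.0, 9.0],
    [9.0, 8.0, 0.0, 2.0],
    [10.0, 9.0, 2.0, 0.0],
])

# linkage expects the condensed form; a square 2-D array would be read as observations.
d = squareform(D)
Z = linkage(d, method="average")

# Ask for at most 2 flat clusters; criterion="distance" with a height threshold also works.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```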
3 votes
0 answers
55 views

I need to cluster 300 million unstructured addresses for validation, ensuring variants (e.g., "55 Tower F. EST City" vs. "Tower F 55, EST City, SINGA ROAD") map to a group similar ...
IAIMT2024
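A toy-scale sketch of one common approach to grouping address variants: normalize each address to a set of lowercase tokens, compare only addresses that share a token (blocking), link pairs whose Jaccard similarity passes a threshold, and merge the links with union-find. The threshold `0.6`, the `max_block` cutoff, the helper names and the third address are assumptions for illustration. At 300 million records the pairwise step would need something far more scalable, such as MinHash/LSH on a distributed engine; this only shows the logic.

```python
import re
from collections import defaultdict
from itertools import combinations

def tokens(addr):
    """Lowercase and split on anything that is not a letter or digit."""
    return frozenset(t for t in re.split(r"[^0-9a-z]+", addr.lower()) if t)

def jaccard(a, b):
    return len(a & b) / len(a | b)

def group_addresses(addresses, threshold=0.6, max_block=1000):
    """Hypothetical helper: group addresses whose token sets are similar enough."""
    toks = [tokens(a) for a in addresses]
    parent = list(range(len(addresses)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Blocking: only compare addresses that share at least one token.
    index = defaultdict(list)
    for i, ts in enumerate(toks):
        for t in ts:
            index[t].append(i)
    for members in index.values():
        if len(members) > max_block:  # skip very common tokens such as "road"
            continue
        for i, j in combinations(members, 2):
            if jaccard(toks[i], toks[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, addr in enumerate(addresses):
        groups[find(i)].append(addr)
    return list(groups.values())

addresses = [
    "55 Tower F. EST City",
    "Tower F 55, EST City, SINGA ROAD",
    "12 Harbour Street, Port Town",  # made-up unrelated address
]
groups = group_addresses(addresses)
print(groups)
```

The two variants from the question share 5 of 7 distinct tokens (Jaccard about 0.71), so they land in one group, while the unrelated address stays on its own.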
