Questions tagged [clustering]
Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]
4,046 questions
0
votes
0
answers
23
views
Modeling recurring monthly transactions with weekend-shift effects: DBSCAN vs rule-based temporal detection?
I have 3 months of categorized bank transaction data and need to identify recurring cash inflows and outflows for lending risk modeling.
Complications:
1. Income dates shift earlier when payday falls ...
0
votes
0
answers
33
views
Role of Z-Tests in Kernel Density Estimation for Cluster Classification
In a recent bioinformatics paper, the authors describe a statistical/machine learning approach to classify clusters of cells using kernel density estimation (KDE) and Z-scores. While the details of ...
1
vote
1
answer
50
views
Vector direction of individual clusters after PCA
Suppose I have two multi-dimensional population samples - $A$ and $B$.
I hypothesise that $\mathbb{E}[A]$ and $\mathbb{E}[B]$ are orthogonal in this high-dimensional space.
To test this hypothesis, I ...
1
vote
0
answers
32
views
Supervised Clustering Algorithms / Full Graph Edge Prediction Algorithms
I have an interesting problem I am trying to solve and I cannot find any non-deep methods available to solve it.
Problem Description
Plain
The real-life problem this relates to is handwritten digits ...
2
votes
1
answer
46
views
Pattern analysis for time between events data
I am trying to subset data based on a pattern of "strings" or clusters of food deliveries to young that I see in my data (see plots labeled 2, 4, 5, 6, and 8 in the figure below for the most ...
0
votes
0
answers
27
views
How to identify and quantify main tendencies across participants from cluster membership heatmaps?
I'd appreciate your thoughts on the following problem.
I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η).
Now, I'...
2
votes
1
answer
122
views
Examining country-level effects based on individual-level data combined with country-level data
I am new to working with country-level effects in comparative OLS regression with individual-level data. Are there any good resources for this?
Suppose my dependent variable is social integration (an ...
0
votes
0
answers
44
views
Are there clustering algorithms or preprocessing strategies tailored for zero-inflated and continuous data types?
I am currently working on a project where I need to assign customers across N recipes before A/B testing such that KPIs for each customer are balanced across recipes (to reduce pre-test bias).
Dataset ...
0
votes
0
answers
57
views
How to perform clustering on heavily right-skewed and zero-inflated data
I am currently working on clustering continuous variables (such as AOV, RPV, and conversions (conversion/visits)). The variables are heavily right-skewed with long tails and one variable is dominated ...
3
votes
1
answer
129
views
Bayesian Clustering with a Finite Gaussian Mixture Model with Missing Data
I would like to perform clustering with a finite Gaussian Mixture model, however, I have missing data (some features are missing at random). I am using Variational Inference to fit my Bayesian GMM. Is ...
2
votes
0
answers
72
views
Estimating number of clusters using Scikit Bayesian GMM
I am generating clustering data using the Bayesian mixture of Gaussian models described in Bishop's Pattern Recognition and Machine Learning textbook, with model parameters drawn from the following ...
1
vote
1
answer
59
views
Mixture-Based Clustering for Ordered Stereotype Model - Distance Scores
I have a 5-variable/3 category-level ordinal survey data set. E.g. 5 health variables ranked 1-3 (good-moderate-poor).
I want to row-cluster different responses. But I also want to determine whether ...
1
vote
0
answers
54
views
Are equal and diagonal variance matrices implicitly assumed in k-means clustering?
When applying k-means clustering, I understand that the goal is to partition the dataset by assigning each point to its nearest cluster center. However, I’ve come across statements that k-means can be ...
1
vote
0
answers
72
views
"How to validate if a dataset has natural clusters?"
I've recently learnt unsupervised learning methods such as KMeans and DBSCAN.
While working on this dataset, I applied KMeans clustering but faced the following issues: The Elbow Method showed no ...
0
votes
1
answer
60
views
Data cross validation to predict label from cluster analysis [closed]
My project has the following steps:
Use the elbow method to determine the features and number of clusters for kmeans.
Run kmeans on the data (with determined features and n clusters), and gives the ...