Questions tagged [correlation]
A measure of the degree of linear association between a pair of variables.
399 questions
2
votes
0
answers
29
views
Is it methodologically sound to apply WOE/IV binning before correlation and VIF-based feature selection?
In credit scoring / logistic regression models, it’s common to apply WOE (Weight of Evidence) binning to continuous and categorical variables before modeling.
However, WOE binning discretizes ...
2
votes
1
answer
39
views
Is there a difference between r (the sample correlation coefficient) and the rho coefficient?
The two concepts lack a clear meaning. To me, the term rho appears to reflect validity, while r reflects the sample correlation. Is this understanding valid?
7
votes
1
answer
112
views
What is the difference between collinearity and the correlation coefficient?
The two terms, collinearity and correlation coefficient, are frequently used in statistics. Could you please help me understand, in ordinary language, the difference between the two concepts?
0
votes
0
answers
9
views
RStudio (Multiple Correlation Analysis)
I have a population of customers (n=50,000) that I would like to analyze (in RStudio) to predict which product is the Next Most Likely (NML) to be bought, based on the current population's active ...
1
vote
1
answer
49
views
Correlated Features In Classification Problem
I'm working on a binary classification problem to identify struggling students at university. I have some features that are correlated, such as high_school_grade_1 that represents 75% of ...
4
votes
0
answers
46
views
How to detect issues with time series data from multiple related measuring devices?
I think this is quite a detailed problem, so let me provide some context first. I have a fairly complex electrical circuit that I regularly monitor to make sure it is functioning properly. To do ...
0
votes
0
answers
32
views
Repeated Measures Correlation Question
I am currently using repeated measures correlation to calculate the correlation between two variables in repeated measures data (link to paper).
In the paper, equation 4 shows how repeated measures ...
0
votes
0
answers
33
views
Can I use the slope of a regression to establish a correlation if R-squared is less than 20%?
I fit a regression line between a variable and the target value. The coefficient of determination (R-squared) between the two is very low (< 20%). Does the calculated slope hold any significance in this ...
6
votes
1
answer
117
views
Regarding the proper approach for cleaning data for correlation using scipy.stats
I have a dataframe with two columns:
'HAD_DISEASE' (whether the subject has had said disease), which takes either 1 or 2 as a value: 1 stands for yes and 2 for no.
'VNR', also an integer (...
3
votes
0
answers
69
views
Finding clusters in sales data and predicting future sales based on those
I have monthly sales data from a set of online merchants that sell on an online shop using a cloud-based software solution. The data look something like this:
month
merchant_id
shop_id
shop_country
...
3
votes
0
answers
45
views
Variance / Influence over time from separate sources
Say I have 3 time series. Index X, Sentiment Y, and Rate Z, all floats, x-axis is time.
My hypothesis is that Index X is composed of, or influenced by, both Y and Z, plus some noise.
How can I get an estimate for ...
3
votes
1
answer
95
views
Best Tests for Correlation Between Categorical and Numeric Variables (Non-Normal Data)
I’m still learning data science and trying to improve my understanding of statistical tests. Right now, I’m working with a dataset where I have a categorical feature (e.g., “School Type” with values ...
1
vote
3
answers
305
views
What is the impact of low correlation on regression and classification problems, and how does it affect model performance?
I’m building two models (one for a regression problem and the other for a classification task) but I am facing low correlation in the data (lower in the classification problem than in the regression ...
2
votes
2
answers
379
views
Correlation between Continuous and Categorical Variables and Feature Selection
I want to build a classification model. At the end of my pre-processing and feature creation, I end up with 167 continuous features and a discrete target (5 modalities).
I'd like to ...
2
votes
0
answers
57
views
Optimized algorithms for correlation-based feature elimination
I have a large dataframe with close to a million rows and 2000 columns. I'm trying to do feature elimination using the correlation between the variables. The problem of course is that for a set of n ...