Combine p-values: same data, same test, different labels

Asked 6 years, 7 months ago

Modified 6 years, 7 months ago

Viewed 61 times

I have a dataset (300 samples × 4000 features) and I'm trying to extract meaningful features related to two conditions, A and B. Both conditions have levels 0, 1 and 2, which stand for none, mild and severe. Since the features don't follow a normal distribution, I used the non-parametric Kruskal-Wallis test. Now I have two lists of p-values, pValA and pValB, that contain the p-value of each feature in my dataset for condition A and condition B respectively. How can I merge both lists and select meaningful features? I used to take the average, but my supervisor faked a heart attack when he saw my code.

asked Sep 19, 2019 at 20:07

Giuseppe Minardi
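
For readers following along, the per-feature testing described in the question can be sketched roughly as below. This is only an illustration, not the asker's code: the toy data and the helper name `kruskal_3groups` are made up, and the helper assumes exactly three groups and no tied values (with three groups the Kruskal-Wallis statistic is approximately chi-square with 2 degrees of freedom, so the p-value reduces to `exp(-H/2)`). In practice a library routine such as `scipy.stats.kruskal` would normally be used instead.

```python
import math

def kruskal_3groups(values, labels):
    """Kruskal-Wallis p-value for exactly three groups, assuming no tied values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices, smallest value first
    rank_sum, count = {}, {}
    for rank, i in enumerate(order, start=1):
        g = labels[i]
        rank_sum[g] = rank_sum.get(g, 0) + rank
        count[g] = count.get(g, 0) + 1
    if len(rank_sum) != 3:
        raise ValueError("this sketch only handles exactly three groups")
    h = 12.0 / (n * (n + 1)) * sum(rank_sum[g] ** 2 / count[g] for g in rank_sum)
    h -= 3 * (n + 1)
    # Chi-square survival function with 2 degrees of freedom is exp(-h / 2).
    return math.exp(-max(h, 0.0) / 2.0)

# Made-up toy data: 9 samples x 2 features, condition A with levels 0/1/2.
X = [
    [1.0, 5.1], [2.0, 0.3], [3.0, 7.7],
    [4.0, 2.2], [5.0, 9.4], [6.0, 1.8],
    [7.0, 6.5], [8.0, 3.9], [9.0, 4.6],
]
cond_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# One p-value per feature (column), as in the question's pValA.
pValA = [kruskal_3groups([row[j] for row in X], cond_a) for j in range(len(X[0]))]
```

Repeating the last line with the labels for condition B would give the second list, pValB.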


I don't think anything meaningful can come out of doing 4000 Kruskal-Wallis tests. You have massive type I error inflation issues. If you are trying to determine which covariates are related to your conditions, you should be using a hypothesis generation technique (i.e. some kind of dimensionality reduction) rather than per-feature tests, although even then it won't be able to handle the disparity between your sample size and the number of covariates you have. Can you post a sample of your data to show folks what you are working with?

– André.B, commented Sep 19, 2019 at 21:14

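
The multiplicity concern in the comment above is commonly addressed by adjusting the per-feature p-values before selecting anything. Below is a minimal sketch of the Benjamini-Hochberg adjustment, assuming `pValA` is a plain list of floats; the values and the 0.05 threshold are made up for illustration. It does not by itself answer how to merge the lists for A and B.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values standing in for a (much longer) pValA list.
pValA = [0.001, 0.04, 0.03, 0.20, 0.50]
adjA = benjamini_hochberg(pValA)

# Indices of features kept at a false discovery rate of 5% (hypothetical threshold).
selected = [i for i, q in enumerate(adjA) if q <= 0.05]
```
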
I don't think I can, since this is a private dataset. Another idea would be to use the feature importances of a random forest.

– Giuseppe Minardi, commented Sep 19, 2019 at 21:47

If you can't release the data, then try creating a fake set to post here (preferably one that can be directly imported into a stats program). It is very hard to help without seeing what one is dealing with. You could try a random forest, but you would need to standardise the units and ensure that you do not have any overly noisy variables, or they will drive your results, as you have a very small number of observations relative to covariates.

– André.B, commented Sep 19, 2019 at 22:00

OK then, I will try to build fake data.

– Giuseppe Minardi, commented Sep 19, 2019 at 22:04

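
One way such a fake dataset could be built is sketched below. Everything here is an assumption for illustration: the function name `make_fake_data`, the log-normal feature distribution (chosen only because the question says the features are not normal), and the effect sizes are all hypothetical and say nothing about the real private data.

```python
import random

def make_fake_data(n_samples=300, n_features=4000, n_informative=10, seed=0):
    """Build skewed fake features plus two 3-level condition labels."""
    if n_features < 2 * n_informative:
        raise ValueError("need at least 2 * n_informative features")
    rng = random.Random(seed)
    cond_a = [rng.randrange(3) for _ in range(n_samples)]  # levels 0, 1, 2
    cond_b = [rng.randrange(3) for _ in range(n_samples)]
    X = []
    for a, b in zip(cond_a, cond_b):
        # Log-normal noise gives positive, right-skewed (non-normal) values.
        row = [rng.lognormvariate(0.0, 1.0) for _ in range(n_features)]
        for j in range(n_informative):
            row[j] *= 1.0 + 0.5 * a                  # hypothetical effect of A
            row[n_informative + j] *= 1.0 + 0.5 * b  # hypothetical effect of B
        X.append(row)
    return X, cond_a, cond_b

# Small example; the defaults mirror the 300 x 4000 shape from the question.
X, cond_a, cond_b = make_fake_data(n_samples=30, n_features=50, n_informative=5)
```

Because the informative columns are known by construction, a set like this also makes it possible to check whether a selection method recovers them.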
I would try ordinal lasso / ridge regression. See cran.r-project.org/web/packages/glmnetcr/vignettes/glmnetcr.pdf. It doesn't give you p-values, but in this case you shouldn't really look for them anyway.

– user2974951, commented Sep 20, 2019 at 6:18
