
I am relatively new to statistics and would appreciate some guidance on whether my analysis approach is appropriate.

What I have: I measured concentrations of 80 different substances at three time points: 6h, 24h, and 72h. For each time point, I have 3 replicates. Each replicate sample contains measured concentrations for all 80 substances. So for each substance individually: n = 3 per time point and n = 9 observations in total.

What I want to answer: For each substance separately: Does concentration differ between 6h, 24h, and 72h?

Approach idea: For each substance, fit concentration ~ time with ANOVA, then correct for multiple testing using FDR. However, given the low number of replicates, the assumptions of normality and homogeneity of variance are difficult to assess (e.g., by visual screening). I therefore also considered the Kruskal-Wallis test as an alternative: run it for each substance, then correct for multiple testing using FDR. I need to identify whether the concentrations of the substances change significantly by 72h.

Is this appropriate? Am I missing a better modelling strategy?
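For concreteness, the per-substance-test-plus-FDR workflow described above can be sketched in a few lines of Python with numpy and scipy; simulated gamma data stand in for the real measurements (all parameters below are arbitrary placeholders, not values from the study):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subst, n_rep = 80, 3
# simulated concentrations: substances x time points (6h, 24h, 72h) x replicates
data = rng.gamma(shape=5.0, scale=2.0, size=(n_subst, 3, n_rep))

pvals = np.empty(n_subst)
for i in range(n_subst):
    g6, g24, g72 = data[i]                      # replicate vectors per time point
    pvals[i] = stats.f_oneway(g6, g24, g72).pvalue
    # Kruskal-Wallis alternative:
    # pvals[i] = stats.kruskal(g6, g24, g72).pvalue

# Benjamini-Hochberg FDR correction (step-up procedure), done by hand
order = np.argsort(pvals)
ranked = pvals[order] * n_subst / np.arange(1, n_subst + 1)
qvals = np.minimum.accumulate(ranked[::-1])[::-1]
q = np.empty(n_subst)
q[order] = np.clip(qvals, 0.0, 1.0)
print((q < 0.05).sum(), "substances significant at FDR 5%")
```

The manual BH step could equally be replaced by `statsmodels.stats.multitest.multipletests(pvals, method="fdr_bh")`.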


  • I wouldn't read too much into the modest spread differences. Being concentrations, I'd lean toward a gamma GLM with log link (perhaps a GLMM given the replicates, but if they're 'true' replicates it might be unnecessary). That variance function should slightly improve the overall appearance of the spread-location plot along the way, but that isn't primarily why I'd pick that model. Commented 2 days ago

3 Answers


This sounds like it might be a preliminary study of a system, intended to support hypothesis generation and the design of subsequent studies. If so, the types of analysis that might be helpful can be quite different from the analyses that suit other types of studies. You need to give more context than you have for anyone to really know which inferences you wish to draw from the data.

If the study is not intended as preliminary then answers to your question will have to deal with the fact that there are far too few replicates for the inferences to have much certainty.

If the study is preliminary and you wish to decide which substances are worth follow-up then you might be able to do so just by inspection of the data. Graph them all and choose the ones that look most promising by eye.

If you insist on statistical approaches then consider whether there is more information that you have about the system that might help with design of the analysis. There is likely too little data to be able to reliably infer the form of any time-related changes from the data themselves, so background knowledge and previous studies in the same or similar systems can be very helpful.

  • Thank you first of all. Yes, one could say this is somewhat preliminary, or that the experiment was done mainly to test something. Basically, we currently have a specific extraction procedure for substances, and we want to test whether the drying duration affects the concentration; that's why we want to find out if there is a (significant) decline. It's only a very small project, which is why more replicates weren't possible. Commented 2 days ago
  • Sounds like you might be able to pool the values from multiple substances to get a clearer picture. Is there a reason to suppose that some substances are affected by drying time and not others? Can you run another study of one or two compounds with more replicates? Commented 2 days ago

A few initial remarks:

  • You will obviously be (severely?) limited by your lack of power (only 3 replicates per time point), which you seem well aware of.
  • Given this, and as this seems more of an exploratory analysis, you may be better served by eyeballing the results than by formal, rigorous "statistically significant" test results (because of the lack of power, you may not get many such significant results).
  • I would imagine that the concentrations would be monotonically decreasing (how could the concentration of an analyte increase? All a compound can do is decay). This should make eyeballing easier.
  • Furthermore, given this, I would use one-sided post-hoc comparisons (to gain a bit more power), and even use $2 \cdot \alpha$ to "simulate" a one-sided omnibus test.
  • I would not worry at all about heteroscedasticity: use a Welch ANOVA, which does not require homoscedasticity. In fact, one should probably always use the Welch ANOVA instead of the classical one, even when the variances are (close to) equal.
  • I would not worry too much about normality of residuals either. With only 9 data points, formal tests (e.g. Shapiro-Wilk) are of no use, and Q-Q plots are not much better. Given that ANOVA is pretty robust to "reasonable" departures from normality, and that this is an exploratory study, I would just "assume residuals are normal enough".
  • I would also point out that ANOVA and Kruskal-Wallis (KW) test different nulls (ANOVA tests equality of means; KW tests stochastic equivalence). In addition, KW is quite sensitive to heteroscedasticity (and I know of no "Welch" version thereof).
  • Last, what you seem to be interested in is not whether there are statistically significant decreases in concentration, but whether there are "practically important" ones, two very different things. Only you (and your colleagues), based on specific context and domain knowledge, can define what a practically important decrease is.

So I advise you to define what an important decrease is (which could be different for different substances), and then eyeball your data using various plots (e.g. individual value plots, such as the one below, a purely fictitious example) to see if any substance decreases more than its limit.

Individual value plot
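The "define a limit, then eyeball" idea can be sketched numerically as well: compute per-substance means at each time point, the relative drop from 6h to 72h, and flag substances exceeding a chosen limit. (Simulated data; the 30% limit below is an illustrative placeholder, not a recommendation.)

```python
import numpy as np

rng = np.random.default_rng(2)
# fictitious concentrations: substances x time points (6h, 24h, 72h) x replicates
data = rng.gamma(5.0, 2.0, size=(80, 3, 3))

means = data.mean(axis=2)                             # per-substance mean at 6h, 24h, 72h
decline = (means[:, 0] - means[:, 2]) / means[:, 0]   # relative drop 6h -> 72h
limit = 0.30                                          # example "practically important" decrease
flagged = np.flatnonzero(decline > limit)
print(f"{flagged.size} substances drop by more than {limit:.0%}")
```

Sorting substances by `decline` also gives a natural shortlist for follow-up plotting.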

If you then really want to test for statistical significance, use Welch ANOVAs with one-sided post-hoc tests and/or $2 \cdot \alpha$.
And yes, use FDR to deal with the multiple comparison issue. But even then... if you identify a handful of substances which decreased more than their practically important limits, just collect a few more replicates for these few substances and see if the result is statistically significant.
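scipy has no built-in Welch ANOVA, so here is a hand-rolled sketch of the standard textbook Welch F statistic, followed by a one-sided Welch post-hoc comparison; the concentration values are made up for illustration:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's one-way ANOVA for unequal variances; returns (F, df1, df2, p)."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v                                   # precision weights
    mw = np.sum(w * m) / np.sum(w)              # weighted grand mean
    a = np.sum(w * (m - mw) ** 2) / (k - 1)
    h = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * h
    df1, df2 = k - 1, (k ** 2 - 1) / (3 * h)    # Welch-Satterthwaite-style df
    f = a / b
    return f, df1, df2, stats.f.sf(f, df1, df2)

# fictitious concentrations, 3 replicates per time point
g6, g24, g72 = [21.0, 19.5, 20.4], [15.2, 16.8, 14.9], [9.8, 11.1, 10.3]
f, df1, df2, p = welch_anova(g6, g24, g72)
print(f"Welch F = {f:.2f}, df = ({df1}, {df2:.2f}), p = {p:.2g}")

# one-sided Welch post-hoc comparison (6h vs 72h, expecting a decrease);
# compare the omnibus p to 2*alpha to "simulate" a one-sided omnibus test
p_post = stats.ttest_ind(g6, g72, equal_var=False, alternative="greater").pvalue
print(f"one-sided Welch t-test 6h > 72h: p = {p_post:.2g}")
```

For real use, `pingouin.welch_anova` implements the same statistic, but the formula above shows what is being computed.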


The role of model assumptions in statistics is often taught and communicated in a misleading way. No real data are ever exactly normally distributed, so if applying ANOVA and other methods really required the data to come from a normal distribution, these methods could never be applied.

In fact, the p-values that you get out of an ANOVA are computed under the normality assumption together with the null hypothesis. What these tell you is whether, regarding potential differences in means, the data are compatible with a model with homogeneous variances (homoscedasticity) and equal means, i.e., whether the data give you a strong indication that this is not the case. If a p-value is non-significant, it means you don't have indication against the equal-means model in the data, but it doesn't mean the means are really equal, nor that the homoscedastic normal model is realistic.

This means that the test can absolutely be computed and interpreted for data for which you don't know whether they are normally distributed or homoscedastic, or even where you know (realistically) that this is not the case. If you don't find significance, you don't have evidence against the null model, but this doesn't mean the null model is true anyway. No interpretation problem here.

If you find significance, the null model looks wrong. At this point you may be interested in what exactly is wrong with the null model. If the data were indeed normally distributed and homoscedastic, what is wrong with the null model is that the means are apparently not equal. However, you don't know this, and you may wonder whether you got significance because of deviations from the null model other than unequal means. At this point I'd just look at the data and see what they look like, i.e., whether rejection was apparently caused by differences in means rather than by deviations from normality or homoscedasticity. Note that the latter is not likely, as ANOVA is quite robust, meaning that it is hard to reject even with deviations from normality/heteroscedasticity if the underlying means are in fact equal.

But in any case you can say that (a) the equal-means null hypothesis was rejected and (b) the data clearly seem to indicate that the means are indeed different (if that is how the data look), be they normal/homoscedastic or not. So I don't see a big problem with a straight ANOVA here.

There are sometimes reasons to use methods with weaker assumptions, such as Kruskal-Wallis or Welch's ANOVA, particularly because the power of the ANOVA may become smaller with, e.g., outliers or some forms of heteroscedasticity. With three observations per time point, however, I don't see much if any advantage in using these unless you have some extreme outliers.

The thing is that Kruskal-Wallis reduces information (from raw data to ranks), which in itself loses some power, and this is not a good idea if your information is already very weak because of the small sample size. Welch's ANOVA probably won't do any harm, but it also has somewhat less power in some situations, because weaker assumptions mean that the information going into the computation of the p-values is weaker. It may help a bit in some situations (where we are not close to the standard ANOVA model assumptions) and may do a bit of damage in some others, and chances are it's hard to figure out with this small amount of data which situation you are in (it may also be that it agrees with the standard ANOVA often or always in your situation).
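A quick simulation illustrates the rank-reduction point with n = 3 per group: with only nine observations, the Kruskal-Wallis p-value can never get very small (its best attainable value with three groups of three is about 0.027 under the chi-square approximation), so it tends to reject less often than ANOVA even under a clear decline. The means, SD, and simulation count below are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_sim, alpha = 2000, 0.05
rej_anova = rej_kw = 0
for _ in range(n_sim):
    # a clear true decline: means 20 -> 15 -> 10, sd 3, n = 3 per time point
    g = [rng.normal(mu, 3.0, 3) for mu in (20, 15, 10)]
    if stats.f_oneway(*g).pvalue < alpha:
        rej_anova += 1
    if stats.kruskal(*g).pvalue < alpha:
        rej_kw += 1
print(f"ANOVA power ~ {rej_anova / n_sim:.2f}, KW power ~ {rej_kw / n_sim:.2f}")
```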

That said, of course the small amount of information that you have will be a problem whatever you do. There is nothing in statistics that can extract something strong from weak data.
