Can I use raw data for mixed effects model when some subjects have more observations than others?

Question

I am analysing a repeated measures experiment with four conditions. I had 13 participants under each of the four conditions, and I measured the number of "clicks" they gave in each session, as well as the duration of each click. My dependent variable is the "inter-click time", i.e. the time between the end of one click and the start of the next. I first tried to fit a mixed effects model on the log mean inter-click time, with a random intercept for subject id:

library(lme4)  # provides lmer()
mx_av = lmer(logMeanInterclicktime ~ condition + (1 | subject), data = averagedData)

However, each condition also had a different effect on each participant, so I think a by-subject random slope for condition is also necessary, and the averaged data doesn't have enough observations for that. Hence, I tried fitting the raw data:

mx_raw = lmer(logInterclicktime ~ condition + (1 + condition | subject), data = allData)

However, since there was no restriction on the number of clicks subjects could give, some have many data points and others far fewer, so I am not sure whether this latter approach is correct. The first model has only one condition with a significant effect, while the second gives two conditions with significant effects, and higher estimates for those two conditions as well. The raw data also has many observations (30,861!), which has made it difficult to calculate things like df, bootstrap confidence intervals, etc.

I just wanted to check whether the second approach is correct. I would like each subject's data to be weighted equally, rather than each data point.

Happy to provide the data if that would be helpful. Thanks in advance!

How did you construct the averagedData? What is the level of aggregation in that data? Is it aggregated by condition, so that each participant has 4 observations, each the average for that participant under one condition? Or something else?
– Erik Ruzek, Commented May 19, 2021 at 20:11
@ErikRuzek Yes, the averagedData was created by finding the mean inter-click time for each participant and each experimental condition, so one subject has four mean inter-click times corresponding to the four experimental conditions.
– BlueBird, Commented May 19, 2021 at 20:45
How many random slopes does lmer report for your second model, mx_raw?
– Erik Ruzek, Commented May 20, 2021 at 18:32
@ErikRuzek It reports 13 intercepts (for 13 subjects) and 39 slopes (13 subjects * 3 conditions).
– BlueBird, Commented May 20, 2021 at 19:28
Got it. And does the model with the random condition slopes fit the data better than a model with fixed condition slopes only? You can use anova(mx_raw, mx_raw2), where mx_raw2 is the model without the random slopes, to get the likelihood ratio test comparing the two models, along with AIC and BIC.
– Erik Ruzek, Commented May 20, 2021 at 20:28

Erik Ruzek · Accepted Answer · 2021-05-21 15:12:00Z

It is almost always better to use all of your data when possible. Mixed effects models are designed to deal appropriately with unequal sample sizes across groups, and they can handle them in both the fixed and random parts of the model. This is viewed as a great advantage of these models when predicting a cluster's outcome and its slope values (i.e., the 13 intercepts and 39 slopes you mentioned in your comment; here each cluster is a subject). These are called empirical Bayes predictions because they use the overall sample-estimated grand mean and variance as a prior for an individual cluster's mean. In effect, this weights a subject's prediction by the amount of information that subject provides. The less information a subject provides, the more their prediction is pulled toward the group mean. For a bit more information on this, see here.
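
To make the "pulled toward the group mean" idea concrete, here is a minimal base R sketch of the shrinkage formula for the simplest case, a random-intercept model with the variance components treated as known. All numbers (the subject means, observation counts, grand mean, and the variances tau2 and sigma2) are made up for illustration and are not taken from the data in the question. lmer estimates these quantities from the data, and with random slopes the shrinkage acts jointly on intercepts and slopes, so treat this only as an illustration of the principle.

```r
# Empirical Bayes (shrinkage) prediction of subject means in a
# random-intercept model, with variance components treated as known.
# All values below are hypothetical.

subject_mean <- c(A = 2.0, B = 2.0, C = -1.0)  # observed per-subject means (log scale)
n_obs        <- c(A = 5,   B = 500, C = 50)    # number of observations per subject
grand_mean   <- 0.5    # assumed population mean
tau2         <- 0.25   # assumed between-subject variance
sigma2       <- 1.0    # assumed within-subject (residual) variance

# Weight given to a subject's own mean; approaches 1 as n_obs grows
shrink_weight <- tau2 / (tau2 + sigma2 / n_obs)

# Prediction: a compromise between the subject's own mean and the grand mean
eb_prediction <- grand_mean + shrink_weight * (subject_mean - grand_mean)

print(round(data.frame(n_obs, subject_mean, shrink_weight, eb_prediction), 3))
```

Subjects A and B have the same observed mean, but A's prediction is pulled much further toward the grand mean because it rests on only 5 observations. For a fitted lme4 model, the corresponding predicted subject-level deviations can be inspected with ranef().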
