2
$\begingroup$

I’m analyzing longitudinal data with three timepoints:

  • Time 0 = baseline
  • Time 12 = post-treatment
  • Time 24 = follow-up

Because the treatment occurs at Time 12, I’m modeling a potential change in trajectory using a piecewise linear fixed-effects structure with a knot at 12:

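dat$time1 <- pmin(dat$time, 12)      # time elapsed before the knot (capped at 12)
dat$time2 <- pmax(0, dat$time - 12)  # time elapsed after the knot (0 until week 12)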
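As a quick check of what this coding produces, here is a minimal base-R sketch that evaluates the two pieces at the three measurement occasions. The standalone `time` vector and the `coding` data frame are illustrative names, not part of the real data.

```r
# The three measurement occasions from the question
time <- c(0, 12, 24)

time1 <- pmin(time, 12)       # 0 12 12: grows until the knot, then stays flat
time2 <- pmax(0, time - 12)   # 0  0 12: zero until the knot, then grows

coding <- data.frame(time, time1, time2)
print(coding)

# The two pieces always add back up to the original time variable
all(time1 + time2 == time)
```

Note that `time1 + time2` reproduces `time` exactly, which matters for how a random slope on `time` relates to the two fixed slopes.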

What I want to fit is:

  • Fixed effects: time1 and time2 (two slopes, one per segment)
  • Random effects: only one random linear slope for the overall continuous time variable
  • I do not want (nor can I estimate) separate random slopes for each piece, because I only have three timepoints

Example model:

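library(lme4)

model_piecewise <- lmer(
  BDI ~ time1 + time2 +   # fixed effects: one slope per segment
    (1 + time | id),      # random intercept plus a single random slope for overall time
  data = dat
)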
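Written out for subject $i$ at occasion $j$, and assuming the usual lmer() conventions (normally distributed random effects with mean zero and an unstructured covariance matrix, plus independent residuals), the model above can be read as follows. The symbols $\beta$, $u$, and $\varepsilon$ are notation introduced here for illustration.

$$
\text{BDI}_{ij} = \beta_0 + \beta_1\,\text{time1}_{ij} + \beta_2\,\text{time2}_{ij} + u_{0i} + u_{1i}\,\text{time}_{ij} + \varepsilon_{ij},
\qquad (u_{0i}, u_{1i})^\top \sim N(\mathbf{0}, \Sigma), \quad \varepsilon_{ij} \sim N(0, \sigma^2)
$$

Because $\text{time}_{ij} = \text{time1}_{ij} + \text{time2}_{ij}$ under this coding, the same expression can be regrouped as

$$
\text{BDI}_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\,\text{time1}_{ij} + (\beta_2 + u_{1i})\,\text{time2}_{ij} + \varepsilon_{ij},
$$

so each subject's deviation $u_{1i}$ is added to the slope of both segments.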

My questions:

  1. Is this model specification statistically legitimate? I have two fixed slopes (piecewise), but allow subjects to vary only in a single random linear slope (i.e. the overall time of the study, week 0, 12 and 24).

  2. Is there any requirement in mixed-model theory or in lmer() that random slopes must match the fixed-effects structure (i.e., one random slope per piece)?

  3. Given only three measurement occasions, estimating two random slopes (for time1 and time2) is impossible. Is the above model the correct way to include subject-level slope variability without overspecifying the random structure?

  4. Are there papers or examples where piecewise fixed effects are combined with only one random linear slope of overall time?

$\endgroup$

2 Answers

1
$\begingroup$

Welcome to CV, @AndroidParanoid! Unfortunately, the model you have posted in your question is not tenable in the mixed effects modeling framework. However, see the edit below for a way to implement it using structural equation modeling (SEM).

When you specify a random slope for a variable that does not also have a fixed effect, then you are fixing the sample mean for that variable to be 0 when it almost certainly is not 0. With your time variable coded as 0, 12, and 24, this would mean that the mean outcome value at the first timepoint is 0. This is usually not accurate and has the effect of inflating the random slope variance. In other words, the only legitimate reason to model a variable as a random slope without a corresponding fixed effect is when you are certain the mean for that variable is 0.

If you parameterized your model in a simpler manner, you would get direct estimates of the difference in the outcome between 1) baseline and post-treatment and 2) baseline and follow-up:

alt_model <- lmer(
  BDI ~ as.factor(time) +   # one mean per measurement occasion
    (1 | id),               # random intercept only
  data = dat
)

You could use something like emmeans() to estimate the contrast (mean difference) between post-treatment and follow-up. I would recode time to 0, 1, and 2 for this model so that a 1-unit change is meaningful: each unit then corresponds to the step from one measurement occasion to the next. One reason not to do this would be if people were assessed at different times and you wanted to take that into account. That would lead you towards a linear growth curve model in which you do not separate the pre- and post-treatment periods. However, this would not address your question about pre-to-post changes.
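A minimal sketch of that workflow on simulated data, assuming the lme4 and emmeans packages are installed. The data set `dat_sim`, the true means, and the variance settings are all made up for illustration and are not the asker's data.

```r
library(lme4)
library(emmeans)

set.seed(1)
n_id <- 60

# Hypothetical balanced design: every subject measured at weeks 0, 12, 24
dat_sim <- expand.grid(id = factor(seq_len(n_id)), time = c(0, 12, 24))

# Made-up true means per occasion, a subject-level shift, and residual noise
true_means    <- c("0" = 25, "12" = 17, "24" = 16)
subject_shift <- rnorm(n_id, sd = 5)
dat_sim$BDI <- unname(true_means[as.character(dat_sim$time)]) +
  subject_shift[as.integer(dat_sim$id)] +
  rnorm(nrow(dat_sim), sd = 3)

# Time as a factor: one mean per occasion, random intercept only
dat_sim$time_f <- factor(dat_sim$time)
fit <- lmer(BDI ~ time_f + (1 | id), data = dat_sim)

# Estimated marginal means per occasion, then all pairwise differences
emm <- emmeans(fit, ~ time_f)
contrasts_tbl <- pairs(emm)
print(contrasts_tbl)
```

The pairwise table includes the post-treatment versus follow-up difference, which the model coefficients alone do not report directly.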

Piecewise growth models cannot be estimated with only three data points (edit: in the mixed modeling framework) as you do not have enough information to estimate pre and post slopes. See also here.

Edit: A little more investigation into this issue reveals that if you move from the mixed effects to the SEM framework (which requires reshaping the data from long to wide), you can estimate a piecewise growth model with three time points. Work by Nese and colleagues shows how this can be specified. It essentially accomplishes what you want to do, with two linear growth segments, but to identify the model you have to fix the item variances to be equal and the covariance between the two slopes to 0. Mplus code is provided in the second Nese and colleagues article, and you should be able to translate this to lavaan() quite easily.

$\endgroup$
5
  • 1
    $\begingroup$ Thanks Erik, your answer is really helpful! I have indeed considered whether a model with time as a factor could be a better fit. But does adding a linear random slope on top of fixed time coded as a factor lead to the same problem, or is that OK? For example: alt_model <- lmer(BDI ~ as.factor(time) + (1 + time | id), data = dat). Also, I've seen models that use different splines (e.g. restricted cubic splines, natural splines, etc.) and include linear random slopes. Do these models suffer from the same problems that you listed above? I would have thought so, but I'm not sure. $\endgroup$ Commented Nov 26, 2025 at 11:15
  • $\begingroup$ No problem, @AndroidPandroid! A spline approach is probably also out of reach; however, I added some additional information in an edit explaining that you can estimate something like your original model using SEM. $\endgroup$ Commented Nov 26, 2025 at 15:38
  • 1
    $\begingroup$ Thanks Erik, I think moving on to SEM would be a bit too much for me at the moment. I'm really intrigued by your initial feedback since it seems to clash with what I've heard from others, so I truly appreciate it. If you do not mind, I have a couple of follow-up questions: Would your critique still apply if I used restricted cubic splines (assuming I had more data, say around 20 time points, and modeled random linear slopes)? And in your categorical model, would I need to model random time as a factor as well, or could I use linear random time in that case? Again, assuming I had more data. $\endgroup$ Commented Nov 26, 2025 at 18:44
  • $\begingroup$ Pt.1. Assuming more time points, your options open up considerably. You could model the fixed (mean) effects of time flexibly using piecewise linear splines. With this model, you can allow for a continuous time random slope, such that for a given subject, the same amount is added to the slope of each line segment. For restricted cubic splines, you generally would want to model the spline variables themselves as random slopes (not continuous time). $\endgroup$ Commented Nov 26, 2025 at 20:04
  • $\begingroup$ Pt. 2. With the categorical model, you are essentially reproducing an ANOVA that can handle unbalanced data. You would not include random continuous time for the reasons discussed in my response, but you could look at random categorical time. This will be a tricky estimation problem, so be very careful. See stats.stackexchange.com/questions/78928/… $\endgroup$ Commented Nov 26, 2025 at 20:06
0
$\begingroup$

This should probably be a comment but I do not have enough cred to comment yet and cred is gained by answering...

If the primary reason one can't model time as a continuous random effect from T0-T24 is because it is missing as a fixed effect and would therefore center around 0, why can't it be included as a fixed effect?

Even in a saturated model that uses 3 time points to estimate 3 parameters (intercept, slope 1, and difference in slopes), isn't all the information still there to also model a single slope from T0-T24 as a fixed effect?

Then the random effects line would center around the fixed effect line rather than 0.

While this wouldn't let you make any conclusions about subject level variation at the bend, it would soak up some variance that is consistent across the entire treatment period.
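One thing worth checking before trying this: with the knot coding from the question, `time` equals `time1 + time2` exactly, so adding `time` as a third fixed slope does not add a new column of information. A minimal base-R sketch, using five hypothetical subjects with complete data, makes this visible through the rank of the fixed-effects design matrix.

```r
# Five hypothetical subjects, each measured at weeks 0, 12 and 24
time  <- rep(c(0, 12, 24), times = 5)
time1 <- pmin(time, 12)
time2 <- pmax(0, time - 12)

# Design matrix with an intercept, both pieces, and overall time
X <- model.matrix(~ time1 + time2 + time)

n_columns <- ncol(X)      # 4 columns: intercept, time1, time2, time
x_rank    <- qr(X)$rank   # 3: one column is a linear combination of the others

c(columns = n_columns, rank = x_rank)
```

In my understanding, lmer() handles this situation by dropping the redundant column with a rank-deficiency message, so the fitted fixed effects would be the same as with `time1 + time2` alone. Put differently, the column space of `time1` and `time2` already contains a straight line in `time`.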

I am working through a very similar question currently (which I have posted about here). All of this is to say that I am not an expert on this topic in any way, so take everything with a grain of salt!

$\endgroup$
