cross-lagged panel model (lavaan)

Question

I am modelling the longitudinal relationships between three observed variables:

behavior (binary: 0/1)
affect (continuous)
wellbeing (5-point Likert scale)

The baseline year is 2008, and I have repeated measures every two years until 2018. Here is a version of my lavaan model:

model <- '

  behavior_2010 ~ affect_2008 + wellbeing_2008 + behavior_2008  
  behavior_2012 ~ affect_2010 + wellbeing_2010 + behavior_2010 
  behavior_2014 ~ affect_2012 + wellbeing_2012 + behavior_2012 
  behavior_2016 ~ affect_2014 + wellbeing_2014 + behavior_2014
  behavior_2018 ~ affect_2016 + wellbeing_2016 + behavior_2016
 
  affect_2010 ~ behavior_2008 + wellbeing_2008 + affect_2008
  affect_2012 ~ behavior_2010 + wellbeing_2010 + affect_2010
  affect_2014 ~ behavior_2012 + wellbeing_2012 + affect_2012
  affect_2016 ~ behavior_2014 + wellbeing_2014 + affect_2014
  affect_2018 ~ behavior_2016 + wellbeing_2016 + affect_2016
 
  wellbeing_2010 ~ behavior_2008 + affect_2008 + wellbeing_2008
  wellbeing_2012 ~ behavior_2010 + affect_2010 + wellbeing_2010
  wellbeing_2014 ~ behavior_2012 + affect_2012 + wellbeing_2012
  wellbeing_2016 ~ behavior_2014 + affect_2014 + wellbeing_2014
  wellbeing_2018 ~ behavior_2016 + affect_2016 + wellbeing_2016
  

  behavior_2012 ~~ behavior_2010
  behavior_2014 ~~ behavior_2012
  behavior_2016 ~~ behavior_2014
  behavior_2018 ~~ behavior_2016

  affect_2012 ~~ affect_2010
  affect_2014 ~~ affect_2012
  affect_2016 ~~ affect_2014
  affect_2018 ~~ affect_2016

  wellbeing_2012 ~~ wellbeing_2010
  wellbeing_2014 ~~ wellbeing_2012
  wellbeing_2016 ~~ wellbeing_2014
  wellbeing_2018 ~~ wellbeing_2016
  
  behavior_2010 ~~ wellbeing_2010
  behavior_2012 ~~ wellbeing_2012
  behavior_2014 ~~ wellbeing_2014
  behavior_2016 ~~ wellbeing_2016
 
  behavior_2010 ~~ affect_2010
  behavior_2012 ~~ affect_2012
  behavior_2014 ~~ affect_2014
  behavior_2016 ~~ affect_2016
  behavior_2018 ~~ affect_2018
 
  affect_2010 ~~ wellbeing_2010
  affect_2012 ~~ wellbeing_2012
  affect_2014 ~~ wellbeing_2014
  affect_2016 ~~ wellbeing_2016
  affect_2018 ~~ wellbeing_2018
'

fit <- sem(
  model,
  data = data,
  ordered = ordered_vars,
  estimator = "DWLS",
  parameterization = "theta"   
)

My questions are:

Does this specification make sense for modelling the cross-lagged relationships among these three observed variables?
Both behavior and wellbeing are considered ordinal (binary and 5-point scale, respectively). Should I include all waves of these variables in ordered_vars, including the baseline variables behavior_2008 and wellbeing_2008? Or should I exclude the baseline variables (behavior_2008 and wellbeing_2008) from the ordered argument and treat them as continuous?

Terrence · Accepted Answer · 2025-08-12 12:53:04Z

Does this specification make sense for modelling the cross-lagged relationships among these three observed variables?

Your model will not be identified if you estimate both autoregressions (e.g., behavior_2012 ~ behavior_2010) and autocorrelations (e.g., behavior_2012 ~~ behavior_2010) between the same variables.
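
As a rough sketch of what that implies (not a definitive specification), the lag-1 syntax could be generated with only the autoregressions, the cross-lags, and the same-wave residual covariances, leaving out every lag-1 `~~` line. The variable names come from the question; the object names `vars`, `years`, `regressions`, `covariances`, and `model_clpm` are made up for illustration. Note that this version also includes behavior_2018 ~~ wellbeing_2018, which the posted model leaves out; treating that as an oversight is an assumption.

```r
# Sketch: build the lag-1 cross-lagged syntax with paste0(), keeping only
# same-wave residual covariances (no wave-to-wave `~~` lines).
vars  <- c("behavior", "affect", "wellbeing")
years <- seq(2008, 2018, by = 2)

# Regressions: each variable at wave t on all three variables at wave t-1
regressions <- unlist(lapply(2:length(years), function(t) {
  predictors <- paste0(vars, "_", years[t - 1], collapse = " + ")
  paste0(vars, "_", years[t], " ~ ", predictors)
}))

# Same-wave residual covariances among the endogenous waves (2010 to 2018)
pairs <- combn(vars, 2)
covariances <- unlist(lapply(years[-1], function(y) {
  paste0(pairs[1, ], "_", y, " ~~ ", pairs[2, ], "_", y)
}))

model_clpm <- paste(c(regressions, covariances), collapse = "\n")
cat(model_clpm)
```

The resulting string can be passed to sem() in place of the original model, with the other arguments unchanged.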

These are only Lag-1 coefficients, so if your model fits poorly, you could explore whether/which Lag-2 coefficients are necessary to reproduce the data well.
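
One way to explore this, sketched under the assumption that lag-2 autoregressions are the first candidates, is to generate those extra paths and add them to the model string. The object names `lag2`, `vars`, and `years` are hypothetical; the commented-out modindices() call assumes `fit` is the fitted lag-1 model from the question.

```r
# Sketch: lag-2 autoregressions (wave t regressed on wave t-2 of the same variable)
vars  <- c("behavior", "affect", "wellbeing")
years <- seq(2008, 2018, by = 2)

lag2 <- unlist(lapply(3:length(years), function(t) {
  paste0(vars, "_", years[t], " ~ ", vars, "_", years[t - 2])
}))
cat(lag2, sep = "\n")

# With a fitted lag-1 model `fit` (assumed to exist), modification indices
# may help decide which omitted regressions are worth freeing:
# lavaan::modindices(fit, sort. = TRUE, op = "~")
```

Freeing only the lag-2 paths that the data seem to need keeps the model closer to the original lag-1 structure than adding all twelve at once.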

Both behavior and wellbeing are considered ordinal (binary and 5-point scale, respectively).

Be aware that the regression coefficients will not be comparable across occasions without threshold invariance to link the scales of the latent responses. If you are interested in testing equivalence of coefficients over time (e.g., tests of stability, equilibrium), you can test threshold equivalence for the 5-category variable by assigning the same 4 labels to each occasion's 4 thresholds. But you also need to then free the intercepts and residual variances for subsequent occasions (the default identification constraint of fixing them to 0 and 1 is only needed on the first occasion with equal thresholds).

## For example:
'
## threshold invariance
wellbeing_2010 | wb1*t1 + wb2*t2 + wb3*t3 + wb4*t4
wellbeing_2012 | wb1*t1 + wb2*t2 + wb3*t3 + wb4*t4
...
wellbeing_2018 | wb1*t1 + wb2*t2 + wb3*t3 + wb4*t4

## release unnecessary identification constraints
wellbeing_2012 ~  NA*1
wellbeing_2012 ~~ NA*wellbeing_2012
...
wellbeing_2018 ~  NA*1
wellbeing_2018 ~~ NA*wellbeing_2018
'
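
With the thresholds equated, a test of stability could then be set up by giving the same label to a coefficient at every wave. A minimal sketch, assuming the affect-to-wellbeing cross-lag is the coefficient of interest (the label `b_wa` and the object `constrained` are invented here):

```r
# Sketch: one shared label forces the affect -> wellbeing cross-lag to be
# equal across waves; other predictors are left off these lines for brevity.
years <- seq(2008, 2018, by = 2)

constrained <- paste0(
  "wellbeing_", years[-1],
  " ~ b_wa*affect_", years[-length(years)]
)
cat(constrained, sep = "\n")
```

In the full model, the labelled term would replace the unlabelled affect term in each wellbeing regression rather than being added alongside it, and the constrained fit would be compared with the unconstrained one. This comparison is only meaningful for wellbeing once its thresholds are equated as above.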

But binary variables only have 1 threshold, so equating it cannot yield a test of equivalence. In binary variables, changes in the latent response's level (intercept) and spread (residual or marginal variance) are confounded, so you can't compare regression coefficients over time without assuming one of those latent parameters doesn't change over time (and you get different answers depending which assumption you make).
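
A small numeric illustration of that confounding, assuming a normally distributed latent response with a single threshold (the function `p_one` and its arguments are hypothetical names): the observed proportion depends only on the ratio of (mean minus threshold) to the standard deviation, so different combinations of level and spread give the same data.

```r
# P(y = 1) when the latent response is N(mu, sigma^2) and y = 1 above tau
p_one <- function(mu, sigma, tau = 0) pnorm((mu - tau) / sigma)

p_unit_sd <- p_one(mu = 0.5, sigma = 1)  # mean 0.5, SD 1
p_doubled <- p_one(mu = 1.0, sigma = 2)  # mean and SD both doubled

# Both scenarios imply the same proportion of 1s
c(p_unit_sd, p_doubled)
```

Because one observed proportion per wave cannot separate the two scenarios, one of the latent parameters has to be fixed by assumption.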

Should I include all waves of these variables in ordered_vars, including the baseline variables behavior_2008 and wellbeing_2008? Or should I exclude the baseline variables (behavior_2008 and wellbeing_2008) from the ordered argument and treat them as continuous?

The ordered= argument only applies to endogenous variables, so if you want to treat them as having latent responses, you could make them "endogenous" by having them load (with 0) on a phantom construct:

'
PHANTOM =~ 0*behavior_2008 + 0*affect_2008 + 0*wellbeing_2008
PHANTOM ~~ 1*PHANTOM   # any number, as long as it is fixed
'
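
Putting that together, ordered_vars would then list every wave of the two ordinal variables, baseline included. A sketch, assuming `model` and `data` are the objects from the question (the names `phantom` and `full_model` are invented here, and the fitting call is left commented out because it needs the real data):

```r
# Sketch: all waves of the ordinal variables, including 2008
years <- seq(2008, 2018, by = 2)
ordered_vars <- c(paste0("behavior_", years), paste0("wellbeing_", years))

# Phantom construct with zero loadings on the baseline variables
phantom <- '
PHANTOM =~ 0*behavior_2008 + 0*affect_2008 + 0*wellbeing_2008
PHANTOM ~~ 1*PHANTOM
'

# Appending to the question's model string and fitting (assumes `model`, `data`):
# full_model <- paste(model, phantom, sep = "\n")
# fit <- lavaan::sem(full_model, data = data, ordered = ordered_vars,
#                    estimator = "DWLS", parameterization = "theta")
```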
