$\begingroup$

I am using ordinal semiparametric regression (Frank Harrell's rms package) to model overall survival in patients with brain tumors. I am thinking of centering the Age covariate, because I want Age = 0 to represent the average person, not a newborn. But I am doubtful whether I should also standardize, because rescaling to unit variance changes the distances between ages, making the Age coefficient harder to interpret (the units are no longer years). That is why centering alone feels more natural to me than standardizing. I am wondering whether this is a valid concern.

Standardization:

df$Age_c <- as.numeric(scale(df$Age))

Centering only:

df$Age_c <- as.numeric(scale(df$Age, scale = FALSE))

I would really appreciate your guidance.

$\endgroup$

4 Answers

$\begingroup$

Any of the options can make sense.

Leaving age as is makes it easiest to interpret. Centering does, indeed, make the average age 0, but that's the average age for your data set. Standardizing is similar: it makes one unit of the age variable equal one standard deviation of age. There are arguments for doing this if you want to compare variables (this has been discussed here, no need to have those debates again), but that standard deviation is again specific to your data.

I am a fan of leaving variables as they are. To me, that makes them easiest to interpret. "A 68-year-old" is easier than "someone who is 3 years older than the mean," and "per year" is easier than "per SD of years, which is 2.35."

But arguments can be and have been made for other choices.

$\endgroup$
    $\begingroup$ @ÇağanKaplan Note that because this kind of regression model is equivariant under linear transformations of the x-variables, the solutions are mathematically equivalent for any of these options. So interpretation, including comparison of variables, can be a concern, but "mathematically right or wrong" isn't. The answer uses this fact implicitly but doesn't state it, and it could be worthwhile to add that. $\endgroup$ Commented Nov 30, 2025 at 18:15
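
The equivariance mentioned in the comment can be checked numerically. A minimal sketch using `lm()` as a stand-in for the ordinal model (the variable names and simulated data are illustrative; the same invariance of fitted values holds for `rms::orm`):

```r
set.seed(1)
df <- data.frame(age = rnorm(100, mean = 60, sd = 10))
df$y <- 0.05 * df$age + rnorm(100)

# Fit the same model with raw, centered, and standardized age
fit_raw <- lm(y ~ age, data = df)
fit_ctr <- lm(y ~ I(age - mean(age)), data = df)
fit_std <- lm(y ~ scale(age), data = df)

# Coefficients differ, but fitted values (and predictions) are identical
all.equal(fitted(fit_raw), fitted(fit_ctr))  # TRUE
all.equal(fitted(fit_raw), fitted(fit_std))  # TRUE
```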
$\begingroup$

All the answers here are very good in general. Just a few specific points about centering vs. scaling vs. doing nothing:

  1. In a model with an intercept, centering variables can make the intercept and other contrasts in your model more interpretable, but there are many situations in which the interest is only in additive model terms which don't depend on centering. So again, doing nothing is just fine.
  2. Moreover, as others in this thread note, in many regression situations all of the centering/scaling options are mathematically equivalent, so the choice of how to scale predictors is really yours.
  3. The scaling of predictors matters more in certain penalized / regularized regression and classification situations where you might want each coefficient on “equal footing” when it comes to shrinkage.
  4. One final place where centering makes a bigger difference is in multilevel models, where certain tests of coefficients (and the hypotheses that go with them) depend on group-mean-centering predictors, especially if an aggregate version of the predictor enters the model at a different level.
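
To illustrate point 3, here is a hypothetical sketch using glmnet (not from the thread; the predictors and data are made up): a ridge penalty acts on the raw coefficients, so a predictor on a large scale is effectively penalized less per unit than one on a small scale, unless the predictors are standardized. glmnet's `standardize = TRUE` default handles this internally.

```r
library(glmnet)
set.seed(1)
# Two predictors on wildly different scales
x <- cbind(age_years = rnorm(200, mean = 60, sd = 12),
           marker    = rnorm(200, mean = 0,  sd = 0.01))
y <- 0.02 * x[, "age_years"] + 50 * x[, "marker"] + rnorm(200)

# Ridge regression (alpha = 0) with and without internal standardization
fit_std <- glmnet(x, y, alpha = 0, lambda = 1, standardize = TRUE)
fit_raw <- glmnet(x, y, alpha = 0, lambda = 1, standardize = FALSE)

coef(fit_std)
coef(fit_raw)  # the shrinkage pattern across the two predictors differs
```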
$\endgroup$
$\begingroup$

One more consideration: you can "center"/"standardize" in a way that maintains easy interpretability and does not tie your transformation to the specific dataset. E.g., you can make age_std = (age - 40) / 10. Now your model intercept is the value for a 40-year-old (interpretable!), and your age coefficient is for a change of 10 years (interpretable, and it typically gives you nicer numbers, since changes per single year tend to be small in real datasets). Obviously you may pick other centers/scalings depending on the specifics of your problem.
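
In R, the reparameterization suggested above is a one-liner (using the `df$Age` column from the question):

```r
# Fixed reference values (40 years, 10-year units) instead of the
# sample mean/SD, so the transformation is the same for any dataset
df$age_std <- (df$Age - 40) / 10
# Intercept now refers to a 40-year-old; the age coefficient is the
# effect of a 10-year increase
```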

As others noted, the model is mathematically equivalent either way, so when you focus on model predictions for interpretation (e.g. via emmeans), none of this matters.

$\endgroup$
    $\begingroup$ Changing units is often helpful for interpretation. Of course it means exactly the same thing, but people don't like looking at lots of zeros after the decimal point, or lots of digits before it, either. The only potential issue is making it clear, everywhere, that this is what you have done. I've seen failures to state units in tables, and that can lead to confusion. $\endgroup$ Commented Dec 2, 2025 at 11:14
$\begingroup$

I echo Peter's answer. Separate parameterization from interpretation. Interpretation is a post-fitting exercise. For example, in the rms package, no matter how the model parameters were set up, one can get point and interval estimates for age=60 vs. age=40 using `contrast(fit, list(age=60), list(age=40))`. Importantly, this works even when the age effect is modeled nonlinearly and interacts with other factors. Pre-fitting standardization usually requires assumptions of linearity and symmetry; the symmetry assumption is needed when standard deviations are used in the standardization.
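
A minimal sketch of this workflow (the data frame `d` and variables `y`, `age`, and `sex` are placeholders, not from the question):

```r
library(rms)
dd <- datadist(d); options(datadist = "dd")

# Nonlinear age effect (restricted cubic spline) interacting with sex
fit <- orm(y ~ rcs(age, 4) * sex, data = d)

# Point and interval estimate for age 60 vs. 40, on the original scale
# of age, regardless of any pre-fitting centering or scaling
contrast(fit, list(age = 60), list(age = 40))
```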

$\endgroup$
