I have been trying to understand polynomial contrasts a bit better, using this resource/example: https://library.virginia.edu/data/articles/understanding-ordered-factors-in-a-linear-model. I know it is written for linear models, but I tried to follow the same steps/logic with Cox regression for a time-to-event outcome. My understanding from that page (please correct me if I am wrong) is that, in a linear model, using an ordered factor is equivalent to using an orthogonal polynomial of degree (number of categories − 1), and that the two approaches lead to exactly the same p-values despite different coefficients/SEs (which I assume comes from the different design matrices produced by poly() versus ordered()).
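To make the linear-model claim concrete, this is the kind of comparison I have in mind (a toy, balanced example I made up; the variable names are just for illustration):

```r
# Toy data: a 4-level ordered predictor and a numeric response
set.seed(1)
x <- ordered(rep(0:3, each = 10))
y <- rnorm(40) + 0.5 * as.numeric(x)

# The ordered factor is expanded with contr.poly() contrasts...
fit_ord  <- lm(y ~ x)
# ...which should match a degree-3 orthogonal polynomial in the level index
fit_poly <- lm(y ~ poly(unclass(x), 3))

summary(fit_ord)
summary(fit_poly)

# Both bases span the same space, so the fitted values agree
# even though the coefficients/SEs differ
all.equal(fitted(fit_ord), fitted(fit_poly))
```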
I then tried to apply this to a Cox regression example, and there the two approaches do not seem to be equivalent. Here is a reproducible example:
library(survival)
data(lung)

# Keep complete cases only; status == 2 codes an event (death)
lung <- lung[complete.cases(lung), ]
lung$surv_obj <- with(lung, Surv(time, status == 2))

# ECOG performance status as an ordered factor (4 levels: 0-3)
lung$ph.ecog_o <- ordered(lung$ph.ecog)

# Ordered factor (contr.poly contrasts) vs. explicit orthogonal polynomial
cox1 <- coxph(surv_obj ~ ph.ecog_o, data = lung)
cox2 <- coxph(surv_obj ~ poly(unclass(ph.ecog_o), 3), data = lung)
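For reference, the two models are built from different bases, which I suspect is relevant (contr.poly() is, as far as I understand, what R uses behind the scenes when expanding an ordered factor):

```r
library(survival)
data(lung)
lung <- lung[complete.cases(lung), ]
x <- ordered(lung$ph.ecog)

# Contrasts assigned to the ordered factor: orthogonal polynomials
# evaluated at 4 equally spaced, equally weighted levels
contr.poly(4)

# Basis produced by poly(): orthogonalized against the observed,
# unbalanced distribution of the levels in this sample
unique(round(poly(unclass(x), 3), 4))
```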
I am not able to insert images, so here is the printed output. For cox1:
Call:
coxph(formula = surv_obj ~ ph.ecog_o, data = lung)
n= 167, number of events= 120
coef exp(coef) se(coef) z Pr(>|z|)
ph.ecog_o.L 1.57386 4.82525 0.69609 2.261 0.0238 *
ph.ecog_o.Q 0.48669 1.62692 0.52681 0.924 0.3556
ph.ecog_o.C 0.08127 1.08467 0.27246 0.298 0.7655
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
ph.ecog_o.L 4.825 0.2072 1.2331 18.881
ph.ecog_o.Q 1.627 0.6147 0.5794 4.569
ph.ecog_o.C 1.085 0.9219 0.6359 1.850
Concordance= 0.615 (se = 0.028 )
Likelihood ratio test= 13.65 on 3 df, p=0.003
Wald test = 15.48 on 3 df, p=0.001
Score (logrank) test = 16.99 on 3 df, p=7e-04
While for cox2:
Call:
coxph(formula = surv_obj ~ poly(unclass(ph.ecog_o), 3), data = lung)
n= 167, number of events= 120
coef exp(coef) se(coef) z Pr(>|z|)
poly(unclass(ph.ecog_o), 3)1 4.2891 72.9001 1.2037 3.563 0.000366 ***
poly(unclass(ph.ecog_o), 3)2 1.2843 3.6122 1.1649 1.103 0.270224
poly(unclass(ph.ecog_o), 3)3 0.3106 1.3643 1.0413 0.298 0.765477
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
exp(coef) exp(-coef) lower .95 upper .95
poly(unclass(ph.ecog_o), 3)1 72.900 0.01372 6.8886 771.48
poly(unclass(ph.ecog_o), 3)2 3.612 0.27684 0.3683 35.42
poly(unclass(ph.ecog_o), 3)3 1.364 0.73299 0.1772 10.50
Concordance= 0.619 (se = 0.028 )
Likelihood ratio test= 13.65 on 3 df, p=0.003
Wald test = 15.48 on 3 df, p=0.001
Score (logrank) test = 16.99 on 3 df, p=7e-04
What causes the discrepancy in the individual coefficient p-values in the Cox setting (note that the global likelihood ratio, Wald, and score tests are identical across the two models), when this does not happen with lm()? And if one wanted to test for linear or quadratic trends in an ordered categorical variable for which the underlying numeric values are not available, which of the two models would be more appropriate, and why?
Thanks in advance