$\begingroup$

The vignette for the lspline package in R says that the package computes:

Linear splines with convenient parametrisations such that:

  • coefficients are slopes of consecutive segments
  • coefficients capture slope change at consecutive knots

This is straightforward. However, the regression coefficients for both options differ from the results of the bs function in splines despite setting degree = 1.

| Spline function | OR | 95% CI | p-value |
|---|---|---|---|
| Lspline (consecutive) | | | 0.201 |
| xlin_con1 | 0.17 | 0.02, 0.93 | |
| xlin_con2 | 1.50 | 0.79, 2.88 | |
| xlin_con3 | 1.13 | 0.48, 2.65 | |
| xlin_con4 | 0.88 | 0.35, 2.16 | |
| xlin_con5 | 0.91 | 0.47, 1.77 | |
| xlin_con6 | 0.47 | 0.14, 1.42 | |
| Lspline (marginal) | | | 0.201 |
| xlin_mar1 | 0.17 | 0.02, 0.93 | |
| xlin_mar2 | 8.74 | 1.07, 88.7 | |
| xlin_mar3 | 0.75 | 0.19, 2.89 | |
| xlin_mar4 | 0.78 | 0.16, 3.81 | |
| xlin_mar5 | 1.04 | 0.26, 4.19 | |
| xlin_mar6 | 0.52 | 0.10, 2.46 | |
| Bspline | | | 0.201 |
| xbsplin1 | 0.14 | 0.02, 0.92 | |
| xbsplin2 | 0.22 | 0.03, 1.13 | |
| xbsplin3 | 0.23 | 0.03, 1.28 | |
| xbsplin4 | 0.22 | 0.03, 1.17 | |
| xbsplin5 | 0.20 | 0.03, 1.12 | |
| xbsplin6 | 0.05 | 0.00, 0.58 | |

What does bs convey, and which function is preferable for a regression table?

Reproducible example

#Packages used
library(lspline)
library(splines)
library(gtsummary)

#Generate data
set.seed(123)
y <- sample(0:1, 1000, replace = TRUE)
x <- rnorm(1000)

#Spline functions (k = 5, uniform from p5 to p95)
xlin_con <- lspline::qlspline(x, q = seq(0.05, 0.95, length.out = 5))
xlin_mar <- lspline::qlspline(x, q = seq(0.05, 0.95, length.out = 5), marginal = TRUE)
xbsplin <- splines::bs(x, knots = quantile(x, probs = seq(0.05, 0.95, length.out = 5)),
                       degree = 1)

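#All knots are in the same place (identical() compares only two objects,
#so each set of knots is checked against the first)
k_con <- unname(attr(xlin_con, "knots"))
isTRUE(all.equal(k_con, unname(attr(xlin_mar, "knots")))) &&
  isTRUE(all.equal(k_con, unname(attr(xbsplin, "knots"))))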
## [1] TRUE

#Fit splines
mod1 <- glm(y ~  xlin_con, family = binomial())
mod2 <- glm(y ~ xlin_mar, family = binomial())
mod3 <- glm(y ~ xbsplin, family = binomial())

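#All models give the same predictions (all.equal() compares only two objects,
#so each model is checked against the first)
isTRUE(all.equal(fitted(mod1), fitted(mod2))) &&
  isTRUE(all.equal(fitted(mod1), fitted(mod3)))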
##[1] TRUE

#Table
tbl1 <- tbl_regression(mod1, exponentiate = TRUE,
                       pvalue_fun = ~style_pvalue(.x, digits = 3),
                       label = list(xlin_con ~ "Lspline (consecutive)")) %>%
  add_global_p(quiet = TRUE) %>%
  bold_labels() %>%
  modify_header(label = "**Spline function**")
tbl2 <- tbl_regression(mod2, exponentiate = TRUE,
                       pvalue_fun = ~style_pvalue(.x, digits = 3),
                       label = list(xlin_mar ~ "Lspline (marginal)")) %>%
  add_global_p(quiet = TRUE) %>%
  bold_labels()
tbl3 <- tbl_regression(mod3, exponentiate = TRUE,
                       pvalue_fun = ~style_pvalue(.x, digits = 3),
                       label = list(xbsplin ~ "Bspline")) %>%
  add_global_p(quiet = TRUE) %>%
  bold_labels()
print(as_kable(tbl_stack(list(tbl1, tbl2, tbl3), quiet = TRUE)))
$\endgroup$
  • $\begingroup$ Coefficients are irrelevant, because different bases exist even for the same spline. What matters is the fit. $\endgroup$ Commented May 6, 2022 at 20:36
  • $\begingroup$ @whuber but isn't the fit equivalent across the three options in the provided example? $\endgroup$ Commented May 7, 2022 at 11:07
  • $\begingroup$ Frankly, I don't care to read through your code to try to figure out the answer to that question. The three equal p-values suggest your three splines are yielding identical fits and that's all that matters. Your underlying question concerns which form of a spline might be "better." That answer depends on why you are using splines and how you are specifying their characteristics (such as numbers of knots). $\endgroup$ Commented May 7, 2022 at 13:33
  • $\begingroup$ No problem @whuber. I am using splines to relax the linearity assumption in logistic regression for a survey analysis on healthcare utilisation data. K = 5 placed uniformly from the 5th to 95th percentile. I decided on linear splines vs e.g., cubic splines for easier interpretation. $\endgroup$ Commented May 7, 2022 at 13:57
  • $\begingroup$ Not a stupid question. (B) is a linear spline. It's not necessarily worse than (A)--that depends on the nature of the response and your objectives. For instance, sometimes I choose such crude-looking representations to impress on the audience that there's substantial uncertainty in the fit. $\endgroup$ Commented May 12, 2022 at 11:52

1 Answer

$\begingroup$

Updated based on a reply from Thomas Lumley. lspline and bs use different parameterisations of the same linear spline, so they will not yield the same coefficients, even though the fitted values are identical. As to which one is preferred, it depends on what you are trying to achieve (as whuber said). Personally, if I were interpreting the coefficients I would prefer lspline (but first I would check the model's fit). If I were trying to fit a spline model to estimate a nuisance function, I would opt for bs.
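
To illustrate the point about parameterisations, here is a minimal sketch using only base R and splines. It assumes the same simulated data as in the question. `slope_basis()` is a hypothetical helper written for this illustration; it stands in for the consecutive-slope parameterisation and is not the lspline API. The sketch relies on the fact that, with degree = 1, each bs column is a "hat" function peaking at one node, so each bs coefficient is the log-odds at that node minus the log-odds at the lower boundary knot. Segment slopes can therefore be recovered from the bs coefficients by differencing.

```r
library(splines)  # ships with R

set.seed(123)
y <- sample(0:1, 1000, replace = TRUE)
x <- rnorm(1000)
knots <- unname(quantile(x, probs = seq(0.05, 0.95, length.out = 5)))

# Hypothetical helper (not the lspline API): column j measures how far x has
# travelled inside segment j, so its coefficient is the slope on that segment.
slope_basis <- function(x, knots) {
  lower <- c(-Inf, knots)
  upper <- c(knots, Inf)
  sapply(seq_along(lower), function(j) {
    start <- if (is.finite(lower[j])) lower[j] else 0
    pmin(pmax(x, lower[j]), upper[j]) - start
  })
}

S <- slope_basis(x, knots)
B <- bs(x, knots = knots, degree = 1)

m_slope <- glm(y ~ S, family = binomial())
m_bs    <- glm(y ~ B, family = binomial())

# Both bases span the same piecewise-linear functions, so the fits should agree.
all.equal(fitted(m_slope), fitted(m_bs), tolerance = 1e-6)

# Nodes of the hat functions: lower boundary, interior knots, upper boundary.
bk     <- attr(B, "Boundary.knots")
nodes  <- c(bk[1], knots, bk[2])
height <- c(0, coef(m_bs)[-1])      # log-odds at each node, relative to bk[1]
slopes_from_bs <- diff(height) / diff(nodes)

# Side by side: slopes estimated directly vs. slopes derived from bs.
cbind(slope_basis = coef(m_slope)[-1], from_bs = slopes_from_bs)
```

If the two columns agree, the bs coefficients carry the same information as the slopes, just expressed as heights at the knots rather than as per-unit changes. That is why the slope form tends to be easier to read in a regression table.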

$\endgroup$
  • $\begingroup$ That's not it: neither of the two parameterisations from lspline is the B-spline basis $\endgroup$ Commented Sep 3, 2024 at 1:04
