Consider the non-monotonic data below (right-hand graph and data from here).
I would like to test whether the two variables x and y are correlated, or at least not independent, given the non-monotonic relationship.
Based on this exchange, it seems that the function Hmisc::spearman2() or XICOR::xicor() (Chatterjee's xi correlation coefficient) could be appropriate.
However, when using them, I encounter two problems:
- The first yields a non-significant, but fixed, p-value (rho2 = 0.01; p = 0.3106).
- The second yields a p-value that is generally significant but highly variable across reruns of the test (sometimes it is non-significant).
My (naive) questions are:
- Which test should I use? The first one? The second one? Another one?
- If the xi correlation coefficient is more appropriate, is it feasible and acceptable to rerun the test many times (e.g., with replicate()) and report the mean ± SD of xi and of the p-value?
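To make the second question concrete, here is a minimal sketch of the rerun idea. It uses simulated stand-in data in place of dat0 (so the block is self-contained; substitute the real data), and assumes xicor() returns a list with xi and pval components when pvalue = TRUE:

```r
library(XICOR)  # for xicor(); assumed installed

# Stand-in for dat0 (substitute the real data): a noisy non-monotonic curve
set.seed(1)
x <- runif(100)
y <- (x - 0.5)^2 + rnorm(100, sd = 0.05)

# Rerun Chatterjee's test; with ties = TRUE the random tie-breaking
# makes both xi and the p-value vary across runs.
res <- replicate(50, {
  out <- xicor(x = x, y = y, pvalue = TRUE, ties = TRUE, nperm = 1000)
  c(xi = out$xi, p = out$pval)  # component names assumed from xicor()'s return list
})

rowMeans(res)      # mean xi and mean p-value across reruns
apply(res, 1, sd)  # their SDs
```

This is only an illustration of the mechanics of replicate(); whether averaging p-values in this way is statistically defensible is exactly what the question asks.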
Code:
library(dplyr)
library(ggplot2)
library(mgcv)
library(gratia)
library(Hmisc)
library(XICOR) # for xicor(); source also at
# https://github.com/cran/XICOR/blob/master/R/xicor.R
# gam
g0 <- gam(y ~ s(x, bs = "cr", k = 5), data = dat0, method = "REML",
          family = gaussian(link = "identity"))
summary(g0)

sm0 <- smooth_estimates(g0)
draw(sm0) +
  geom_point(aes(x = x, y = y), data = dat0, size = 1.5,
             colour = "steelblue3") +  # size, not base-graphics cex
  geom_line(aes(x = x, y = .estimate), linewidth = 1)  # linewidth, not lwd
# correlation tests
spearman2(y ~ x, data = dat0) # fixed values of rho2 and p-value
xicor(x = dat0$x, y = dat0$y, pvalue = TRUE, ties = TRUE, nperm = 1000)
# rerun many times: variable values of xi and p-value
Data:
dat0 <-
structure(list(x = c(0.63, 0.54, 0.32, 0.28, 0.69, 0.93, 0.94,
0.81, 0.99, 0.45, 0.69, 0.99, 0.37, 0.13, 0.25, 0.53, 0.77, 0.28,
0.18, 0.71, 0.52, 0.94, 0.64, 0.91, 0.2, 0.34, 0.5, 0.9, 0.31,
0.79, 0.86, 0.17, 0.41, 0.75, 0.62, 0.75, 0.76, 0.03, 0.73, 0.36,
0.76, 0.94, 0.07, 0.43, 0.73, 0.92, 0.85, 0.55, 0.85, 0.35, 0.72,
0.67, 0.52, 0.5, 0, 0.26, 0.7, 0.15, 0.58, 0.81, 0.49, 0.96,
0.4, 0.15, 0.76, 0.4, 0.2, 0.93, 0.48, 0.58, 0.68, 0.17, 0.7,
0.24, 0.33, 0.82, 0.93, 0.38, 0.48, 0.07, 0.21, 0.41, 0.53, 0.93,
0.25, 0.13, 0.39, 0.74, 0.37, 0.35, 0.7, 0.14, 0.77, 0.36, 0.14,
0.61, 0.19, 0.48, 0.12, 0.35), y = c(0.04, -0.09, -0.06, -0.06,
0.14, 0.29, 0.06, 0.17, 0.06, -0.06, -0.09, 0.14, -0.14, 0.08,
-0.03, -0.11, -0.01, 0.23, 0.08, -0.01, -0.16, 0.15, 0.02, 0.22,
0.09, 0.14, 0.11, 0.14, 0.01, 0.24, 0.03, -0.07, 0.1, -0.13,
0.1, 0.09, 0, 0.31, -0.04, 0.15, 0, 0.35, 0.35, -0.09, 0.16,
0.23, 0.02, 0.01, 0.07, -0.16, 0.13, 0.2, -0.03, 0.02, 0.36,
0.17, -0.03, 0.06, 0.07, 0.26, -0.04, 0.24, -0.04, -0.02, -0.07,
0.21, -0.02, 0.2, 0, 0.03, 0.13, 0.06, -0.02, 0.15, -0.07, 0.18,
0.27, 0.19, 0.03, 0.13, 0, 0.19, 0.04, 0.02, 0.13, 0.06, 0.11,
0.06, -0.1, -0.11, -0.04, -0.06, 0.1, 0.06, 0.33, -0.08, 0.09,
0.13, 0.3, 0.08)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-100L))

