Consider the non-monotonic data below (right-hand graph and data from here).
I would like to test whether the two variables x and y are correlated, or at least not independent, given the non-monotonic relationship.
Based on this exchange, it seems that the function Hmisc::spearman2() or XICOR::xicor() (Chatterjee's xi correlation coefficient) could be appropriate.
However, when using them, I encounter two problems:
- The first yields a non-significant, but fixed, p-value (rho2 = 0.01; p = 0.3106).
- The second yields a p-value that is generally significant but highly variable across reruns of the test (sometimes it is non-significant).
My (naive) questions are:
- Which test should I use? The first one? The second one? Another one?
- If the xi correlation coefficient is more appropriate, is it feasible and acceptable to rerun the test many times (e.g., with replicate()) and report the mean ± SD of xi and of the p-value?
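To make the second question concrete, here is a minimal sketch of the rerun idea. It uses simulated stand-in data in place of dat0 (so the block is self-contained; substitute the real data), and assumes xicor() returns a list with xi and pval components when pvalue = TRUE:

```r
library(XICOR)  # for xicor(); assumed installed

# Stand-in for dat0 (substitute the real data): a noisy non-monotonic curve
set.seed(1)
x <- runif(100)
y <- (x - 0.5)^2 + rnorm(100, sd = 0.05)

# Rerun Chatterjee's test; with ties = TRUE the random tie-breaking
# makes both xi and the p-value vary across runs.
res <- replicate(50, {
  out <- xicor(x = x, y = y, pvalue = TRUE, ties = TRUE, nperm = 1000)
  c(xi = out$xi, p = out$pval)  # component names assumed from xicor()'s return list
})

rowMeans(res)      # mean xi and mean p-value across reruns
apply(res, 1, sd)  # their SDs
```

This is only an illustration of the mechanics of replicate(); whether averaging p-values in this way is statistically defensible is exactly what the question asks.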
Code:
library(dplyr)
library(ggplot2)
library(mgcv)
library(gratia)
library(Hmisc)
library(XICOR) # for xicor(); source also at
# https://github.com/cran/XICOR/blob/master/R/xicor.R
# gam
g0 <- gam(y ~ s(x, bs = "cr", k = 5), data = dat0, method = "REML",
          family = gaussian(link = "identity"))
summary(g0)

sm0 <- smooth_estimates(g0)
draw(sm0) +
  geom_point(aes(x = x, y = y), data = dat0, size = 1.5,
             colour = "steelblue3") +  # size, not base-graphics cex
  geom_line(aes(x = x, y = .estimate), linewidth = 1)  # linewidth, not lwd
# correlation tests
spearman2(y ~ x, data = dat0) # fixed values of rho2 and p-value
xicor(x = dat0$x, y = dat0$y, pvalue = TRUE, ties = TRUE, nperm = 1000)
# rerun many times: variable values of xi and p-value
Data:
dat0 <-
structure(list(x = c(0.63, 0.54, 0.32, 0.28, 0.69, 0.93, 0.94,
0.81, 0.99, 0.45, 0.69, 0.99, 0.37, 0.13, 0.25, 0.53, 0.77, 0.28,
0.18, 0.71, 0.52, 0.94, 0.64, 0.91, 0.2, 0.34, 0.5, 0.9, 0.31,
0.79, 0.86, 0.17, 0.41, 0.75, 0.62, 0.75, 0.76, 0.03, 0.73, 0.36,
0.76, 0.94, 0.07, 0.43, 0.73, 0.92, 0.85, 0.55, 0.85, 0.35, 0.72,
0.67, 0.52, 0.5, 0, 0.26, 0.7, 0.15, 0.58, 0.81, 0.49, 0.96,
0.4, 0.15, 0.76, 0.4, 0.2, 0.93, 0.48, 0.58, 0.68, 0.17, 0.7,
0.24, 0.33, 0.82, 0.93, 0.38, 0.48, 0.07, 0.21, 0.41, 0.53, 0.93,
0.25, 0.13, 0.39, 0.74, 0.37, 0.35, 0.7, 0.14, 0.77, 0.36, 0.14,
0.61, 0.19, 0.48, 0.12, 0.35), y = c(0.04, -0.09, -0.06, -0.06,
0.14, 0.29, 0.06, 0.17, 0.06, -0.06, -0.09, 0.14, -0.14, 0.08,
-0.03, -0.11, -0.01, 0.23, 0.08, -0.01, -0.16, 0.15, 0.02, 0.22,
0.09, 0.14, 0.11, 0.14, 0.01, 0.24, 0.03, -0.07, 0.1, -0.13,
0.1, 0.09, 0, 0.31, -0.04, 0.15, 0, 0.35, 0.35, -0.09, 0.16,
0.23, 0.02, 0.01, 0.07, -0.16, 0.13, 0.2, -0.03, 0.02, 0.36,
0.17, -0.03, 0.06, 0.07, 0.26, -0.04, 0.24, -0.04, -0.02, -0.07,
0.21, -0.02, 0.2, 0, 0.03, 0.13, 0.06, -0.02, 0.15, -0.07, 0.18,
0.27, 0.19, 0.03, 0.13, 0, 0.19, 0.04, 0.02, 0.13, 0.06, 0.11,
0.06, -0.1, -0.11, -0.04, -0.06, 0.1, 0.06, 0.33, -0.08, 0.09,
0.13, 0.3, 0.08)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-100L))

