Newest Questions - Cross Validated

1 vote

0 answers

7 views

Is GVIF meaningful for a reduced interaction block created from manually coded factor × treatment terms?

I’m fitting a survival model with a multi-level factor (HistologyClass) and binary treatment variables (Radiotherapy, Chemotherapy). I do not want the full HistologyClass * Treatment interaction, ...

Kavalali

473

asked 10 hours ago

2 votes

1 answer

26 views

Given a dataset of human patients which includes their age, sex, and over 100 measured disease biomarkers, what type of statistical analysis is best?

I was given a dataset of close to 100 human participants. This dataset includes values for 100+ measured hypothesized biomarkers of neurodegeneration, their age, sex, disease state (clinically normal, ...

Justin Mendiola

21

asked 11 hours ago

3 votes

1 answer

27 views

How to handle calendar year as a continuous predictor with a mismatched train/test time horizon?

I am using Ordinal Semiparametric Regression (Frank Harrell's rms package) to model overall survival in patients with brain tumor. My training data is from the SEER database (covering years 2004 to ...

Çağan Kaplan

171

asked 12 hours ago

1 vote

1 answer

22 views

What does the base prediction value actually imply in SHAP?

I know SHAP (or Shapley) values are the contribution of each input variable to the model prediction. Adding the base values to the sum of all SHAP values gives you the model prediction for any data ...

lsr729

261

asked 12 hours ago
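The additivity property mentioned in the SHAP question above can be checked by hand in the simplest case. The following is a minimal sketch, assuming a linear model and features treated as independent, where the SHAP value of feature j has the closed form w_j * (x_j - mean_j). The weights, intercept, and background data are illustrative assumptions, not taken from the question, and the `shap` library is deliberately not used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical background data and linear model f(x) = w @ x + b (illustrative values).
X_background = rng.normal(size=(1000, 3))
w = np.array([2.0, -1.0, 0.5])
b = 4.0


def predict(X):
    """Linear model prediction for each row of X."""
    return X @ w + b


# Base value: the average model prediction over the background data.
base_value = predict(X_background).mean()

# For a linear model with features treated as independent, the SHAP value of
# feature j at point x is w_j * (x_j - mean of feature j in the background).
x = np.array([1.0, 2.0, -3.0])
shap_values = w * (x - X_background.mean(axis=0))

prediction = predict(x[None, :])[0]
reconstructed = base_value + shap_values.sum()
print(base_value, shap_values, prediction, reconstructed)
```

Under these assumptions the base value is simply the expected prediction when no feature values are known, and each SHAP value measures how far one feature moves the prediction away from that average.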

1 vote

2 answers

53 views

Spearman2's rho or Chatterjee's xi correlation coefficient for non-monotonic data?

Let's assume the non-monotonic data below (right graph and data from here). I would like to test if the two variables x and y are correlated or at least not independent, given the non-monotonic ...

denis

305

asked 15 hours ago
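To make the contrast in the question above concrete, here is a small sketch that computes both coefficients on synthetic U-shaped data. The data, the seed, and the helper names (`ranks`, `spearman_rho`, `chatterjee_xi`) are assumptions made for illustration. The xi formula used is the no-ties version, which is adequate for continuous data.

```python
import numpy as np


def ranks(v):
    """Ranks 1..n; assumes no ties (continuous data)."""
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


def chatterjee_xi(x, y):
    """Chatterjee's xi without ties: sort by x, then sum absolute rank jumps of y."""
    n = len(x)
    r = ranks(y)[np.argsort(x)]
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)


rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 501)
y = x**2 + rng.normal(scale=0.01, size=x.size)  # synthetic U-shaped data

rho = spearman_rho(x, y)
xi = chatterjee_xi(x, y)
print(f"Spearman rho = {rho:.3f}, Chatterjee xi = {xi:.3f}")
```

On data like this, rho stays close to zero because the rising and falling halves cancel, while xi is large because y is nearly a function of x. Note that xi is not symmetric: `chatterjee_xi(y, x)` asks a different question.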

2 votes

0 answers

14 views

Sample size calculation for Paired Data using ordinal models [closed]

I came across this blog post by Prof. Harrell Ordinal Models for Paired Data – Statistical Thinking. I was trying to replicate the simulation in order to estimate the sample size for my paired study. ...

mahmoud hamza

21

asked 21 hours ago

4 votes

1 answer

87 views

Which test do I use to estimate the preference of species?

I have a question about how to analyse my dataset and would really appreciate your advice. My data consist of observations of a set of target plant species collected during field surveys. The surveys ...

fleur

41

asked 22 hours ago

2 votes

0 answers

20 views

Is this train/validation/test split method considered a data leakage?

Training has time steps 0-300. Validation has time steps 200-400. Testing has time steps 300-500. The method uses N time steps as the observed past and the next N time steps as the future. For example,...

cdt123

21

asked 22 hours ago
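A quick way to see which time steps the splits in the question above have in common is to intersect the ranges directly. This is only a sketch: treating the quoted bounds as inclusive is an assumption, and the helper `shared_steps` is a hypothetical name.

```python
# Time-step ranges quoted in the question (inclusive bounds are an assumption).
splits = {
    "train": range(0, 301),
    "validation": range(200, 401),
    "test": range(300, 501),
}


def shared_steps(a, b):
    """Return the sorted time steps that appear in both named splits."""
    return sorted(set(splits[a]) & set(splits[b]))


for a, b in [("train", "validation"), ("validation", "test"), ("train", "test")]:
    common = shared_steps(a, b)
    span = f"{common[0]}-{common[-1]}" if common else "none"
    print(f"{a} / {b}: {len(common)} shared steps ({span})")
```

Whether shared raw time steps amount to leakage depends on how the past and future windows are cut from each split, which this check does not cover.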

2 votes

0 answers

26 views

Markov State Transition Models vs the Win Ratio in clinical trials

I have recently become interested in Markov State Transition Models for the analysis of clinical trials with composite endpoints, such as the Markov Longitudinal Ordinal Model described in Frank ...

underflow

355

asked yesterday

5 votes

1 answer

50 views

Exact maximum likelihood for ARMA models

I am trying to understand the unconditional or exact least squares and maximum likelihood estimation methods for ARMA models. I am struggling to reconcile the different formulations given in standard ...

Maverick Meerkat

4,284

asked yesterday

1 vote

0 answers

45 views

Find the Fisher information of Gamma $(\alpha_0, \theta)$. Where the first parameter is known and the second is not, for a sample $(x_1,\dots,x_n)$

Find the Fisher information of Gamma $(\alpha_0, \theta)$. Where the first parameter is known and the second is not, for a sample $(x_1,\dots,x_n)$. Or to state the original question (which I restated ...

math forever

175

asked yesterday
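For the Gamma question above, a worked sketch under one stated assumption: $\theta$ is taken to be the scale parameter, so that $E[X] = \alpha_0 \theta$, and the sample is i.i.d. The question excerpt does not say which parametrization is intended.

```latex
\begin{aligned}
\log f(x;\theta) &= (\alpha_0 - 1)\log x - \frac{x}{\theta} - \alpha_0 \log\theta - \log\Gamma(\alpha_0), \\
\frac{\partial^2}{\partial\theta^2}\log f(x;\theta) &= -\frac{2x}{\theta^3} + \frac{\alpha_0}{\theta^2}, \\
I_1(\theta) &= -E\!\left[\frac{\partial^2}{\partial\theta^2}\log f(X;\theta)\right]
  = \frac{2\alpha_0\theta}{\theta^3} - \frac{\alpha_0}{\theta^2} = \frac{\alpha_0}{\theta^2}, \\
I_n(\theta) &= n\,I_1(\theta) = \frac{n\,\alpha_0}{\theta^2}.
\end{aligned}
```

If $\theta$ is instead a rate parameter, the same steps give an expression of the same form, $n\alpha_0/\theta^2$.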

0 votes

0 answers

28 views

Testing the fit of one model on two separate datasets [closed]

I'm trying to compare the shape of two curves (from two datasets) to an expected exponential shape. Specifically, I am interested in where people allocate their attention during a cognitive task. ...

L Ling

1

asked yesterday

3 votes

2 answers

69 views

Testing performance of numerical algorithm for penalised estimators

Assume $y \in \mathbb{R}^{n}$, $\beta \in \mathbb{R}^{k}$, and $X \in \mathbb{R}^{n\times k}$ and we are solving the following strictly convex optimisation problem $$ \hat{\beta} = \arg \min ||y - X\beta |...

ABK

708

asked yesterday

7 votes

1 answer

256 views

Survival proportion for censored data

Let's consider the example below as a life table for survival analysis: At time 124, we have a censored patient, lost to follow-up. Generally, I have noticed that we don't get from the software ...

AgnieszkaTomczyk

201

asked 2 days ago

2 votes

1 answer

26 views

Best method to determine sub-sensor error

I have a system measuring several outputs by sub-sensors and the total input to the sub-sensors is measured from a main sensor. The above chart shows the error between the sum of sub-sensor readings ...

mpcengineer

21

asked 2 days ago

0 votes

1 answer

43 views

Sample size estimation to compare three group means given a coefficient of variation and a percentage change taking pairwise comparisons into account

I would like to estimate a sample size to compare the means of three groups (A, B and C), taking into account three pairwise comparisons (A/B, B/C, A/C), given a coefficient of variation and an effect size as ...

Mubita

191

asked Mar 28 at 21:25

2 votes

0 answers

43 views

How is correlation in a 3-dimensional space possible (between 3 variables)? [closed]

I have a dataset with windspeed and stability measurements over 3 years and in 5 different locations (every location represents a turbine). This means I have the variables wsp and Ri (stability) which ...

Gartongschwärt

21

asked Mar 28 at 19:00

8 votes

2 answers

530 views

Significance testing: how to say if a given p-value is "strong" or "weak" evidence against the null?

After conducting a statistical test, say I get a p-value of 0.001 (this is just an example, as I'm learning and don't have actual data). I know that in the Fisherian framework, the smaller a p-value, ...

Cori

81

asked Mar 28 at 15:24

6 votes

0 answers

57 views

What are good references that communicate the idea of 'theory before statistics'?

In my limited experience with data analysis, I realised that there are many decisions an analyst has to make to claim that a given model can be reliably assumed to have generated the data at hand and, ...

medium-dimensional

263

asked Mar 28 at 8:12

0 votes

0 answers

33 views

Confidence intervals on predictions [closed]

I have a model (clustering) that forecasts revenue. It does so for y1 and 2. For all other years we use an existing model for all customers and use the forecasts from that xgbRegressor to stick onto ...

Maths12

589

asked Mar 27 at 22:37

4 votes

1 answer

183 views

Power analysis for determining what amount of time to analyze data

I have collected data for a project on animal behavior. My question is, how do I decide what duration of time I should analyze the data at? I'm trying to understand the animal movement in response to ...

anzac21

47

asked Mar 27 at 16:03

0 votes

0 answers

37 views

Problem with data cleaning

The Union of India has undergone frequent political re-organizations since independence. The problem today (for me) is that I've been unable to account for certain data values of the following states/...

Mithu

1

asked Mar 27 at 14:07

2 votes

0 answers

20 views

Optimal Tournament Design (3v3): Does overlapping "Partners as Opponents" minimize variance in top-tail skill estimation?

I am designing a tournament schedule for $N=61$ players competing in 3v3 matches. The goal is to maximize the accuracy of the final rank for the top $k$ players (specifically seeds 1-4 and 9-12). The ...

Joel

21

asked Mar 27 at 14:00

0 votes

0 answers

23 views

Can I pool significant values after linear mixed model? [closed]

I have a set of data with 5 imputations consisting of two groups of students, one in bachelor's and master's, who both went through a program. I did a linear mixed model analysis to show that both ...

user509303

1

asked Mar 27 at 7:08

5 votes

6 answers

325 views

Updating NHST with power analysis, ASA's p-value interpretation and effect sizes [closed]

I am a statistics teacher in a psychology program. The field of study is important here because, in my country's education system, students who choose psychology often lack a solid foundation in ...

Lil'Lobster

1,684

asked Mar 26 at 18:46

6 votes

2 answers

208 views

Singularity Problem with gls that isn't present in lm

I'm performing an IPD meta-analysis, and need to fit my models with study-specific variances (which is why I need to fit with nlme::gls instead of ...

slammaster

181

asked Mar 26 at 18:12

7 votes

2 answers

275 views

Should I avoid a Dunn's test if my groups have different variability?

Context: I'm trying to understand whether it is sensible to conduct a Kruskal-Wallis then a post-hoc Dunn's test, or a Wilcoxon rank sum on my data. (My understanding is that it isn't appropriate to ...

curiousfox

73

asked Mar 26 at 16:36

7 votes

2 answers

275 views

How to decide if an interaction exists: graphically/interaction terms/contrasts of slopes

I have fit interaction models of the form: phenotype ~ genotype * environment, based on theory. I am assigning environment (GFR, in this case) as the moderator. I have three scenarios: A: Non-parallel ...

Mubita

191

asked Mar 26 at 15:48

3 votes

0 answers

29 views

constrain spline to go through two points

I would like to generate a function using splines using the coefficients (i.e. not from a regression), under the constraint that the function passes through (0,0) and (1,1). For instance, ...

richarddmorey

646

asked Mar 26 at 15:31

2 votes

1 answer

44 views

How to handle low reliability and missing measurement invariance in small sample?

I’m working with a longitudinal dataset with a small sample size at level 2 (n = 30 / 50 teachers) and a bigger sample size at level 1 (n = 600 students). For the students data, longitudinal MI and ...

Emilia

111

asked Mar 26 at 13:18

0 votes

0 answers

18 views

Modelling G x E interactions in linear and mixed models

I have performed two GWAS for a phenological trait over two years (one analysis for each year) and got different SNPs driving the trait each year. I want to assess whether there is a "year effect" ...

Alkaligrass

11

asked Mar 26 at 8:26

2 votes

1 answer

34 views

C-Index less than 0.5 for NNnPH Survival Model

I am working with Neural Network violating Proportional Hazard survival model. This model directly incorporates the proportional hazard into its architecture. I have applied this model in UnempDur ...

coderoid

275

asked Mar 26 at 4:21

0 votes

0 answers

25 views

How to compare non-reference category dummy variables with each other? [closed]

I am running a hierarchical multiple regression, DV= continuous, IV1 = nominal, dummy coded with 3 groups, IV2 = continuous. The SPSS output only provides K-1 comparisons with the reference group (0 v ...

Guesty McGuestyface

1

asked Mar 26 at 3:56

6 votes

2 answers

208 views

Does perfect deterministic dependence of an IV on controls cause multicollinearity in 2SLS first stage?

Context I am estimating the causal effect of mosquito net use on dengue risk using an IV strategy. A government programme provides free mosquito nets to households satisfying both of the following ...

Yash Burman

61

asked Mar 25 at 20:04

2 votes

1 answer

51 views

Several 2D factor smooths of same variables in GAMs with mgcv

I aim to build a model that includes 2D smooth by different factors (to check for smooth differences; e.g., level 1 of factor 1 vs. level 2 of factor 1, level 1 of factor 2 vs. level 2 of factor 2), ...

David

135

asked Mar 25 at 13:13

3 votes

1 answer

54 views

What is the effect on power of departures from target recruitment rates in a stepped-wedge study design?

I performed a sample size calculation for a project with a stepped-wedge study design testing the difference in duration of hospital stay pre- and post-intervention. This is the design. I calculated that ...

llewmills

2,507

asked Mar 25 at 6:36

2 votes

1 answer

45 views

How does the Cox hazard regression model change with a main vs. interactive model?

How do these two differ in terms of interpretation? When should one be used over the other? ...

esss123

21

asked Mar 25 at 1:58

7 votes

2 answers

261 views

Multiple regression when some predictors are identical across groups

I am working on a multiple regression model examining the effects of several predictors on morphological traits to make interspecific comparisons across many species: Length ~ Height + Average Weight +...

Groundfall

71

asked Mar 25 at 0:32

0 votes

0 answers

26 views

Using total least squares (TLS), finding right standard deviation if x,y scales change after fit

I am looking to fit experimental data to a line that predicts whether a circular pad will slip. This is done considering the shear force and the moment applied on the pad. I would also like to get a ...

user509159

asked Mar 24 at 22:44

3 votes

1 answer

31 views

Question regarding using Survey Weights

I wanted to ask for some guidance on the use of survey weights. My current regression examines how income inequality perceptions vary across different demographic groups. When I do not use weights, ...

chunguc1004

631

asked Mar 24 at 19:17

1 vote

1 answer

50 views

Standardising predictors before using poly() in a GLMM

Crossposting from https://stackoverflow.com/q/79913674/19231816 I am building a GLMM to answer an ecological question. Most of my predictors were log-transformed and then z-score standardised ((x - ...

msug

11

asked Mar 24 at 19:03

2 votes

1 answer

76 views

Normal prior on parameter of binomial GLM gives "flat" predictive prior draws when applied to scaled variable

I'm using the brms package in R to model a binomial GLM of probability of presence of a species (PA) predicted by distance to a ...

Mag

123

asked Mar 24 at 16:59

0 votes

0 answers

30 views

Variance of sum of n variables with known variance [closed]

If I sum up n values, where each of the values has the same known variance, what is the value of the sum, and how does it relate to the number of values?

Hordur Einars

1

asked Mar 24 at 15:05
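For the sum-of-variances question above, the standard identity can be written out as follows. Here $X_1,\dots,X_n$ denote the summed values and $\sigma^2$ their common variance (symbols introduced for this sketch). The simplification in the second line holds only under the added assumption that the values are uncorrelated.

```latex
\begin{aligned}
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
  &= \sum_{i=1}^{n}\operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i, X_j), \\
\operatorname{Var}\!\left(\sum_{i=1}^{n} X_i\right)
  &= n\,\sigma^2 \quad \text{if } \operatorname{Cov}(X_i, X_j) = 0 \text{ for all } i \neq j .
\end{aligned}
```

So in the uncorrelated case the variance of the sum grows linearly with $n$, and its standard deviation grows as $\sqrt{n}\,\sigma$.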

16 votes

2 answers

2k views

Why do some LLMs like ChatGPT add random words in foreign languages?

I've seen some posts where (English language) ChatGPT answers have certain words written in Arabic, Hindi or Russian. I wonder why that happens. Also older posts see the same for Spanish. An example ...

Mo_

261

asked Mar 24 at 14:57

6 votes

2 answers

252 views

Do Spline Terms Affect the Interpretation of Linear Terms in Logistic Regression?

In a multivariable logistic regression model, some continuous predictors are modeled using spline transformations to allow for nonlinear effects, while other continuous predictors are entered as ...

Konstantinos Gkirgkiris

633

asked Mar 24 at 10:28

0 votes

0 answers

50 views

If the population has a normal distribution, the sample is small, and the population standard deviation is not known, what interval estimation is used? [closed]

I am preparing for a midterm test, and I am a bit confused. This question is in our practice test material twice, but with different answers: If population having normal distribution, small sample, ...

Guest

1

asked Mar 24 at 8:41

5 votes

2 answers

181 views

Inflation of p-values of likelihood-ratio tests in longitudinal data analysis, Part 2

This post is a follow‑up to my earlier question: Inflation of p-values of likelihood-ratio tests in longitudinal data analysis. We assume a parallel group design of two randomized groups where ...

Dominik Grathwohl

127

asked Mar 24 at 8:31

2 votes

0 answers

20 views

Why would adding several identical constant node features improve GNN performance?

I am training a graph neural network (GATv2) for graph-level regression on a dataset where all graphs share the same topology, but node features vary across samples. I noticed a strange result: ...

Yaf Rooki

21

asked Mar 23 at 16:32

2 votes

0 answers

31 views

Use of total least squares to get standard deviation from fitted line [duplicate]

I am looking to fit experimental data to a line that predicts whether a circular pad will slip. This is done considering the shear force and the moment applied on the pad. I would also like to get a ...

user509083

21

asked Mar 23 at 15:55

1 vote

0 answers

41 views

How to interpret the two equations of a linear regression?

I have $n$ data points $(x_i, y_i)$ and I am looking for two coefficients $(a, b)$ such that $$\forall i,\quad y_i \approx a x_i + b.$$ I define the mean square error function $$L(a, b) = \sum_i (y_i -...

Stef

499

asked Mar 23 at 14:40
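The "two equations" in the question above are presumably the normal equations obtained by setting both partial derivatives of the loss to zero. The following is a minimal sketch, assuming the truncated loss is $L(a, b) = \sum_i (y_i - a x_i - b)^2$. The data are synthetic and the cross-check with `np.polyfit` is added only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 10.0, size=50)
y = 3.0 * x + 1.0 + rng.normal(scale=0.5, size=50)  # synthetic data

n = x.size
# Normal equations from dL/da = 0 and dL/db = 0,
# assuming L(a, b) = sum_i (y_i - a*x_i - b)^2:
#   a * sum(x^2) + b * sum(x) = sum(x*y)
#   a * sum(x)   + b * n      = sum(y)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x), n]])
rhs = np.array([np.sum(x * y), np.sum(y)])
a, b = np.linalg.solve(A, rhs)

# Cross-check against NumPy's least-squares polynomial fit (slope first).
a_ref, b_ref = np.polyfit(x, y, deg=1)
print(a, b)
```

The second equation, divided by n, says the fitted line passes through the point of means; the first says the residuals are uncorrelated with x.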