Newest Questions
219,352 questions
1 vote, 0 answers, 7 views
Is GVIF meaningful for a reduced interaction block created from manually coded factor × treatment terms?
I’m fitting a survival model with a multi-level factor (HistologyClass) and binary treatment variables (Radiotherapy, Chemotherapy). I do not want the full HistologyClass * Treatment interaction, ...
2 votes, 1 answer, 26 views
Given a dataset of human patients which includes their age, sex, and over 100 measured disease biomarkers, what type of statistical analysis is best?
I was given a dataset of close to 100 human participants. This dataset includes values for 100+ measured hypothesized biomarkers of neurodegeneration, their age, sex, disease state (clinically normal, ...
3 votes, 1 answer, 27 views
How to handle calendar year as a continuous predictor with a mismatched train/test time horizon?
I am using Ordinal Semiparametric Regression (Frank Harrell's rms package) to model overall survival in patients with brain tumor.
My training data is from the SEER database (covering years 2004 to ...
1 vote, 1 answer, 22 views
What does the base prediction value actually imply in SHAP?
I know SHAP (or Shapley) values are the contribution of each input variable to the model prediction. Adding the base value to the sum of all SHAP values gives you the model prediction for any data ...
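The additivity described above (base value plus the sum of SHAP values equals the prediction) can be checked with a minimal sketch. It computes exact Shapley values by enumerating all feature coalitions for a hypothetical three-feature `model`. It assumes an interventional value function: features outside a coalition are filled in from a small, made-up `background` set. Under that assumption the base value is simply the average prediction over the background rows. The names `model`, `coalition_value` and `shapley_values` are illustrative only.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical model with an interaction between features 0 and 2.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def coalition_value(subset, x, background):
    """Average prediction when features in `subset` are fixed to x and the
    remaining features are taken from each background row."""
    total = 0.0
    for row in background:
        z = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(z)
    return total / len(background)


def shapley_values(x, background):
    p = len(x)
    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            for s in combinations(others, size):
                # Shapley weight |S|! (p - |S| - 1)! / p!
                w = factorial(size) * factorial(p - size - 1) / factorial(p)
                with_i = coalition_value(set(s) | {i}, x, background)
                without_i = coalition_value(set(s), x, background)
                phi[i] += w * (with_i - without_i)
    return phi


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [2.0, 0.0, 1.0]]
x = [1.0, 2.0, 3.0]

base_value = coalition_value(set(), x, background)  # mean prediction over background
phi = shapley_values(x, background)
print(base_value, phi, base_value + sum(phi), model(x))
```

With the empty coalition every feature comes from the background. The base value is therefore the expected prediction before any feature of `x` is known. Each SHAP value then measures how far that feature moves the prediction away from this baseline.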
1 vote, 2 answers, 53 views
Spearman2's rho or Chatterjee's xi correlation coefficient for non-monotonic data?
Let's assume the non-monotonic data below (right graph and data from here).
I would like to test if the two variables x and y are correlated or at least not independent, given the non-monotonic ...
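The two coefficients behave differently on non-monotonic data. A minimal pure-Python sketch shows this by comparing ordinary Spearman's rho with Chatterjee's xi. Note that this is not Hmisc's `spearman2`, which adds a squared-rank term. Xi uses the no-ties formula $\xi_n = 1 - 3\sum_{i=1}^{n-1}|r_{i+1}-r_i|/(n^2-1)$, where the pairs are sorted by $x$ and $r_i$ is the rank of $y$. The U-shaped toy data $y = x^2$ are made up and chosen so that there are no ties.

```python
def ranks(values):
    # Ranks 1..n; assumes no ties (true for the toy data below).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman_rho(x, y):
    # No-ties formula: 1 - 6 * sum(d^2) / (n (n^2 - 1)).
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def chatterjee_xi(x, y):
    # Sort the pairs by x, then measure how much the ranks of y jump.
    n = len(x)
    pairs = sorted(zip(x, y))
    r = ranks([yy for _, yy in pairs])
    jumps = sum(abs(r[i + 1] - r[i]) for i in range(n - 1))
    return 1 - 3 * jumps / (n * n - 1)


x = [i - 4.9 for i in range(11)]   # -4.9, -3.9, ..., 5.1 (no ties in |x|)
y = [v * v for v in x]             # U-shaped, non-monotonic relationship
print(round(spearman_rho(x, y), 3), round(chatterjee_xi(x, y), 3))
```

Here rho is about 0.14 while xi is about 0.53. The rank correlation largely misses the U shape because the decreasing and increasing halves cancel. Xi, by contrast, only asks whether $y$ behaves like a function of $x$. Note that xi is not symmetric in $x$ and $y$.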
2 votes, 0 answers, 14 views
Sample size calculation for Paired Data using ordinal models [closed]
I came across this blog post by Prof. Harrell, Ordinal Models for Paired Data – Statistical Thinking.
I was trying to replicate the simulation in order to estimate the sample size for my paired study. ...
4 votes, 1 answer, 87 views
Which test do I use to estimate the preference of species?
I have a question about how to analyse my dataset and would really appreciate your advice.
My data consist of observations of a set of target plant species collected during field surveys. The surveys ...
2 votes, 0 answers, 20 views
Is this train/validation/test split method considered a data leakage?
Training has time steps 0-300.
Validation has time steps 200-400.
Testing has time steps 300-500.
The method uses N time steps as the observed past and the next N time steps as the future.
For example,...
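The overlap between the three ranges can be made explicit with a small sketch. It assumes the ranges are half-open intervals of time-step indices, i.e. [0, 300), [200, 400) and [300, 500). That is one reading of "0-300" and the others. The `overlap` helper is illustrative.

```python
def overlap(a, b):
    """Intersection of two half-open intervals [start, end), or None if empty."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


splits = {"train": (0, 300), "validation": (200, 400), "test": (300, 500)}

names = list(splits)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        shared = overlap(splits[names[i]], splits[names[j]])
        print(names[i], names[j], shared)
```

Under this reading, steps 200-299 fall in both the training and validation ranges, and steps 300-399 fall in both the validation and test ranges. If the endpoints are inclusive, training and test would also share step 300.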
2 votes, 0 answers, 26 views
Markov State Transition Models vs the Win Ratio in clinical trials
I have recently become interested in Markov State Transition Models for the analysis of clinical trials with composite endpoints, such as the Markov Longitudinal Ordinal Model described in Frank ...
5 votes, 1 answer, 50 views
Exact maximum likelihood for ARMA models
I am trying to understand the unconditional or exact least squares and maximum likelihood estimation methods for ARMA models. I am struggling to reconcile the different formulations given in standard ...
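The difference between the exact (unconditional) and conditional likelihoods is easiest to see in the simplest ARMA case. Take a zero-mean Gaussian AR(1), $x_t = \phi x_{t-1} + \varepsilon_t$, with $\varepsilon_t \sim N(0, \sigma^2)$ and $|\phi| < 1$. The sketch below is an illustration, not a general ARMA implementation, and the series is made up. It writes the exact log-likelihood as two parts: the stationary density of $x_1$, whose variance is $\sigma^2/(1-\phi^2)$, plus the conditional terms for $x_2, \dots, x_n$. The conditional likelihood simply drops the first part.

```python
import math


def ar1_conditional_loglik(x, phi, sigma2):
    # Sum of log N(x_t; phi * x_{t-1}, sigma2) for t = 2..n (conditions on x_1).
    ll = 0.0
    for prev, cur in zip(x[:-1], x[1:]):
        resid = cur - phi * prev
        ll += -0.5 * math.log(2 * math.pi * sigma2) - resid ** 2 / (2 * sigma2)
    return ll


def ar1_exact_loglik(x, phi, sigma2):
    # x_1 is drawn from the stationary distribution N(0, sigma2 / (1 - phi^2)).
    if not abs(phi) < 1:
        raise ValueError("exact AR(1) likelihood requires |phi| < 1")
    v1 = sigma2 / (1 - phi ** 2)
    first = -0.5 * math.log(2 * math.pi * v1) - x[0] ** 2 / (2 * v1)
    return first + ar1_conditional_loglik(x, phi, sigma2)


series = [0.5, -0.2, 0.3, 0.9, 0.1]  # made-up data
print(ar1_exact_loglik(series, 0.4, 1.0), ar1_conditional_loglik(series, 0.4, 1.0))
```

The two objectives are maximised differently. The conditional version reduces to least squares of $x_t$ on $x_{t-1}$. The exact version has no closed-form maximiser because of the $\log(1-\phi^2)$ and $x_1^2(1-\phi^2)$ terms. For general ARMA(p, q) models, the exact likelihood is usually evaluated with the Kalman filter or the innovations algorithm.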
1 vote, 0 answers, 45 views
Find the Fisher information of Gamma $(\alpha_0, \theta)$, where the first parameter is known and the second is not, for a sample $(x_1,\dots,x_n)$
Find the Fisher information of Gamma $(\alpha_0, \theta)$, where the first parameter is known and the second is not, for a sample $(x_1,\dots,x_n)$.
Or to state the original question (which I restated ...
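For reference, here is the standard derivation. It assumes $\theta$ is a scale parameter, $f(x\mid\theta) = x^{\alpha_0-1}e^{-x/\theta}/\big(\Gamma(\alpha_0)\,\theta^{\alpha_0}\big)$, and uses $\mathbb{E}[X_i] = \alpha_0\theta$. With a rate parameterisation, $f \propto \theta^{\alpha_0}x^{\alpha_0-1}e^{-\theta x}$, the same steps also end in $n\alpha_0/\theta^2$.

$$
\begin{aligned}
\ell(\theta) &= (\alpha_0 - 1)\sum_{i=1}^n \log x_i - \frac{1}{\theta}\sum_{i=1}^n x_i - n\log\Gamma(\alpha_0) - n\alpha_0 \log\theta,\\
\frac{\partial \ell}{\partial \theta} &= \frac{1}{\theta^2}\sum_{i=1}^n x_i - \frac{n\alpha_0}{\theta},\\
\frac{\partial^2 \ell}{\partial \theta^2} &= -\frac{2}{\theta^3}\sum_{i=1}^n x_i + \frac{n\alpha_0}{\theta^2},\\
I_n(\theta) &= -\mathbb{E}\left[\frac{\partial^2 \ell}{\partial \theta^2}\right] = \frac{2n\alpha_0\theta}{\theta^3} - \frac{n\alpha_0}{\theta^2} = \frac{n\alpha_0}{\theta^2}.
\end{aligned}
$$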
0 votes, 0 answers, 28 views
Testing the fit of one model on two separate datasets [closed]
I'm trying to compare the shape of two curves (from two datasets) to an expected exponential shape.
Specifically, I am interested in where people allocate their attention during a cognitive task. ...
3 votes, 2 answers, 69 views
Testing performance of numerical algorithm for penalised estimators
Assume $y \in \mathbb{R}^{n}$, $\beta \in \mathbb{R}^{k}$, and $X \in \mathbb{R}^{n\times k}$ and we are solving the following strictly convex optimisation problem
$$
\hat{\beta} = \arg \min ||y - X\beta |...
7 votes, 1 answer, 256 views
Survival proportion for censored data
Let's consider the below example as a life table for survival analysis:
At time 124, we have a censored patient, lost to follow-up.
Generally, I have noticed that we don't get from the software ...
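Most survival software uses the Kaplan-Meier (product-limit) estimator by default. Under that estimator, a censored patient causes no drop in the survival curve at the censoring time but is removed from the number at risk for later event times. A minimal sketch follows. The follow-up times are made up (they are not the question's table), except that one patient is censored at time 124.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. events[i] is 1 for an observed death, 0 for censored.
    Returns (time, survival) pairs at each distinct event time."""
    data = list(zip(times, events))
    event_times = sorted({t for t, e in data if e == 1})
    surv = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for ti, _ in data if ti >= t)  # still under observation at t
        deaths = sum(1 for ti, ei in data if ti == t and ei == 1)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical data: the patient at time 124 is censored (lost to follow-up).
times = [100, 124, 150, 200]
events = [1, 0, 1, 1]
print(kaplan_meier(times, events))
```

There is no step at 124. Instead, the risk set at 150 shrinks from 3 to 2, so the drop there is a factor of 1/2 rather than 1/3.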
2 votes, 1 answer, 26 views
Best method to determine sub-sensor error
I have a system measuring several outputs by sub-sensors and the total input to the sub-sensors is measured from a main sensor.
The above chart shows the error between the sum of sub-sensor readings ...
0 votes, 1 answer, 43 views
Sample size estimation to compare three group means given a coefficient of variation and a percentage change taking pairwise comparisons into account
I would like to estimate a sample size to compare the means of three groups (A, B and C), taking into account three pairwise comparisons (A/B, B/C, A/C), given a coefficient of variation and an effect size as ...
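One common back-of-the-envelope approach is a normal-approximation two-sample calculation for each pairwise comparison, with a Bonferroni-adjusted significance level. Assume equal group sizes and a common coefficient of variation, both expressed relative to the baseline group's mean. The standard deviation is then CV × mean and the smallest difference of interest is (percentage change) × mean, so only their ratio matters: $n = 2(z_{1-\alpha/(2m)} + z_{1-\beta})^2 (\mathrm{CV}/\Delta)^2$ per group, with $m = 3$ comparisons. The function name and defaults below are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(cv, pct_change, alpha=0.05, power=0.80, comparisons=3):
    """Per-group n for a two-sided two-sample z-test on one pairwise comparison,
    with alpha split across `comparisons` tests (Bonferroni).
    cv and pct_change are fractions of the baseline mean, e.g. 0.2 for 20%."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / (2 * comparisons))
    z_beta = z.inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (cv / pct_change) ** 2
    return math.ceil(n)


print(n_per_group(cv=0.20, pct_change=0.20))
```

With small groups, the t distribution makes the real requirement somewhat larger. A t-based calculation (for example R's `power.t.test` with the adjusted `sig.level`) or a simulation is a sensible check.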
2 votes, 0 answers, 43 views
How is correlation in a 3-dimensional space possible (between 3 variables)? [closed]
I have a dataset with windspeed and stability measurements over 3 years and in 5 different locations (every location represents a turbine). This means I have the variables wsp and Ri (stability) which ...
8 votes, 2 answers, 530 views
Significance testing: how to say if a given p-value is "strong" or "weak" evidence against the null?
After conducting a statistical test, say I get a p-value of 0.001 (this is just an example, as I'm learning and don't have actual data). I know that in the Fisherian framework, the smaller a p-value, ...
6 votes, 0 answers, 57 views
What are good references that communicate the idea of 'theory before statistics'?
In my limited experience with data analysis, I realised that there are many decisions an analyst has to make to claim that a given model can be reliably assumed to have generated the data at hand and, ...
0 votes, 0 answers, 33 views
Confidence intervals on predictions [closed]
I have a model (clustering) that forecasts revenue. It does so for years 1 and 2. For all other years we use an existing model for all customers and use the forecasts from that xgbRegressor to stick onto ...
4 votes, 1 answer, 183 views
Power analysis for determining what amount of time to analyze data
I have collected data for a project on animal behavior. My question is, how do I decide what duration of time I should analyze the data at? I'm trying to understand the animal movement in response to ...
0 votes, 0 answers, 37 views
Problem with data cleaning
The Union of India has undergone frequent political re-organizations since independence. The problem today (for me) is that I've been unable to account for certain data values of the following states/...
2 votes, 0 answers, 20 views
Optimal Tournament Design (3v3): Does overlapping "Partners as Opponents" minimize variance in top-tail skill estimation?
I am designing a tournament schedule for $N=61$ players competing in 3v3 matches. The goal is to maximize the accuracy of the final rank for the top $k$ players (specifically seeds 1-4 and 9-12).
The ...
0 votes, 0 answers, 23 views
Can I pool significant values after a linear mixed model? [closed]
I have a set of data with 5 imputations consisting of two groups of students (bachelor's and master's) who both went through a program. I did a linear mixed model analysis to show that both ...
5 votes, 6 answers, 325 views
Updating NHST with power analysis, ASA's p-value interpretation and effect sizes [closed]
I am a statistics teacher in a psychology program. The field of study is important here because, in my country's education system, students who choose psychology often lack a solid foundation in ...
6 votes, 2 answers, 208 views
Singularity Problem with gls that isn't present in lm
I'm performing an IPD meta-analysis, and need to fit my models with study-specific variances (which is why I need to fit with nlme::gls instead of ...
7 votes, 2 answers, 275 views
Should I avoid a Dunn's test if my groups have different variability?
Context: I'm trying to understand whether it is sensible to conduct a Kruskal-Wallis then a post-hoc Dunn's test, or a Wilcoxon rank sum on my data. (My understanding is that it isn't appropriate to ...
7 votes, 2 answers, 275 views
How to decide if an interaction exists: graphically/interaction terms/contrasts of slopes
I have fit interaction models of the form: phenotype ~ genotype * environment, based on theory.
I am assigning environment (GFR, in this case) as the moderator.
I have three scenarios:
A: Non-parallel ...
3 votes, 0 answers, 29 views
constrain spline to go through two points
I would like to generate a function using splines using the coefficients (i.e. not from a regression), under the constraint that the function passes through (0,0) and (1,1). For instance,
...
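One way to build such a function directly from coefficients is a clamped B-spline. When the first and last knots are repeated degree + 1 times, the curve starts at the first coefficient and ends at the last. Setting those two coefficients to 0 and 1 therefore forces the curve through (0, 0) and (1, 1), while the interior coefficients stay free. The sketch below is a minimal pure-Python version using the Cox-de Boor recursion. The knot vector and interior coefficients are arbitrary illustrative choices, not taken from the question.

```python
def bspline_basis(i, p, knots, x):
    """Cox-de Boor recursion for the i-th B-spline basis function of degree p."""
    if p == 0:
        if knots[i] <= x < knots[i + 1]:
            return 1.0
        # Close the last non-empty interval on the right so x == knots[-1] is covered.
        if x == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = 0.0
    if knots[i + p] != knots[i]:
        left = (x - knots[i]) / (knots[i + p] - knots[i]) * bspline_basis(i, p - 1, knots, x)
    right = 0.0
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - x) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, knots, x))
    return left + right


def spline(x, coefs, knots, degree=3):
    return sum(c * bspline_basis(i, degree, knots, x) for i, c in enumerate(coefs))


# Clamped cubic knots on [0, 1] with one interior knot -> len(knots) - 4 = 5 coefficients.
knots = [0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0, 1.0]
coefs = [0.0, 0.6, 0.2, 0.9, 1.0]  # first = 0 and last = 1 pin the endpoints
print(spline(0.0, coefs, knots), spline(1.0, coefs, knots))
```

Only one basis function is non-zero at each end, so the endpoint values equal the first and last coefficients exactly. The interior coefficients act as control points that the curve is generally not required to pass through.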
2 votes, 1 answer, 44 views
How to handle low reliability and missing measurement invariance in small sample?
I’m working with a longitudinal dataset with a small sample size at level 2 (n = 30 / 50 teachers) and a bigger sample size at level 1 (n = 600 students). For the students data, longitudinal MI and ...
0 votes, 0 answers, 18 views
Modelling G x E interactions in linear and mixed models
I have performed two GWAS for a phenological trait over two years (one analysis for each year) and got different SNPs driving the trait each year. I want to assess whether there is a "year effect"...
2 votes, 1 answer, 34 views
C-Index less than 0.5 for NNnPH Survival Model
I am working with a Neural Network violating Proportional Hazard survival model. This model directly incorporates the proportional hazard into its architecture.
I have applied this model in UnempDur ...
0 votes, 0 answers, 25 views
How to compare non-reference category dummy variables with each other? [closed]
I am running a hierarchical multiple regression, DV= continuous, IV1 = nominal, dummy coded with 3 groups, IV2 = continuous.
The SPSS output only provides K-1 comparisons with the reference group (0 v ...
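Comparing two non-reference groups only needs the coefficient estimates and their covariance, which regression software can usually report. Take two dummies $b_j$ and $b_k$ that share the same reference group. The difference $b_j - b_k$ estimates group $j$ versus group $k$, and its standard error is $\sqrt{\mathrm{Var}(b_j) + \mathrm{Var}(b_k) - 2\,\mathrm{Cov}(b_j, b_k)}$. Refitting with a different reference category gives the same comparison directly. The sketch below uses made-up numbers, and `compare_dummies` is an illustrative name.

```python
import math
from statistics import NormalDist


def compare_dummies(b_j, b_k, var_j, var_k, cov_jk):
    """Difference between two dummy coefficients that share the same reference group,
    with its standard error and a two-sided normal-approximation p-value."""
    diff = b_j - b_k
    se = math.sqrt(var_j + var_k - 2 * cov_jk)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, se, z, p


# Hypothetical estimates for the two non-reference groups.
print(compare_dummies(b_j=1.2, b_k=0.7, var_j=0.04, var_k=0.09, cov_jk=0.01))
```

In a linear regression, the exact test would use the t distribution with the residual degrees of freedom instead of the normal approximation. The point estimate and standard error are the same either way.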
6 votes, 2 answers, 208 views
Does perfect deterministic dependence of an IV on controls cause multicollinearity in 2SLS first stage?
Context
I am estimating the causal effect of mosquito net use on dengue risk using an IV strategy. A government programme provides free mosquito nets to households satisfying both of the following ...
2 votes, 1 answer, 51 views
Several 2D factor smooths of same variables in GAMs with mgcv
I aim to build a model that includes 2D smooths by different factors (to check for smooth differences; e.g., level 1 of factor 1 vs. level 2 of factor 1, level 1 of factor 2 vs. level 2 of factor 2), ...
3 votes, 1 answer, 54 views
What is the effect on power of departures from target recruitment rates in a stepped-wedge study design?
I performed a sample size calculation for a project with a stepped-wedge study design testing the difference in duration of hospital stay pre- and post-intervention. This is the design.
I calculated that ...
2 votes, 1 answer, 45 views
How does the Cox hazard regression model change with a main vs. interactive model?
How do these two differ in terms of interpretation? When should one be used over the other?
...
7 votes, 2 answers, 261 views
Multiple regression when some predictors are identical across groups
I am working on a multiple regression model examining the effects of several predictors on morphological traits to make interspecific comparisons across many species:
Length ~ Height + Average Weight +...
0 votes, 0 answers, 26 views
Using total least squares (TLS), finding right standard deviation if x,y scales change after fit
I am looking to fit experimental data to a line that predicts whether a circular pad will slip. This is done considering the shear force and the moment applied on the pad. I would also like to get a ...
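For a straight line, total least squares has a closed form. The fitted line passes through the centroid, and its slope is $b = \big(s_{yy} - s_{xx} + \sqrt{(s_{yy} - s_{xx})^2 + 4s_{xy}^2}\big)/(2s_{xy})$, where the $s$ terms are centred sums of squares and cross-products. The residuals are the perpendicular distances to the line. The sketch below is a minimal pure-Python version. It assumes both variables are in the units used for fitting and that $s_{xy} \neq 0$. The data and the `tls_line` name are illustrative.

```python
import math


def tls_line(x, y):
    """Orthogonal (total least squares) fit of y = a + b x.
    Returns intercept a, slope b and the SD of the perpendicular residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    a = my - b * mx
    # Perpendicular distance of each point from the fitted line.
    d = [(yi - a - b * xi) / math.sqrt(1 + b * b) for xi, yi in zip(x, y)]
    sd = math.sqrt(sum(di ** 2 for di in d) / (n - 2))
    return a, b, sd


# Made-up measurements (e.g. moment vs. shear force).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(tls_line(x, y))
```

The perpendicular distance mixes x and y units, so the TLS fit is not invariant to rescaling either axis. A line fitted in one set of units and then converted is generally not the TLS line in the new units. The residual SD is only meaningful in whichever units the distances were measured.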
3 votes, 1 answer, 31 views
Question regarding using Survey Weights
I wanted to ask for some guidance on the use of survey weights. My current regression examines how income inequality perceptions vary across different demographic groups.
When I do not use weights, ...
1 vote, 1 answer, 50 views
Standardising predictors before using poly() in a GLMM
Crossposting from https://stackoverflow.com/q/79913674/19231816
I am building a GLMM to answer an ecological question. Most of my predictors were log-transformed and then z-score standardised ((x - ...
2 votes, 1 answer, 76 views
Normal prior on parameter of binomial GLM gives "flat" predictive prior draws when applied to scaled variable
I'm using the brms package in R to model a binomial GLM of the probability of presence of a species (PA) predicted by distance to a ...
0 votes, 0 answers, 30 views
Variance of sum of n variables with known variance [closed]
If I sum n values, each with the same known variance, what is the variance of the sum, and how does it relate to the number of values?
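For reference, the standard identity is given below. If each $X_i$ has mean $\mu$ and variance $\sigma^2$, and the variables are assumed uncorrelated (e.g. independent), the variance of the sum grows linearly in $n$ and its standard deviation grows like $\sqrt{n}$. Without that assumption, the covariance terms enter:

$$
\mathbb{E}\Big(\sum_{i=1}^n X_i\Big) = n\mu, \qquad
\operatorname{Var}\Big(\sum_{i=1}^n X_i\Big) = \sum_{i=1}^n \operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i, X_j) = n\sigma^2 \quad \text{if all } \operatorname{Cov}(X_i, X_j) = 0.
$$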
16 votes, 2 answers, 2k views
Why do some LLMs like ChatGPT add random words in foreign languages?
I've seen some posts where (English-language) ChatGPT answers have certain words written in Arabic, Hindi or Russian. I wonder why that happens. Older posts show the same for Spanish. An example ...
6 votes, 2 answers, 252 views
Do Spline Terms Affect the Interpretation of Linear Terms in Logistic Regression?
In a multivariable logistic regression model, some continuous predictors are modeled using spline transformations to allow for nonlinear effects, while other continuous predictors are entered as ...
0 votes, 0 answers, 50 views
If the population has a normal distribution, the sample is small, and the population standard deviation is not known, what interval estimation is used? [closed]
I am preparing for a midterm test, and I am a bit confused.
This question is in our practice test material twice, but with different answers:
If the population has a normal distribution, small sample, ...
5 votes, 2 answers, 181 views
Inflation of p-values of likelihood-ratio tests in longitudinal data analysis, Part 2
This post is a follow‑up to my earlier question:
Inflation of p-values of likelihood-ratio tests in longitudinal data analysis.
We assume a parallel group design of two randomized groups where
...
2 votes, 0 answers, 20 views
Why would adding several identical constant node features improve GNN performance?
I am training a graph neural network (GATv2) for graph-level regression on a dataset where all graphs share the same topology, but node features vary across samples. I noticed a strange result:
...
2 votes, 0 answers, 31 views
Use of total least squares to get standard deviation from fitted line [duplicate]
I am looking to fit experimental data to a line that predicts whether a circular pad will slip. This is done considering the shear force and the moment applied on the pad. I would also like to get a ...
1 vote, 0 answers, 41 views
How to interpret the two equations of a linear regression?
I have $n$ data points $(x_i, y_i)$ and I am looking for two coefficients $(a, b)$ such that $$\forall i,\quad y_i \approx a x_i + b.$$
I define the mean square error function $$L(a, b) = \sum_i (y_i -...
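Assume $L$ is the usual sum of squared residuals, $L(a, b) = \sum_i (y_i - a x_i - b)^2$, and write $e_i = y_i - a x_i - b$. The two equations then come from setting its partial derivatives to zero, and each has a direct reading:

$$
\begin{aligned}
\frac{\partial L}{\partial b} &= -2\sum_i (y_i - a x_i - b) = 0 &&\Longleftrightarrow\quad \bar y = a\bar x + b,\\
\frac{\partial L}{\partial a} &= -2\sum_i x_i (y_i - a x_i - b) = 0 &&\Longleftrightarrow\quad \sum_i x_i e_i = 0.
\end{aligned}
$$

The first equation says the residuals sum to zero, so the line passes through the point of means $(\bar x, \bar y)$. The second says the residuals are orthogonal to the $x_i$; combined with the first, this means they are uncorrelated with $x$. Solving the pair gives $a = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $b = \bar y - a\bar x$.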