
Questions tagged [large-data]

'Large data' refers to situations where the number of observations (data points) is so large that it necessitates changes in the way the data analyst thinks about or conducts the analysis. (Not to be confused with 'high dimensionality'.)

3 votes
0 answers
78 views

I am creating several GAMM models with similar structures to dynamically model an acoustic parameter across realizations from multiple subjects, who are included as random smooths in my models. In the ...
Jul2415's user avatar
  • 31
-1 votes
1 answer
106 views

Is it possible to run a regression on a panel data set with 10,000 objects ($N$), each with $2,000$ observations (time series length $T$)? If so, what package in R can handle this?
Dane's user avatar
  • 559
2 votes
1 answer
615 views

I have a very big dataset (around 10 million rows) with repeated measures of around 500,000 individuals, irregularly spaced through time. My final goal is to do IPTW and fit a weighted Cox regression ...
Tasosmav's user avatar
3 votes
1 answer
694 views

Summary: I am trying to deal with non-proportional hazards in a Cox model on a large dataset. My question is whether the proportional hazards assumption really does not hold. If not, is the second model ...
Thomas's user avatar
  • 600
2 votes
0 answers
51 views

I am working on a classification problem with what I understand to be a big dataset. First of all, I split it into a "train" dataset and a "test" one. (Actually I am convinced ...
Videgain's user avatar
  • 121
0 votes
1 answer
126 views

Statistics goal: Determine if the difference between two datasets is statistically significant. Dataset description: The data is available in the form of particle size (mm) v. particle count (...
Dana Tran's user avatar
6 votes
1 answer
625 views

I am doing my undergraduate research, aiming to measure the difference before and after an intervention. Our sample size is 37, which is already considered a large sample, right? However, when we test ...
Chilenesa's user avatar
5 votes
2 answers
561 views

I have a dataset of a few million observations of a binary response with a low "success" probability of 1% to 2% on average. The dataset encompasses several categorical (~20, some with up to ...
g g's user avatar
  • 2,954
1 vote
0 answers
73 views

I have a question about using EFA on a large data set of survey questions. The goal is to form an index from over 200 items, and partly also to serve as a form of dimension reduction (I understand PCA is also ...
Ewen Tan's user avatar
1 vote
0 answers
39 views

I have a large dataset (n > 500,000) for which I'm building a linear model with lm(PV1READ ~ PV1MATH + PV1SCIE + ST004D01T). Tests for Normality, No ...
pluke's user avatar
  • 159
0 votes
1 answer
75 views

Consider the VAR(1) process $X_t = \Phi X_{t-1} + \epsilon_t.$ Is there a generally accepted decomposition for the coefficient matrix $\Phi$ that would decrease the degrees of freedom? My initial ...
Ville's user avatar
  • 81
2 votes
0 answers
114 views

I have a large dataset (1.3M rows) where I want to ensure that both Age and Duration increase monotonically for each by-factor level (Male, Female). Here is the setup of the model: ...
Colin's user avatar
  • 99
0 votes
1 answer
88 views

I have a list of hotel names which may or may not be correct, and with different spellings (such as '&' instead of 'and'). I want to use clustering in order to group the hotels with different ...
user480840's user avatar
6 votes
2 answers
439 views

In my line of work, I work with large data and often run stat tests to compare differences between groups. The problem I am facing is that if I use a $t$-test to measure any difference, the result ...
baz's user avatar
  • 61
2 votes
2 answers
147 views

I have DNA methylation data for 32 samples. For each sample I have DNA methylation available for more than 10,000 CpG bases (i.e. C nucleotides on DNA). I also have gene expression data from which I have ...
Saad Khan's user avatar
  • 101
