Questions tagged [modeling]

This tag describes the process of creating a statistical or machine learning model. Always add a more specific tag.

302 votes
3 answers
37k views

Imagine a standard machine-learning scenario: You are confronted with a large multivariate dataset and you have a pretty blurry understanding of it. What you need to do is to make predictions ...
Tim's user avatar
  • 144k
138 votes
18 answers
124k views

Is it ever valid to include a two-way interaction in a model without including the main effects? What if your hypothesis is only about the interaction, do you still need to include the main effects?
Glen's user avatar
  • 7,610
296 votes
13 answers
275k views

The AIC and BIC are both methods of assessing model fit penalized for the number of estimated parameters. As I understand it, BIC penalizes models more for free parameters than does AIC. Beyond a ...
russellpierce's user avatar
137 votes
5 answers
144k views

Question: I want to be sure of something: is the use of k-fold cross-validation with time series straightforward, or does one need to pay special attention before using it? Background: I'm ...
Mickaël S's user avatar
  • 1,538
80 votes
7 answers
56k views

I am actually reviewing a manuscript where the authors compare 5-6 logit regression models with AIC. However, some of the models have interaction terms without including the individual covariate terms....
djhocking's user avatar
  • 2,071
91 votes
14 answers
73k views

"Essentially, all models are wrong, but some are useful." --- Box, George E. P.; Norman R. Draper (1987). Empirical Model-Building and Response Surfaces, p. 424, Wiley. ISBN 0471810339. What ...
gpuguy's user avatar
  • 1,143
83 votes
4 answers
29k views

I have produced generalized additive models for deforestation. To account for spatial autocorrelation, I have included latitude and longitude as a smoothed interaction term (i.e. s(x,y)). I've based ...
gisol's user avatar
  • 1,033
63 votes
4 answers
101k views

In linear regression, each predicted value is assumed to have been picked from a normal distribution of possible values. See below. But why is each predicted value assumed to have come from a normal ...
luciano's user avatar
  • 14.7k
6 votes
1 answer
3k views

I have customer data from 2 brands. The data structure is the same, but I expect customer behaviour to differ between brands. So I could train 2 models, 1 for each brand, or I could ...
BigName's user avatar
  • 163
70 votes
3 answers
36k views

In what circumstances would you want to, or not want to, scale or standardize a variable prior to model fitting? And what are the advantages and disadvantages of scaling a variable?
Andrew's user avatar
  • 6,378
35 votes
3 answers
13k views

Common data-based variable selection procedures (for example, forward, backward, stepwise, all subsets) tend to yield models with undesirable properties, including: Coefficients biased away from zero. ...
18 votes
1 answer
5k views

Recently, randomly browsing questions triggered a memory of an off-hand comment from one of my professors a few years back warning about the use of ratios in regression models. So I started reading ...
Affine's user avatar
  • 2,457
84 votes
6 answers
13k views

This question was asked on CV some years ago; it seems worth a repost in light of 1) order-of-magnitude better computing technology (e.g. parallel computing, HPC, etc.) and 2) newer techniques, e.g. [...
horaceT's user avatar
  • 3,382
73 votes
7 answers
112k views

What is meant when we say we have a saturated model?
Graham Cookson's user avatar
16 votes
2 answers
4k views

I'm trying to model some data on train arrival times. I'd like to use a distribution that captures "the longer I wait, the more likely the train is going to show up". It seems like such a distribution ...
foobar's user avatar
  • 733
