Questions tagged [model-selection]
Model selection is a problem of judging which model from some set performs best. Popular methods include $R^2$, AIC and BIC criteria, test sets, and cross-validation. To some extent, feature selection is a subproblem of model selection.
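As a minimal sketch of how the AIC and BIC criteria mentioned above score competing models, the snippet below fits ordinary least squares with NumPy and plugs the maximized Gaussian log-likelihood into $\mathrm{AIC} = 2k - 2\ell$ and $\mathrm{BIC} = k \ln n - 2\ell$. It assumes Gaussian errors and counts the error variance as one extra parameter ($k = p + 1$), which matches R's `AIC()` for `lm` fits. Other software may count parameters differently. The function name `ols_information_criteria` and the simulated data are illustrative only.

```python
import numpy as np

def ols_information_criteria(X, y):
    """Fit OLS by least squares and return (AIC, BIC) under Gaussian errors.

    X is an (n, p) design matrix that already contains an intercept column
    if one is wanted. The error variance is counted as a parameter, so k = p + 1.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximized Gaussian log-likelihood, with sigma^2 estimated as RSS / n
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    k = p + 1
    aic = 2 * k - 2 * loglik
    bic = k * np.log(n) - 2 * loglik
    return aic, bic

# Hypothetical simulated data: y depends linearly on x
rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Intercept-only model versus intercept + slope; lower values are preferred
aic_null, bic_null = ols_information_criteria(np.ones((n, 1)), y)
aic_full, bic_full = ols_information_criteria(np.column_stack([np.ones(n), x]), y)
print(aic_null, bic_null)
print(aic_full, bic_full)
```

Only differences between criterion values fitted to the same response matter. BIC's $\ln n$ penalty exceeds AIC's penalty of 2 once $n \ge 8$, so BIC tends to favour smaller models.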
2,037 questions
0 votes, 0 answers, 44 views
How to plot AIC, BIC of all possible models?
Suppose I was given a data set, say, golf, in the form of an MLR model. Given that best subset selection chooses the top 5 models of each size, how would ...
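Best subset selection with AIC and BIC can be sketched roughly as follows. The idea is to enumerate every subset of candidate predictors, fit each by OLS, and keep the lowest-AIC subsets of each size. The "golf" data are not available here, so the example uses simulated predictors. The function `best_subsets` is hypothetical, not a library API. The same Gaussian-likelihood convention is assumed as for R's `AIC()` on `lm` fits: an intercept is always included and the error variance counts as a parameter.

```python
import itertools
import numpy as np

def best_subsets(X, y, top=5):
    """Exhaustive best-subset search scored by AIC and BIC (hypothetical helper).

    X is an (n, p) matrix of candidate predictors with no intercept column;
    an intercept is always added. Returns {size: [(aic, bic, cols), ...]}
    holding the `top` lowest-AIC subsets of each size, sorted by AIC.
    """
    n, p = X.shape
    results = {}
    for size in range(p + 1):
        scored = []
        for cols in itertools.combinations(range(p), size):
            design = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
            beta, *_ = np.linalg.lstsq(design, y, rcond=None)
            rss = float(np.sum((y - design @ beta) ** 2))
            loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
            k = size + 2  # slopes + intercept + error variance
            aic = 2 * k - 2 * loglik
            bic = k * np.log(n) - 2 * loglik
            scored.append((aic, bic, cols))
        scored.sort(key=lambda t: t[0])
        results[size] = scored[:top]
    return results

# Simulated stand-in data: only predictors 0 and 2 matter
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4))
y = 0.5 + 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=80)

res = best_subsets(X, y, top=5)
for size, models in res.items():
    for aic, bic, cols in models:
        print(size, cols, round(aic, 2), round(bic, 2))
```

To get the kind of plot the question describes, scatter each kept model's AIC (or BIC) against its size. Exhaustive search fits $2^p$ models, so it is only practical for a modest number of predictors.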
1 vote, 0 answers, 22 views
3-way holdout for performance evaluation but 2-way for model selection
The paper https://arxiv.org/pdf/1811.12808 by Sebastian Raschka explains how to perform the 3-way holdout method, and also how to compute the final model (used in production).
During computation of the ...
2 votes, 0 answers, 61 views
Do k-folds risk sampling bias and, if so, how do we avoid it?
In cross-validation, $k$-folds are a common way to train, compare and validate models. Often we want to find an optimal set of hyperparameters for our models. There are many ways to probe the ...
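A bare-bones version of $k$-fold cross-validation for choosing a hyperparameter can be sketched like this. Indices are shuffled before being split into folds, which is one common guard against folds that reflect the data's original ordering. Each candidate setting is then scored by its mean held-out error. Ridge regression with a closed-form solution, the penalty grid, and the helper names (`kfold_indices`, `ridge_fit`, `cv_mse`) are illustrative assumptions only; any model and score could take their place.

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def ridge_fit(X, y, lam):
    # Closed-form ridge estimate (X'X + lam*I)^{-1} X'y; no intercept, for brevity
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_mse(X, y, lam, k=5, seed=0):
    """Mean held-out squared error of ridge with penalty lam over k folds."""
    folds = kfold_indices(len(y), k, seed)
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, evaluate on the i-th
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        beta = ridge_fit(X[train_idx], y[train_idx], lam)
        resid = y[test_idx] - X[test_idx] @ beta
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Hypothetical data: 10 predictors, only the first three carry signal
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(size=60)

lams = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {lam: cv_mse(X, y, lam) for lam in lams}
best = min(scores, key=scores.get)
print(scores, best)
```

Rerunning `cv_mse` with different `seed` values shows how much the scores depend on the particular fold assignment. Averaging over several such repeats (repeated $k$-fold) is one way to reduce that dependence.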
0 votes, 0 answers, 52 views
Dealing with high concurvity and variable selection in GAMMs with imbalanced data (mgcv::bam)
I am using GAMMs to model the probability of occurrence of a species, applying logistic regressions with mgcv::bam() to presence-pseudoabsence data. The dataset ...
0 votes, 0 answers, 41 views
How do I conduct backward selection on my OLS regression with Newey-West standard errors?
I have run an OLS regression and detected that it contains autocorrelation and heteroskedasticity. To deal with this I intend to use Newey-West standard errors.
But I am not sure what the proper ...
0 votes, 0 answers, 55 views
LASSO and cross validation when dealing with missing data
I want to simulate data with missing values and use them to compare the predictive performance of several machine learning algorithms, including LASSO. All analyses will be performed in R, using the ...
0 votes, 1 answer, 76 views
How to model feeder choice in bees while ignoring unbalanced feeding events per bout?
I'm analyzing an experiment I ran with bumblebees, and really struggling with choosing the appropriate model.
In the experiment, each bee made feeder choices across two temperature conditions:
...
1 vote, 0 answers, 63 views
How to justify the number of background points in MaxEnt species distribution modeling?
I'm building a species distribution model using MaxEnt with 260 presence points, collected opportunistically within a relatively small study area (a single administrative department in France).
I'm ...
0 votes, 0 answers, 41 views
How to interpret AIC model selection and uninformative parameters
I have a model set with 36 candidate models and 4 models with an AIC less than or equal to 2.0. I do not want to model average because I don't think my candidate set really fits in with the caveats ...
1 vote, 1 answer, 43 views
DCC-GARCH: Valid to have different GARCH models for each series?
Most DCC-GARCH tutorials and guides I found online often use "replicate" in creating their DCC specification, i.e. ...
0 votes, 1 answer, 93 views
DCC-GARCH: Correct way of choosing between the normal distribution and t-distribution
DCC-GARCH is comprised of two stages: (1) estimating the univariate GARCH and (2) estimating the correlations through DCC.
My time series (bond yields) are not normally distributed, as they rejected ...
1 vote, 1 answer, 65 views
DCC GARCH - Is there any merit in setting omega to zero?
I estimated the univariate GARCH models for each series, and all coefficients are statistically significant. However, upon putting them into one DCC-GARCH model with a DCC(1,1) spec, the individual ...
1 vote, 1 answer, 79 views
Can goodness-of-fit tests be used for model selection?
I would like to know whether goodness-of-fit tests (like Pearson's chi-squared test or the Kolmogorov-Smirnov test) can be used to select which probabilistic distribution model certain empirical observation ...
0 votes, 1 answer, 52 views
Why do overfitted models in finite mixture regression sometimes have the smallest BIC despite the true number of components being selected frequently?
I'm learning about EM algorithms and finite mixture models, and I've run into a particularly unintuitive problem. I'm trying to fit a finite mixture regression model on simulated data, where the true ...
0 votes, 0 answers, 76 views
Linear regression after multiple imputation: Should assumptions be checked before or after AIC-based model selection?
I’m currently working on multiple regression analyses with a small sample (n = 36), using multiple imputation via the mice package in R (5 imputed datasets). The ...