Frequent Questions

258 votes
12 answers
145k views

This is a general question that has been asked indirectly multiple times here, but it lacks a single authoritative answer. It would be great to have a detailed answer to this for reference. ...
Tim
  • 144k
258 votes
8 answers
134k views

I would like to implement an algorithm for automatic model selection. I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though). My problem ...
S4M
  • 2,742
386 votes
15 answers
159k views

A former colleague once argued to me as follows: We usually apply normality tests to the results of processes that, under the null, generate random variables that are only asymptotically or ...
shabbychef
  • 15.2k
139 votes
5 answers
33k views

TL;DR: See title. Motivation: I am hoping for a canonical answer along the lines of "(1) No, (2) Not applicable, because (1)", which we can use to close many wrong questions about unbalanced ...
Stephan Kolassa
375 votes
9 answers
377k views

I'm training a neural network but the training loss doesn't decrease. How can I fix this? I'm not asking about overfitting or regularization. I'm asking about how to solve the problem where my network'...
Sycorax
  • 95.8k
306 votes
16 answers
559k views

After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It ...
Sharpie
  • 4,504
92 votes
4 answers
63k views

I have a question regarding classification in general. Let $f$ be a classifier, which outputs a set of probabilities given some data $D$. Normally, one would say: well, if $P(c|D) > 0.5$, we will ...
sdgaw erzswer
302 votes
3 answers
37k views

Imagine a standard machine-learning scenario: You are confronted with a large multivariate dataset and you have a pretty blurry understanding of it. What you need to do is to make predictions ...
Tim
  • 144k
186 votes
6 answers
130k views

On the Wikipedia page about naive Bayes classifiers, there is this line: $p(\mathrm{height}|\mathrm{male}) = 1.5789$ (A probability distribution over 1 is OK. It is the area under the bell curve ...
babelproofreader
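
For readers puzzled by the quoted figure, here is a minimal sketch of why a density value can exceed 1 while probabilities cannot. The mean (5.855) and variance (0.035033) below are assumptions chosen to reproduce the quoted $1.5789$ at a height of 6; they do not appear in the excerpt, and the function name `normal_pdf` is made up for illustration.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a normal distribution evaluated at x."""
    coeff = 1.0 / math.sqrt(2.0 * math.pi * variance)
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * variance))


# Assumed illustrative parameters (not taken from the excerpt).
MEAN_HEIGHT = 5.855
VAR_HEIGHT = 0.035033

density_at_6 = normal_pdf(6.0, MEAN_HEIGHT, VAR_HEIGHT)


def probability_between(lo, hi, mean, variance, steps=10_000):
    """Approximate P(lo <= X <= hi) with a midpoint Riemann sum."""
    width = (hi - lo) / steps
    return sum(
        normal_pdf(lo + (i + 0.5) * width, mean, variance) * width
        for i in range(steps)
    )


# The density is a height of the curve, not a probability, so it may exceed 1.
# The area under the curve over any interval is what stays at or below 1.
prob_near_6 = probability_between(5.9, 6.1, MEAN_HEIGHT, VAR_HEIGHT)
```

With a small variance the curve is narrow and therefore tall, which is how the density ends up above 1 even though every interval probability remains at most 1.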
124 votes
8 answers
58k views

I'm wondering what the value is in taking a continuous predictor variable and breaking it up (e.g., into quintiles), before using it in a model. It seems to me that by binning the variable we lose ...
Tom
  • 1,871
213 votes
9 answers
238k views

If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi-perfect separation" warning message: ...
user333
  • 7,361
1384 votes
27 answers
991k views

In today's pattern recognition class my professor talked about PCA, eigenvectors and eigenvalues. I understood the mathematics of it. If I'm asked to find eigenvalues etc. I'll do it correctly like ...
claws
  • 14.1k
351 votes
13 answers
207k views

From Wikipedia, there are three interpretations of the degrees of freedom of a statistic: In statistics, the number of degrees of freedom is the number of values in the final calculation of a ...
Tim
  • 20.1k
404 votes
13 answers
408k views

What is the difference between the logit and probit models? I'm more interested here in knowing when to use logistic regression and when to use probit. If there is any literature which defines it using ...
Beta
  • 6,526
108 votes
1 answer
94k views

The Mean Absolute Percentage Error (MAPE) is a common accuracy or error measure for time series or other predictions, $$ \text{MAPE} = \frac{100}{n}\sum_{t=1}^n\frac{|A_t-F_t|}{A_t}\%,$$ where $A_t$ ...
Stephan Kolassa
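
The MAPE formula in the excerpt above can be sketched in a few lines. This is only an illustration: it assumes $A_t$ are the actual values and $F_t$ the forecasts (the excerpt is cut off before defining them), that every actual is strictly positive, and the function name `mape` is a hypothetical helper rather than any library's API.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, returned in percent.

    Follows the formula as written: (100 / n) * sum(|A_t - F_t| / A_t).
    Assumes every actual value is strictly positive.
    """
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if not actuals:
        raise ValueError("at least one observation is required")
    if any(a <= 0 for a in actuals):
        raise ValueError("this sketch assumes strictly positive actuals")

    n = len(actuals)
    total = sum(abs(a - f) / a for a, f in zip(actuals, forecasts))
    return 100.0 / n * total


# Two observations, each forecast off by 10% of its actual value.
example = mape([100.0, 200.0], [110.0, 180.0])
```

Because each error is divided by its own actual, the measure is undefined when an actual is zero, which is why the sketch rejects such inputs rather than guessing at a convention.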
