Frequent Questions
29,889 questions
258 votes, 12 answers, 145k views
Why is accuracy not the best measure for assessing classification models?
This is a general question that has been asked indirectly multiple times here, but it lacks a single authoritative answer. It would be great to have a detailed answer to this for reference.
...
258 votes, 8 answers, 134k views
Algorithms for automatic model selection
I would like to implement an algorithm for automatic model selection.
I am thinking of doing stepwise regression but anything will do (it has to be based on linear regressions though).
My problem ...
386 votes, 15 answers, 159k views
Is normality testing 'essentially useless'?
A former colleague once argued to me as follows:
We usually apply normality tests to the results of processes that,
under the null, generate random variables that are only
asymptotically or ...
139 votes, 5 answers, 33k views
Are unbalanced datasets problematic, and (how) does oversampling (purport to) help?
TL;DR
See title.
Motivation
I am hoping for a canonical answer along the lines of "(1) No, (2) Not applicable, because (1)", which we can use to close many wrong questions about unbalanced ...
375 votes, 9 answers, 377k views
What should I do when my neural network doesn't learn?
I'm training a neural network but the training loss doesn't decrease. How can I fix this?
I'm not asking about overfitting or regularization. I'm asking about how to solve the problem where my network'...
306 votes, 16 answers, 559k views
What is the meaning of p values and t values in statistical tests?
After taking a statistics course and then trying to help fellow students, I noticed one subject that inspires much head-desk banging is interpreting the results of statistical hypothesis tests. It ...
92 votes, 4 answers, 63k views
Reduce Classification Probability Threshold
I have a question regarding classification in general. Let $f$ be a classifier, which outputs a set of probabilities given some data $D$. Normally, one would say: well, if $P(c|D) > 0.5$, we will ...
302 votes, 3 answers, 37k views
How to know that your machine learning problem is hopeless?
Imagine a standard machine-learning scenario:
You are confronted with a large multivariate dataset and you have a
pretty blurry understanding of it. What you need to do is to make
predictions ...
186 votes, 6 answers, 130k views
Can a probability distribution value exceeding 1 be OK?
On the Wikipedia page about naive Bayes classifiers, there is this line:
$p(\mathrm{height}|\mathrm{male}) = 1.5789$ (A probability distribution over 1 is OK. It is the area under the bell curve ...
124 votes, 8 answers, 58k views
What is the benefit of breaking up a continuous predictor variable?
I'm wondering what the value is in taking a continuous predictor variable and breaking it up (e.g., into quintiles), before using it in a model.
It seems to me that by binning the variable we lose ...
213 votes, 9 answers, 238k views
How to deal with perfect separation in logistic regression?
If you have a variable which perfectly separates zeroes and ones in the target variable, R will yield the following "perfect or quasi perfect separation" warning message:
...
1384 votes, 27 answers, 991k views
Making sense of principal component analysis, eigenvectors & eigenvalues
In today's pattern recognition class my professor talked about PCA, eigenvectors and eigenvalues.
I understood the mathematics of it. If I'm asked to find eigenvalues etc. I'll do it correctly like ...
351 votes, 13 answers, 207k views
How to understand degrees of freedom?
From Wikipedia, there are three interpretations of the degrees of freedom of a statistic:
In statistics, the number of degrees of freedom is the number of
values in the final calculation of a ...
404 votes, 13 answers, 408k views
Difference between logit and probit models
What is the difference between the logit and probit models?
I'm more interested here in knowing when to use logistic regression, and when to use probit.
If there is any literature which defines it using ...
108 votes, 1 answer, 94k views
What are the shortcomings of the Mean Absolute Percentage Error (MAPE)?
The Mean Absolute Percentage Error (MAPE) is a common accuracy or error measure for time series or other predictions,
$$ \text{MAPE} = \frac{100}{n}\sum_{t=1}^n\frac{|A_t-F_t|}{A_t}\%,$$
where $A_t$ ...
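For readers skimming this excerpt, the formula above can be sketched in a few lines of Python. This is only an illustration, not part of the original question: the function name `mape` and its arguments are hypothetical, and it assumes (the excerpt is cut off before saying so) that $A_t$ are the actual values and $F_t$ the forecasts.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent, following the formula above.

    Assumption: `actuals` holds the A_t values and `forecasts` the F_t values.
    """
    if len(actuals) != len(forecasts) or len(actuals) == 0:
        raise ValueError("actuals and forecasts must be non-empty and of equal length")
    # The formula divides by A_t, so it is undefined whenever an actual is zero.
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Sum of |A_t - F_t| / A_t, divided exactly as the formula is written.
    total = sum(abs(a - f) / a for a, f in zip(actuals, forecasts))
    return 100.0 / len(actuals) * total


# Two forecasts, each off by 10% of the actual value.
print(mape([100.0, 200.0], [110.0, 180.0]))
```

The explicit check for zero actuals is a design choice of this sketch; it makes the division-by-zero case visible instead of letting it surface as a `ZeroDivisionError`.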