Questions tagged [regression]
Techniques for analyzing the relationship between one (or more) "dependent" variables and "independent" variables.
4,697 questions
124 votes, 8 answers, 58k views
What is the benefit of breaking up a continuous predictor variable?
I'm wondering what the value is in taking a continuous predictor variable and breaking it up (e.g., into quintiles), before using it in a model.
It seems to me that by binning the variable we lose ...
213 votes, 9 answers, 238k views
How to deal with perfect separation in logistic regression?
If you have a variable that perfectly separates the zeroes and ones in the target variable, R will yield the following "perfect or quasi perfect separation" warning message:
...
187 votes, 11 answers, 243k views
When is it ok to remove the intercept in a linear regression model?
I am running linear regression models and wondering what the conditions are for removing the intercept term.
In comparing results from two different regressions where one has the intercept and the ...
138 votes, 18 answers, 124k views
Including the interaction but not the main effects in a model
Is it ever valid to include a two-way interaction in a model without including the main effects? If your hypothesis is only about the interaction, do you still need to include the main effects?
114 votes, 6 answers, 44k views
Principled way of collapsing categorical variables with many levels?
What techniques are available for collapsing (or pooling) many categories to a few, for the purpose of using them as an input (predictor) in a statistical model?
Consider a variable like college ...
291 votes, 6 answers, 51k views
Is $R^2$ useful or dangerous?
I was skimming through some lecture notes by Cosma Shalizi (in particular, section 2.1.1 of the second lecture), and was reminded that you can get very low $R^2$ even when you have a completely linear ...
223 votes, 5 answers, 286k views
How exactly does one “control for other variables”?
Here is the article that motivated this question: Does impatience make us fat?
I liked this article, and it nicely demonstrates the concept of “controlling for other variables” (IQ, career, income, ...
126 votes, 3 answers, 151k views
Does an unbalanced sample matter when doing logistic regression?
Okay, so I think I have a decent enough sample, taking into account the 20:1 rule of thumb: a fairly large sample (N=374) for a total of 7 candidate predictor variables.
My problem is the following: ...
140 votes, 3 answers, 53k views
What if residuals are normally distributed, but y is not?
I've got a weird question. Assume that you have a small sample where the dependent variable that you're going to analyze with a simple linear model is highly left skewed. Thus you assume that $u$ is ...
109 votes, 10 answers, 54k views
What is a complete list of the usual assumptions for linear regression?
What are the usual assumptions for linear regression?
Do they include:
a linear relationship between the independent and dependent variables
independent errors
normal distribution of errors
...
232 votes, 9 answers, 565k views
In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values?
Am I looking for a better behaved distribution for the independent variable in question, or to reduce the effect of outliers, or something else?
33 votes, 1 answer, 15k views
Omitted variable bias in logistic regression vs. omitted variable bias in ordinary least squares regression
I have a question about omitted variable bias in logistic and linear regression.
Say I omit some variables from a linear regression model. Pretend that those omitted variables are uncorrelated with ...
30 votes, 1 answer, 14k views
Goodness of fit and which model to choose: linear regression or Poisson
I need some advice regarding two main dilemmas in my research, which is a case study of 3 big pharmaceuticals and innovation. Number of patents per year is the dependent variable.
My questions are
...
48 votes, 1 answer, 20k views
How do you deal with "nested" variables in a regression model?
Consider a statistical problem where you have a response variable that you want to describe conditional on an explanatory ...
73 votes, 2 answers, 28k views
Is there a difference between 'controlling for' and 'ignoring' other variables in multiple regression?
The coefficient of an explanatory variable in a multiple regression tells us the relationship of that explanatory variable with the dependent variable. All this, while 'controlling' for the other ...
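
As a rough illustration of the distinction this last question asks about, here is a minimal sketch on simulated data. Everything in it is an assumption made for the example and not taken from the question: the variable names x1, x2 and y, the true coefficients (1.0 and 2.0), the 0.8 dependence between the predictors, and the helper functions. "Ignoring" x2 is taken to mean regressing y on x1 alone; "controlling for" x2 is computed by first removing from x1 the part explained by x2 and then regressing y on what is left, which gives the same coefficient as the multiple regression of y on both predictors.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """OLS slope of y on x, for a model that includes an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals of y after regressing it on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Simulated data (all values below are assumptions for illustration).
rng = random.Random(0)
n = 5000
x2 = [rng.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * z + rng.gauss(0, 1) for z in x2]  # x1 is correlated with x2
y = [1.0 * a + 2.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

# Ignoring x2: simple regression of y on x1 alone.
ignoring = slope(x1, y)

# Controlling for x2: regress y on the part of x1 not explained by x2.
controlling = slope(residuals(x2, x1), y)

print("coefficient of x1, ignoring x2:       ", round(ignoring, 3))
print("coefficient of x1, controlling for x2:", round(controlling, 3))
```

In this setup the controlled coefficient should land near the simulated value of 1.0, while the coefficient that ignores x2 also absorbs part of the effect of x2, because the two predictors are correlated. If x1 and x2 were uncorrelated, the two numbers would be expected to roughly agree.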