3.6. Assumptions in Regression

The statistical assumptions for regression are as follows:

  • The sample is randomly selected.

  • The linear regression model assumes that the relationship between \(x\) and the mean of \(y\) follows a straight line.

  • The conditional distribution of \(y\) at each fixed value of \(x\) is normal.

  • The standard deviation of the conditional distribution is the same at each fixed \(x\) value (homoscedasticity).

  • In addition, we need to be careful that there is not too much overlap between the explanatory variables, that is, that they are not too strongly correlated with one another. Too much overlap is known as multicollinearity.
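
For a single explanatory variable, the linearity, normality and equal-spread assumptions can be written compactly. The symbols \(\alpha\), \(\beta\) and \(\sigma\) are an assumed notation for the intercept, slope and conditional standard deviation; they are not defined elsewhere in this section:

\[
E(y \mid x) = \alpha + \beta x, \qquad y \mid x \sim N(\alpha + \beta x,\ \sigma^2),
\]

where \(\sigma\) takes the same value at every \(x\).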

In practice, the assumptions are never perfectly fulfilled, but the regression model can still be useful. It is adequate to check that no assumption is grossly violated (Agresti, Chapter 14).
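
As an illustration of what checking for gross violations might look like, here is a minimal sketch in plain Python. It uses simulated data, and the helper names (`fit_line`, `pearson_r`, `rough_checks`) and the particular summaries are assumptions made for this example, not methods taken from Agresti. Each number is a crude indicator to be read alongside residual plots, not a formal test.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for one explanatory variable."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson_r(u, v):
    """Pearson correlation between two equally long sequences."""
    mean_u, mean_v = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def rough_checks(x, y):
    """Crude numeric summaries for linearity, equal spread and normality."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Linearity: residuals should show no curved pattern in x.
    # Correlating them with the squared, centred x picks up simple curvature.
    mean_x = statistics.fmean(x)
    curvature_r = pearson_r(residuals, [(xi - mean_x) ** 2 for xi in x])

    # Equal spread: compare residual SD for the lower and upper half of x.
    ordered = [r for _, r in sorted(zip(x, residuals))]
    half = len(ordered) // 2
    sd_ratio = statistics.stdev(ordered[half:]) / statistics.stdev(ordered[:half])

    # Normality: roughly 95% of residuals should lie within 2 SDs of zero.
    sd_all = statistics.stdev(residuals)
    share_within_2sd = sum(abs(r) <= 2 * sd_all for r in residuals) / len(residuals)

    return {
        "residual_mean": statistics.fmean(residuals),
        "curvature_r": curvature_r,
        "sd_ratio": sd_ratio,
        "share_within_2sd": share_within_2sd,
    }


# Simulated data that satisfies the assumptions (illustrative values only).
random.seed(1)
x1 = [random.uniform(0, 10) for _ in range(200)]
y = [2 + 0.5 * xi + random.gauss(0, 1) for xi in x1]

# A second explanatory variable built to overlap heavily with x1.
x2 = [xi + random.gauss(0, 0.5) for xi in x1]

checks = rough_checks(x1, y)
r_x1_x2 = pearson_r(x1, x2)

for name, value in checks.items():
    print(f"{name}: {value:.3f}")
print(f"correlation between x1 and x2: {r_x1_x2:.3f}")
```

For data that meet the assumptions, `curvature_r` should be near 0, `sd_ratio` near 1 and `share_within_2sd` near 0.95. A correlation between explanatory variables close to 1 or -1, as constructed here for `x1` and `x2`, signals multicollinearity.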

Note that the first assumption, that the sample is randomly selected, depends on the method of data collection. It cannot be checked with a statistical test!

The remaining assumptions are statistical and can be checked against the data. We will look at some of these later.