Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics and data analysis.

Regression - Assumptions

Like any other method, linear regression rests on assumptions that must be fulfilled to obtain correct results:

  • The expected relationship between X and Y is linear: one should carefully distinguish linear, curvilinear and non-linear relationships. While curvilinear relationships can be transformed into linear ones, non-linear relationships cannot.
  • All measurements are independent of each other; any trend over time, or any correlation with a common third variable, must be avoided.
  • For each X, the Y values are distributed normally.
  • For each X, the Y-distribution has the same variance (homoscedastic data). This requirement is often not met, especially with data covering a large range (several orders of magnitude).
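The distinction between curvilinear and non-linear relationships in the first assumption can be illustrated with a small example. The following minimal sketch in plain Python assumes a hypothetical exponential relationship y = a·exp(b·x); taking the natural logarithm gives ln(y) = ln(a) + b·x, which is linear in x and can therefore be fitted by ordinary least squares. The helper `fit_line` and the noise-free data are illustrative assumptions, not part of the original text.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical noise-free data following the curvilinear law y = 2 * exp(0.5 * x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Linearize: ln(y) = ln(2) + 0.5 * x, then fit a straight line to (x, ln y)
intercept, slope = fit_line(xs, [math.log(y) for y in ys])
a = math.exp(intercept)  # back-transform the intercept to recover a
b = slope
print(f"a = {a:.3f}, b = {b:.3f}")  # prints "a = 2.000, b = 0.500"
```

Such a transformation also changes the error structure (multiplicative errors on y become additive errors on ln(y)), so the remaining assumptions should be checked on the transformed scale.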

These assumptions should be checked by inspecting both the data and the residuals. One should always look at the X-Y plot, at the histogram of the residuals, and at the residuals plotted against X. Further, it is a good idea to check whether the residuals are uncorrelated (e.g. using the Durbin-Watson test), since the confidence intervals of the parameters will be wrong if the residuals are serially correlated.
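The residual check and the Durbin-Watson statistic can be sketched as follows. This is a minimal plain-Python illustration, assuming the residuals are ordered by X (or by time); the function names and the data (y = x², deliberately fitted with a straight line) are hypothetical. The statistic is d = Σ(e_i − e_(i−1))² / Σe_i²; values near 2 suggest no first-order serial correlation, values well below 2 indicate positive correlation, and values well above 2 indicate negative correlation. The sketch does not compute a significance level, which requires the tabulated critical values d_L and d_U.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(xs, ys, intercept, slope):
    """Observed minus fitted values, in the order of the input data."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def durbin_watson(res):
    """Sum of squared successive differences divided by the residual sum of squares."""
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    den = sum(e * e for e in res)
    return num / den


# Hypothetical data: a quadratic relationship wrongly modelled as a straight line
xs = [float(x) for x in range(10)]
ys = [x * x for x in xs]

intercept, slope = fit_line(xs, ys)
res = residuals(xs, ys, intercept, slope)
print("residuals:", res)  # U-shaped pattern: positive, negative, positive
print(f"Durbin-Watson d = {durbin_watson(res):.3f}")  # prints 0.455, far below 2
```

Here the residuals plotted against X form a systematic U shape, and the low value of d flags the same problem: successive residuals are positively correlated because the linear model is inadequate.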