Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics, data analysis and chemometrics.


Analysis of Residuals

The analysis of residuals is important for any regression model. Numerical tests are more rigorous in principle, but practice shows that they are often unsatisfactory for small samples. Graphical methods for analyzing residuals are a useful alternative and usually give better results, since the human brain is trained to recognize patterns.

Besides checking the distribution of the residuals (they have to be normally distributed), any dependence of the residuals on one or more of the descriptor variables has to be detected and addressed. Plots of the residuals against the independent variable(s) usually give hints as to whether the assumptions of a least squares regression are fulfilled: outliers, misfits, and heteroscedasticity are much easier to detect by means of residual plots.
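As a minimal sketch of how such residuals are obtained, the following Python code fits a straight line by least squares and lists the residuals in order of increasing X. The data set is made up for illustration (a quadratic relation deliberately fitted with a line), and the helper names fit_line and residuals are assumptions of this sketch, not part of the original material.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def residuals(x, y, a, b):
    """Observed minus predicted value for every data point."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data: a quadratic relation wrongly fitted with a straight line.
x = [0, 1, 2, 3, 4, 5]
y = [xi ** 2 for xi in x]

a, b = fit_line(x, y)
res = residuals(x, y, a, b)

# Crude text "plot": sign of each residual in order of increasing x.
signs = "".join("+" if r > 0 else "-" for r in res)
print(signs)  # prints "+----+"
```

The residuals are positive at both ends and negative in the middle. A systematic pattern of this kind, rather than a random scatter around zero, is what a residual plot reveals when the regression function does not match the data.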

The following table gives an overview of the effects of unfulfilled assumptions:

Assumption | Effect on residuals if the assumption is not true | Typical residual plot
The applied regression function has to match the type of the actual relation between X and Y. | Residuals show a systematic deviation from the ideal band structure. | [plot]
The errors of the measurements are independent of each other. | The residuals exhibit a serial correlation. The serial correlation is not always clearly recognizable; perform a Durbin-Watson test to be sure. | [plot]
For a given X, repeated measurements of Y are normally distributed. | The residuals are not normally distributed. For small samples the histogram of the distribution of the residuals may be misleading, so always perform a test for normality. | [plot]
For each X, the Y-distribution has the same variance. | The residuals plotted against X do not show a band structure with equal variation. Typically, the variation of the residuals is higher on one end of the plot than on the other ("trumpet structure"). | [plot]
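The Durbin-Watson statistic mentioned in the table can be computed directly from the residuals, taken in the order of measurement. The sketch below uses the standard definition, the sum of squared differences of successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order serial correlation, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation. The two residual series are invented for illustration. A formal decision additionally needs tabulated critical bounds, which are not included here.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic d for residuals given in measurement order."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals, listed in the order the measurements were taken.
drifting = [1.0, 0.8, 0.5, 0.1, -0.3, -0.6, -0.9, -1.0]      # slow drift
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips

d_drift = durbin_watson(drifting)    # well below 2: positive correlation
d_alt = durbin_watson(alternating)   # well above 2: negative correlation
print(round(d_drift, 3), round(d_alt, 3))
```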

The following slide show displays some further examples of data sets which do not fulfill these assumptions.
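One such case, the trumpet structure, can also be imitated with a few made-up numbers. The sketch below compares the variance of the residuals in the upper half of the X range with that in the lower half. This is only a rough screening aid, loosely in the spirit of the Goldfeld-Quandt test but not that test itself. The function name spread_ratio and both data sets are assumptions of this example.

```python
from statistics import pvariance


def spread_ratio(x, residuals):
    """Residual variance in the upper half of x divided by that in the lower half."""
    pairs = sorted(zip(x, residuals))          # order the residuals by x
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return pvariance(upper) / pvariance(lower)


# Hypothetical residuals for eight X values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
trumpet = [0.1, -0.1, 0.2, -0.2, 0.8, -0.9, 1.2, -1.1]   # scatter grows with x
band = [0.5, -0.5, 0.4, -0.4, 0.5, -0.5, 0.4, -0.4]      # scatter stays constant

ratio_trumpet = spread_ratio(x, trumpet)
ratio_band = spread_ratio(x, band)
print(ratio_trumpet, ratio_band)
```

A ratio close to 1 is consistent with a band of equal variation, while a ratio far from 1 points to heteroscedasticity. With so few points the ratio is very noisy, so it should only supplement the residual plot, not replace it.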