Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics and data analysis.

MLR and Collinearity

Collinear variables are a major problem in multiple linear regression (MLR) modeling. Two variables are said to be collinear if they are approximately (or exactly) linearly dependent, or in other words, if the two variables are highly correlated. If a model is based on highly correlated variables, the estimated regression coefficients become unstable: small changes in the data can change their magnitude, and even their sign, considerably. This renders the coefficients useless for causal interpretation.
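The instability of the coefficients can be made visible with a small simulation. The following is only a sketch under stated assumptions: the data are synthetic, the true model y = x1 + x2 + noise is chosen purely for illustration, and the helper name coefficient_spread is hypothetical. The model is refitted many times with fresh noise, once with independent predictors and once with nearly collinear ones, and the standard deviation of the fitted slopes is compared.

```python
import numpy as np

def coefficient_spread(x1, x2, n_trials=200, noise_sd=1.0, seed=0):
    """Refit y = x1 + x2 + noise n_trials times (hypothetical helper).

    Returns the standard deviation of the two fitted slopes across trials.
    """
    rng = np.random.default_rng(seed)
    # Design matrix: intercept column plus the two predictors.
    X = np.column_stack([np.ones_like(x1), x1, x2])
    slopes = []
    for _ in range(n_trials):
        # Assumed true model (illustration only): both slopes equal 1.
        y = x1 + x2 + rng.normal(scale=noise_sd, size=x1.size)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        slopes.append(beta[1:])  # drop the intercept
    return np.std(slopes, axis=0)

rng = np.random.default_rng(42)
n = 100
x1 = rng.normal(size=n)
x2_independent = rng.normal(size=n)            # unrelated to x1
x2_collinear = x1 + 0.01 * rng.normal(size=n)  # almost a copy of x1

spread_independent = coefficient_spread(x1, x2_independent)
spread_collinear = coefficient_spread(x1, x2_collinear)

print("slope spread, independent predictors:", spread_independent)
print("slope spread, collinear predictors:  ", spread_collinear)
```

Under these assumptions the slopes fitted on the collinear predictors should scatter far more widely than those fitted on the independent predictors, although the noise level and the true coefficients are identical in both cases.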

There are at least three ways to determine collinearity:

  • inspecting the cross correlation table (the matrix of pairwise correlations). The cross correlation table, however, reveals only collinearities between pairs of variables. If a linear relationship involves three or more variables, the cross correlation table is only of limited use. In addition, correlation coefficients are heavily affected by outliers.
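  • the variance inflation factor (VIF): for each variable j, VIF_j = 1/(1 - R_j²), where R_j² is the coefficient of determination obtained by regressing variable j on all other independent variables. The VIF measures by how much the variance of the estimated regression coefficient is inflated compared to an orthogonal basis (uncorrelated variables). A VIF of 1 indicates no collinearity; values above about 10 are commonly taken as a sign of serious collinearity.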
  • the condition index: the condition index is defined as the square root of the ratio of the largest to the smallest eigenvalue of the scatter matrix XᵀX. This value is large if there is collinearity between the variables.
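The three diagnostics listed above can be sketched with NumPy as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the data are synthetic, and standardizing the columns before computing the condition index is an assumption made here so that the result does not depend on the units of the variables. The example uses a third variable that is almost the sum of the first two, a case in which the pairwise correlations stay moderate although the variables are strongly collinear.

```python
import numpy as np

def correlation_table(X):
    """Pairwise correlations between the columns of X."""
    return np.corrcoef(X, rowvar=False)

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 taken from regressing
    column j on all other columns (plus an intercept)."""
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        residuals = target - others @ beta
        r_squared = 1.0 - residuals.var() / target.var()
        vifs[j] = 1.0 / (1.0 - r_squared)
    return vifs

def condition_index(X):
    """Square root of (largest / smallest eigenvalue) of X^T X."""
    eigenvalues = np.linalg.eigvalsh(X.T @ X)  # returned in ascending order
    return np.sqrt(eigenvalues[-1] / eigenvalues[0])

# Synthetic data (assumption): x3 is nearly the sum of x1 and x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Assumption: columns are standardized before the eigenvalue analysis.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

corr = correlation_table(X)
vifs = variance_inflation_factors(X)
cond = condition_index(X_std)

print("correlation table:\n", np.round(corr, 2))
print("VIF per variable:", np.round(vifs, 1))
print("condition index:", round(float(cond), 1))
```

In this constructed case no single pairwise correlation looks alarming, while the VIFs and the condition index are both large, which illustrates why the cross correlation table alone is of limited use when more than two variables are involved.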