Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.

## Exercise - The Effect of Collinear Variables on MLR Models

Strongly correlated (collinear) variables cause multiple linear regression (MLR) models to become unstable. To see this, introduce some collinear variables into the data set BOILPTS and calculate MLR models with and without them. Specifically:

1. Create two copies of the variable "RandicToz" and add a small amount of noise to both copies (2 %).
2. Calculate an MLR model for the boiling point, using the original "RandicToz" variable and one of the noisy variables. Save the protocol in a file.
3. Calculate another MLR model by using the other noisy variable, instead of the first one. Again, save the protocol file.
4. Create two copies of the variable "JHET" and add a small amount of noise to both copies (2 %).
5. Calculate a third MLR model for the boiling point, using the original "RandicToz" variable and one of the noisy JHET variables. Save the protocol in a file.
6. Finally, calculate a fourth MLR model for the boiling point, using the original "RandicToz" variable and the other noisy JHET variable. Save the protocol in a file.
7. Compare the four protocols. Look at the goodness of fit, the F values and the regression coefficients. Do you see the difference between models 1 & 2 and models 3 & 4?
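
The steps above can also be sketched outside DataLab. The following Python example is only an illustration under stated assumptions: it does not load the real BOILPTS data but generates synthetic stand-ins for "RandicToz", "JHET" and the boiling point, and it interprets "2 % noise" as Gaussian noise with a standard deviation of 2 % of the column's standard deviation. The helper names `add_noise` and `fit_mlr` are hypothetical and not part of DataLab.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Synthetic stand-ins for the BOILPTS columns (assumption: NOT the real data).
randic = rng.uniform(2.0, 8.0, n)
jhet = rng.uniform(1.0, 4.0, n)
boiling_point = 30.0 * randic - 10.0 * jhet + rng.normal(0.0, 5.0, n)


def add_noise(values, level=0.02):
    """Return a noisy copy; 'level' is relative to the column's
    standard deviation (an assumed reading of "2 % noise")."""
    return values + rng.normal(0.0, level * values.std(), values.size)


def fit_mlr(y, *predictors):
    """Ordinary least squares with intercept.
    Returns coefficients, their standard errors, R^2 and the F value."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = residuals @ residuals
    ss_tot = np.sum((y - y.mean()) ** 2)
    p = X.shape[1] - 1                      # number of predictors
    dof = len(y) - p - 1                    # residual degrees of freedom
    se = np.sqrt(ss_res / dof * np.diag(np.linalg.inv(X.T @ X)))
    r2 = 1.0 - ss_res / ss_tot
    f_value = (r2 / p) / ((1.0 - r2) / dof)
    return {"coef": coef, "se": se, "r2": r2, "f": f_value}


# Steps 1 and 4: two noisy copies of each variable.
randic_a, randic_b = add_noise(randic), add_noise(randic)
jhet_a, jhet_b = add_noise(jhet), add_noise(jhet)

# Steps 2, 3, 5 and 6: the four models.
results = {
    "model 1": fit_mlr(boiling_point, randic, randic_a),
    "model 2": fit_mlr(boiling_point, randic, randic_b),
    "model 3": fit_mlr(boiling_point, randic, jhet_a),
    "model 4": fit_mlr(boiling_point, randic, jhet_b),
}

# Step 7: print a small "protocol" for each model.
for name, res in results.items():
    print(name,
          "coef:", np.round(res["coef"], 2),
          "se:", np.round(res["se"], 2),
          "R2:", round(float(res["r2"]), 4),
          "F:", round(float(res["f"]), 1))
```

With data of this kind, one would expect the standard error of the "RandicToz" coefficient to be much larger in models 1 and 2 than in models 3 and 4, because the noisy copy carries almost the same information as the original. The exact numbers depend on the seed and on the assumed noise model, so they will not match the DataLab protocols.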

Now go to the DataLab and carry out the steps above. In addition to comparing the regression parameters, look closely at the estimated values: they differ considerably between the first two models, but only slightly between the third and the fourth.

Last Update: 2012-10-08