Fundamentals of Statistics contains material from various lectures and courses of H. Lohninger on statistics and data analysis.

Exercise - Chance Correlation

This exercise shows the influence of chance correlations on the results of a multivariate model. The experiment uses an artificial data set which contains only uncorrelated data, so in theory it should not be possible to set up a multivariate model relating any variable of the data set to the remaining variables. We start with 100 observations and reduce the number of observations in repeated experiments (use DataLab for all experiments).

  1. Create an empty matrix containing 15 columns (variables) and 100 rows (observations). Fill this data matrix with uncorrelated data.
  2. Calculate a multiple linear regression model, using the first column as the target variable and the remaining 14 columns as descriptor variables. Write down the goodness of fit and the F statistic.
  3. Repeat step 2 several times, reducing the number of observations to 60, 45, 30, 20, 17, and 15, respectively.
  4. Plot the goodness of fit and the F statistic against the number of observations.
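
The exercise is meant to be carried out in DataLab. For readers who want to cross-check their numbers, the steps above can also be sketched in Python with NumPy. This is only an illustration under stated assumptions: the uncorrelated data are drawn as standard normal random numbers, the goodness of fit is taken to be the coefficient of determination R^2, the model includes an intercept, and the smaller data sets are the first n rows of the 100-row matrix. The function names fit_mlr and run_experiment are hypothetical and not part of DataLab.

```python
import numpy as np


def fit_mlr(data):
    """Regress column 0 on all other columns (with intercept).

    Returns (r2, f_stat). The F statistic is NaN when no residual
    degrees of freedom are left (n <= number of descriptors + 1).
    """
    y = data[:, 0]
    n = data.shape[0]
    # Design matrix: a column of ones (intercept) plus the descriptors.
    X = np.column_stack([np.ones(n), data[:, 1:]])
    p = X.shape[1] - 1  # number of descriptor variables

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot

    df_res = n - p - 1
    if df_res <= 0:
        f_stat = float("nan")  # model is saturated, F is undefined
    else:
        f_stat = (r2 / p) / ((1.0 - r2) / df_res)
    return r2, f_stat


def run_experiment(sizes=(100, 60, 45, 30, 20, 17, 15), n_vars=15, seed=0):
    """Fit the model on the first n rows for each n in sizes."""
    rng = np.random.default_rng(seed)
    # Assumption: independent standard normal values as "uncorrelated data".
    full = rng.standard_normal((max(sizes), n_vars))
    return {n: fit_mlr(full[:n]) for n in sizes}


results = run_experiment()
for n, (r2, f_stat) in results.items():
    print(f"n={n:3d}  R^2={r2:.3f}  F={f_stat:.2f}")
```

The printed values can then be plotted against n with any plotting tool. With 15 observations the model has as many parameters (14 coefficients plus the intercept) as data points, so the sketch reports the F statistic as undefined for that case.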

Can you deduce a rule for the minimum number of observations in relation to the number of variables which yields acceptable results for multiple linear regression? Is this rule valid for neural networks, too?