

Validation of Models

Creating a model from a finite amount of data always carries a (small) risk that the model does not reflect the underlying relationship but is instead the product of random effects. The probability of obtaining an invalid model increases as the number of measurements decreases and as the number of variables increases. This has led to the rule of thumb (which is often too loose, especially for non-linear methods) that the number of measurements should be at least three times the number of variables in the model.
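As a minimal illustration of this rule of thumb, the following sketch checks the ratio of measurements to variables for a data matrix with one measurement per row and one variable per column. The function name and the default factor of three are taken from the text; everything else is an assumption for illustration.

import numpy as np

def enough_measurements(X, factor=3):
    """Return True if the number of measurements (rows of X) is at least
    `factor` times the number of variables (columns of X)."""
    n_measurements, n_variables = X.shape
    return n_measurements >= factor * n_variables

# Example: 20 measurements of 5 variables satisfies the 3:1 rule of thumb.
X = np.random.rand(20, 5)
print(enough_measurements(X))   # True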

Some (linear) multivariate methods provide a theoretical foundation for estimating the reliability of such a model. For more sophisticated or non-linear methods, the resulting models have to be validated by a heuristic approach. In principle, several methods are available for this, some of which are tailored to a specific type of model.

One approach to validation, however, performs consistently well: cross-validation, also known as the "leave-one-out" method. Cross-validation yields a measure of the prediction error called PRESS (prediction error sum of squares). Another, less commonly used procedure for validating models is to add noise to the data and check how the model responds; see the sketch below.
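The following sketch shows leave-one-out cross-validation for an ordinary least-squares model, accumulating PRESS as the sum of squared errors of the left-out predictions. The function and variable names are illustrative, not part of the original text; the text does not prescribe a particular model, so ordinary least squares is assumed here.

import numpy as np

def press_loocv(X, y):
    """Compute PRESS by refitting the model with each observation left out
    and predicting that observation from the remaining data."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i            # indices of the training set
        X_train, y_train = X[mask], y[mask]
        # add an intercept column and fit by least squares
        A = np.column_stack([np.ones(len(y_train)), X_train])
        coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
        y_pred = coef[0] + X[i] @ coef[1:]  # predict the left-out observation
        press += (y[i] - y_pred) ** 2
    return press

# Example: noisy linear data; a PRESS that is small relative to the total
# sum of squares of y indicates good predictive ability.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=30)
print(press_loocv(X, y))

The noise-addition check mentioned above could be sketched in the same spirit: perturb X with small random noise, recompute PRESS, and compare it with the value obtained from the unperturbed data; a model whose prediction error deteriorates drastically under small perturbations is likely fitting random effects rather than the underlying relationship.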