Fundamentals of Statistics contains material from various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.
See also: Predictive Ability, Validation of Models  
Cross Validation

When setting up multivariate models it is very important to check their validity. While the reliability of well-known linear models can usually be expressed by theoretical measures (e.g. the F value, or the goodness of fit), the situation is less favourable for other methods, such as neural networks or other non-linear mappings. One particular way to assess the performance of such a model is a procedure commonly called cross-validation (closely related to resampling techniques such as bootstrapping). While there are several flavors of cross-validation, the fundamental idea stays the same: the data is split into two mutually exclusive sets, a larger one (the 'training' set) and a smaller one (the 'test' set). The larger set is used to set up the model, while the smaller set is used to validate it, i.e. the model is applied to the test set and the predictions are compared to the expected values (as defined in the test set). This process is then repeated with different subsets until each object of the data set has been used exactly once in a test set.

The size of the test set for each repetition of the procedure can be adjusted to the user's needs, and mainly depends on the size of the entire data set and on the time and effort one is willing to spend on the cross-validation. There are two conceivable extreme cases: (1) splitting the data set into two equal halves, and (2) selecting only a single object for the test set. The latter approach is also called full (or leave-one-out) cross-validation, and is in general the preferable approach.
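The repeated split-train-predict cycle described above can be sketched as follows. This is a minimal illustration of full (leave-one-out) cross-validation, using a simple univariate least-squares line as a stand-in for the multivariate model; the function names are illustrative, not part of any particular library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (the stand-in model)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

def loo_errors(xs, ys):
    """Leave-one-out cross-validation: for each object, build the model
    on the remaining n-1 objects, predict the left-out object, and
    record the prediction error."""
    errors = []
    for i in range(len(xs)):
        # training set: all objects except object i
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        a, b = fit_line(train_x, train_y)
        # test set: the single left-out object i
        errors.append(ys[i] - (a + b * xs[i]))
    return errors
```

Larger test sets (e.g. leaving out a fifth of the objects per repetition) follow the same pattern, only with the loop running over groups of indices instead of single objects.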
In order to measure the performance of the model, one should calculate the PRESS value (the predicted residual error sum of squares).
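As a sketch, PRESS is simply the sum of the squared prediction errors collected during cross-validation, where each prediction is made by a model that did not see the object in question:

```python
def press(actual, predicted):
    """PRESS: sum of squared differences between the observed values and
    their cross-validated predictions (each prediction made by a model
    built without the corresponding object)."""
    return sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))
```

A smaller PRESS value indicates better predictive ability; comparing PRESS across candidate models is a common way to select among them.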

