Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Predictive Ability

For many procedures of multivariate statistics the degrees of freedom cannot be specified, which prevents the calculation of the standard error. To obtain a metric for the prediction error nevertheless, one can resort to the quadratic mean of the observed residuals, the RMSEP, which is computed from the PRESS:

RMSEP stands for "Root Mean Squared Error of Prediction";
PRESS stands for "PRedictive Error Sum of Squares" or "PREdiction Sum of Squares".
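
Written out, the two quantities are commonly defined as follows. The symbols are notation assumed here and do not come from the original text: y_i is the observed value of sample i, \hat{y}_i is the value predicted for sample i by a model fitted without that sample, and n is the number of predictions.

```latex
\mathrm{PRESS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{RMSEP} = \sqrt{\frac{\mathrm{PRESS}}{n}}
```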

PRESS is calculated by summing all squared prediction errors obtained during cross-validation; RMSEP is the square root of PRESS divided by the number of predictions. Both are indicators of the reliability and predictive ability of the model: the lower the RMSEP value, the higher the predictive ability of the model.
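
The calculation can be sketched in Python. This is a minimal illustration, not the author's implementation: it assumes leave-one-out cross-validation and an ordinary least-squares model with intercept, and the function name `press_rmsep` is hypothetical.

```python
import numpy as np

def press_rmsep(X, y):
    """Leave-one-out PRESS and RMSEP for a least-squares model with intercept.

    Assumptions (not from the original text): leave-one-out cross-validation
    and a linear model fitted by ordinary least squares.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Design matrix: a column of ones (intercept) followed by the predictors.
    A = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i  # every sample except sample i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        residual = y[i] - A[i] @ coef  # prediction error for the left-out sample
        press += residual ** 2
    rmsep = np.sqrt(press / n)
    return press, rmsep

# Usage with made-up numbers (illustrative only).
X_demo = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y_demo = [1.1, 1.9, 3.2, 3.9, 5.1]
press, rmsep = press_rmsep(X_demo, y_demo)
print(f"PRESS = {press:.4f}, RMSEP = {rmsep:.4f}")
```

Note that the model is refitted once per left-out sample, so each residual comes from a prediction for a sample the model has not seen.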

PRESS (or RMSEP) can be used to find the optimum number of predictor variables in a stepwise variable selection procedure. The "best" model consists of as few predictor variables as possible and shows the lowest (or almost the lowest) PRESS. The figure below shows an example of a hypothetical variable selection procedure, resulting in a "best" model of 5 predictor variables.
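
One way such a selection could look in code is sketched below. Several choices are assumptions made for illustration and are not taken from the original text: forward (rather than backward) stepwise selection, leave-one-out cross-validation, a least-squares model, and a 5 % relative tolerance as the meaning of "almost the lowest". The names `loo_press`, `forward_selection` and `pick_model` are hypothetical.

```python
import numpy as np

def loo_press(X, y):
    """Leave-one-out PRESS of a least-squares model with intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return press

def forward_selection(X, y):
    """Add, at each step, the variable that yields the lowest PRESS.

    Returns a list of (selected column indices, PRESS), one entry per step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_vars = X.shape[1]
    selected, path = [], []
    for _ in range(n_vars):
        candidates = [j for j in range(n_vars) if j not in selected]
        scores = [loo_press(X[:, selected + [j]], y) for j in candidates]
        best = int(np.argmin(scores))
        selected.append(candidates[best])
        path.append((list(selected), scores[best]))
    return path

def pick_model(path, tolerance=0.05):
    """Smallest model whose PRESS is within `tolerance` of the minimum PRESS.

    The tolerance is an assumed way to express "almost the lowest".
    """
    min_press = min(p for _, p in path)
    for cols, p in path:
        if p <= (1 + tolerance) * min_press:
            return cols, p

# Usage with synthetic data (illustrative only): y depends on columns 0 and 2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = 3 * X_demo[:, 0] - 2 * X_demo[:, 2] + 0.1 * rng.normal(size=40)
for cols, p in forward_selection(X_demo, y_demo):
    print(len(cols), cols, round(p, 3))
print("chosen:", pick_model(forward_selection(X_demo, y_demo)))
```

Plotting PRESS against the number of selected variables from `path` gives the kind of curve the text describes, with the preferred model at or near its minimum.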

Note: a disadvantage of using PRESS or RMSEP is the enormous number of calculations necessary to obtain the PRESS value, because the model has to be recalculated for every cross-validation step. This is especially true for computationally intensive models (such as neural networks) and large data sets.