Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics and data analysis.

Scaling of Data

Scaling data may be useful or even necessary under certain circumstances, for example when the variables span different ranges. There are several methods of scaling; the most important ones are listed below. A scaling procedure may be applied to the full data matrix or to parts of it only (e.g. to each column separately).
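The difference between scaling the full matrix and scaling it column by column can be sketched in Python. This is a minimal illustration, assuming the data matrix is stored as a list of rows (one row per sample, one column per variable). The helper names `center`, `scale_whole` and `scale_columns` are hypothetical, and mean centering is used only as an example transformation.

```python
from statistics import fmean
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]

def center(values: Vector) -> Vector:
    """Example transformation: subtract the mean of the given values."""
    mu = fmean(values)
    return [v - mu for v in values]

def scale_whole(matrix: Matrix, f: Callable[[Vector], Vector]) -> Matrix:
    """Apply f once to all entries of the matrix, treated as one pool of values."""
    n_cols = len(matrix[0])
    flat = [v for row in matrix for v in row]
    scaled = f(flat)
    # Cut the flat result back into rows of the original width.
    return [scaled[i:i + n_cols] for i in range(0, len(scaled), n_cols)]

def scale_columns(matrix: Matrix, f: Callable[[Vector], Vector]) -> Matrix:
    """Apply f separately to each column (variable) of the matrix."""
    columns = [f(list(col)) for col in zip(*matrix)]
    # Transpose back to row-major layout.
    return [list(row) for row in zip(*columns)]

# Two variables spanning very different ranges.
data = [[1.0, 100.0],
        [2.0, 200.0],
        [3.0, 300.0]]

print(scale_whole(data, center))    # [[-100.0, -1.0], [-99.0, 99.0], [-98.0, 199.0]]
print(scale_columns(data, center))  # [[-1.0, -100.0], [0.0, 0.0], [1.0, 100.0]]
```

Scaling the whole matrix preserves the relative sizes of the variables, while columnwise scaling treats each variable on its own, which addresses the case of variables spanning different ranges.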

Range Scaling

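Range scaling transforms the values to another range, which usually involves both a shift and a change of scale (magnification or reduction). The data samples are transformed according to the following equation:

Y = (X - Xmin) · (Dmax - Dmin) / (Xmax - Xmin) + Dmin

where Xmin and Xmax are the minimum and maximum of the original data, and Dmin and Dmax are the lower and upper limits of the target range.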
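A minimal Python sketch of range scaling, assuming the usual linear mapping in which the smallest data value is sent to a chosen lower limit and the largest to a chosen upper limit. The function name `range_scale` and its parameters `lo` and `hi` (the target range, defaulting to [0, 1]) are illustrative assumptions, not taken from the text.

```python
from typing import List

def range_scale(values: List[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Linearly map values so that min(values) -> lo and max(values) -> hi."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # A constant variable has no range to stretch; refuse rather than divide by zero.
        raise ValueError("range scaling is undefined for constant data")
    factor = (hi - lo) / (x_max - x_min)  # change of scale
    # Shift so that x_min maps to 0, rescale, then shift to the target lower limit.
    return [(v - x_min) * factor + lo for v in values]

print(range_scale([2.0, 4.0, 10.0]))             # [0.0, 0.25, 1.0]
print(range_scale([2.0, 4.0, 10.0], -1.0, 1.0))  # [-1.0, -0.5, 1.0]
```

Raising an error for constant data is a design choice of this sketch; other conventions (e.g. mapping every value to the midpoint of the target range) are possible.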

Mean Centering

Subtracting the mean of the data is often called "mean centering". It shifts the data so that their center lies at the origin, i.e. the mean of the transformed data equals zero:

Y = X - μ
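Mean centering takes only a few lines of Python; the function name `mean_center` is hypothetical. The last line checks that the transformed values indeed have zero mean.

```python
from statistics import fmean
from typing import List

def mean_center(values: List[float]) -> List[float]:
    """Subtract the arithmetic mean mu from every value (Y = X - mu)."""
    mu = fmean(values)
    return [v - mu for v in values]

y = mean_center([3.0, 5.0, 10.0])
print(y)         # [-3.0, -1.0, 4.0]
print(fmean(y))  # 0.0
```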


Standardization (sometimes also called autoscaling, or z-transformation) is the scaling procedure which results in a zero mean and unit variance of any descriptor variable. For every data value the mean μ has to be subtracted, and the result has to be divided by the standard deviation σ (note that the order of these two operations must not be reversed):

Y = (X - μ) / σ
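A Python sketch of standardization follows. The text does not say whether σ is the population or the sample standard deviation; this sketch assumes the sample standard deviation (`statistics.stdev`, divisor n - 1), and the function name `standardize` is hypothetical. Replacing `stdev` with `statistics.pstdev` gives the population version.

```python
from statistics import fmean, stdev
from typing import List

def standardize(values: List[float]) -> List[float]:
    """z-transformation: subtract the mean, then divide by the standard deviation."""
    mu = fmean(values)
    sigma = stdev(values)  # assumption: sample standard deviation (n - 1)
    if sigma == 0:
        raise ValueError("standardization is undefined for constant data")
    # Subtract first, then divide; the reverse order would not, in general, center the data.
    return [(v - mu) / sigma for v in values]

z = standardize([2.0, 4.0, 6.0])
print(z)                # [-1.0, 0.0, 1.0]
print(fmean(z))         # 0.0
print(stdev(z))         # 1.0
```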