Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics, data analysis, and chemometrics.


Histogram

Histograms are an efficient and common method to describe distributions of continuous variables. In general, a histogram plots how frequently observations fall within given fixed-width intervals. A histogram can be regarded as a type of classification of the data: each sample is sorted into one of several "bins" according to its value.
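The sorting of samples into bins can be sketched in a few lines of Python. This is only an illustration: the function name, the sample data, and the convention that a value on a bin edge belongs to the bin on its right (except for the very last edge) are assumptions, not taken from the text.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count how many values fall into each of n_bins fixed-width intervals.

    Bin k covers [lower + k*width, lower + (k+1)*width); the last bin also
    includes its upper edge. Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    upper = lower + n_bins * width
    for v in values:
        if v < lower or v > upper:
            continue  # outside the range covered by the bins
        k = int((v - lower) // width)
        k = min(k, n_bins - 1)  # put v == upper into the last bin
        counts[k] += 1
    return counts


# Hypothetical sample of ten measurements.
data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.0, 4.6, 5.0]
counts = histogram_counts(data, lower=1.0, width=1.0, n_bins=4)
print(counts)  # [2, 2, 3, 3]
```

Each entry of counts is the height of one histogram bar when all bins have the same width.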

An important question is the number of intervals (classes) used for the histogram. If there are too few classes, the shape of the distribution is blurred; if there are too many, most classes contain only a handful of observations and random fluctuations dominate. In both cases the histogram may hide the information in the data. There are several rules of thumb on how many classes to use for n observations:

nclass
nclass ~ 2
nclass ~ 10 · log10(n)

The last rule is unsuitable for a small number of observations (fewer than 50). The question of the bin width does not arise with data measured on a nominal or ordinal level, because the number of classes follows naturally from the class assignments (the only exception would be an ordinal variable with many categories).
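A short Python calculation suggests why the rule nclass ~ 10 · log10(n) breaks down for small samples. The function name and the rounding to the nearest integer are assumptions made for this sketch; the rule itself only gives an approximate value.

```python
import math


def n_classes_log_rule(n):
    """Number of classes suggested by nclass ~ 10 * log10(n), rounded."""
    return round(10 * math.log10(n))


for n in (30, 100, 1000):
    k = n_classes_log_rule(n)
    # Print sample size, suggested classes, and average observations per class.
    print(n, k, round(n / k, 1))
```

For n = 30 the rule suggests about 15 classes, which leaves only about two observations per class on average; for n = 1000 it suggests 30 classes with roughly 33 observations each.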

When constructing a histogram, one should take care to keep the areas of the histogram bars strictly proportional to the underlying frequencies. Readers tend to misinterpret diagrams that lack this proportionality. In addition, one should avoid unequal bar widths: with equal widths, the frequencies can be read directly from the heights of the bars.
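If unequal bin widths cannot be avoided, area proportionality can be kept by dividing each count by its bin width. The following Python sketch scales the heights so that each bar's area equals the relative frequency of its bin; the function name and the example numbers are hypothetical.

```python
def bar_heights(counts, edges):
    """Return bar heights such that each bar's area equals the
    relative frequency of its bin (so all areas sum to 1)."""
    n = sum(counts)
    heights = []
    for k, c in enumerate(counts):
        width = edges[k + 1] - edges[k]
        heights.append(c / (n * width))
    return heights


# Two bins of width 1 and one bin of width 2 (made-up data).
edges = [0.0, 1.0, 2.0, 4.0]
counts = [10, 10, 20]
heights = bar_heights(counts, edges)
areas = [h * (edges[k + 1] - edges[k]) for k, h in enumerate(heights)]
print(heights)  # [0.25, 0.25, 0.25]
print(areas)    # [0.25, 0.25, 0.5]
```

The wide bin holds twice as many observations as each narrow bin, yet all three bars have the same height. Plotting the raw count of 20 as the height of the wide bar would give it four times the area of a narrow bar and overstate its share of the data.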

Histograms, by definition, are staircase functions. Frequency polygons offer a smoother alternative.
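A frequency polygon is commonly drawn by connecting the midpoints of the tops of the histogram bars with straight lines. The Python sketch below computes these vertices for equal-width bins; the function name is hypothetical, and closing the polygon with a zero-frequency point half a bin width beyond each end is a common convention assumed here, not something the text prescribes.

```python
def frequency_polygon(counts, lower, width):
    """Return the (x, y) vertices of a frequency polygon.

    The vertices are the bin midpoints paired with the bin counts, plus one
    zero-count point at the midpoint of an empty bin on either side.
    """
    xs = [lower + (k + 0.5) * width for k in range(-1, len(counts) + 1)]
    ys = [0] + list(counts) + [0]
    return list(zip(xs, ys))


# Counts of four bins of width 1 starting at 1.0 (made-up data).
points = frequency_polygon([2, 2, 3, 3], lower=1.0, width=1.0)
print(points)
# [(0.5, 0), (1.5, 2), (2.5, 2), (3.5, 3), (4.5, 3), (5.5, 0)]
```

Connecting these points in order with straight line segments gives the polygon.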