Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics and data analysis.

Missing Values

Missing values are a major problem in any analysis of data. The resulting, partially empty data matrices are hard to interpret, so missing values should be avoided whenever possible. However, several methods exist to deal with them.

    Voice of an expert:
    "Proper (i.e. versatile) missing value handling is essential to any data analysis package worthy of the name"
    Mark Myatt, Brixton Health, UK, newsgroup sci.stat.consult, Dec 1996

Possibilities to deal with missing values:

  • use only rows (or columns) of data that have no missing values
  • fill in missing values with row (or column) averages or with values estimated by regression
  • for each analysis option, use only the data that is available for that particular case
  • use your knowledge of the data source to impute missing values
  • some packages do not offer any methods of imputation, but extend all interactive graphic tools to include missing values
  • sometimes missing data may have a meaning of its own (e.g. in sociological studies, where no answer to a question may also be some kind of an answer)
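
The first two options in the list can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the data matrix, the use of None as the marker for a missing value, and the function names complete_rows, impute_column_means and impute_by_regression are all hypothetical, and the regression is a simple least-squares line of one column on another.

```python
from statistics import mean

# Hypothetical data matrix: rows are cases, columns are variables.
# None marks a missing value (an assumption of this sketch).
data = [
    [1.0, 2.0],
    [2.0, None],
    [3.0, 6.0],
    [4.0, 8.0],
    [None, 10.0],
]


def complete_rows(rows):
    """Keep only the rows that have no missing values."""
    return [row for row in rows if None not in row]


def impute_column_means(rows):
    """Replace each missing value by the mean of the observed values in its column."""
    n_cols = len(rows[0])
    col_means = [
        mean(row[j] for row in rows if row[j] is not None)
        for j in range(n_cols)
    ]
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


def impute_by_regression(rows, x_col=0, y_col=1):
    """Fill missing y values with predictions of a least-squares line
    fitted to the complete rows. Rows with a missing x are left unchanged."""
    pairs = [
        (r[x_col], r[y_col])
        for r in rows
        if r[x_col] is not None and r[y_col] is not None
    ]
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx

    filled = []
    for r in rows:
        r = list(r)  # copy, so the original data stays untouched
        if r[y_col] is None and r[x_col] is not None:
            r[y_col] = intercept + slope * r[x_col]
        filled.append(r)
    return filled


print(complete_rows(data))
print(impute_column_means(data))
print(impute_by_regression(data))
```

Note that deleting rows shrinks the data set, while mean imputation keeps all rows but reduces the variance of the imputed column; which trade-off is acceptable depends on the data at hand.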

The results of a model or analysis should always be checked with and without the missing data. If they are markedly different, you should try to find an explanation for this. More information on this topic is available in the book on missing data by Rubin.
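
Such a check could look like the following sketch. It assumes that "with and without" means comparing an analysis of the complete cases with an analysis of the mean-imputed data; the data, the helper slope and the 10 % threshold are hypothetical choices made only for illustration.

```python
from statistics import mean


def slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    mx = mean(x for x, _ in pairs)
    my = mean(y for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


# Hypothetical data; None marks a missing y value.
raw = [(1.0, 2.0), (2.0, 4.0), (3.0, None), (4.0, 8.0), (5.0, None)]

# Variant 1: drop the incomplete cases.
complete = [(x, y) for x, y in raw if y is not None]

# Variant 2: impute the mean of the observed y values.
y_mean = mean(y for _, y in complete)
imputed = [(x, y_mean if y is None else y) for x, y in raw]

slope_complete = slope(complete)
slope_imputed = slope(imputed)
relative_change = abs(slope_imputed - slope_complete) / abs(slope_complete)

# The 10 % threshold is an arbitrary illustration, not a recommended cut-off.
results_differ = relative_change > 0.10
print(slope_complete, slope_imputed, results_differ)
```

In this made-up example the slope changes considerably after mean imputation, which is the kind of discrepancy that calls for an explanation.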

Be sure to always mark imputed data as such. Otherwise you may confuse it with real data later on.
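
One simple way to do this is to store a flag next to every value. The sketch below is only one possible layout; the field names "value" and "imputed" and the sample measurements are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical measurements; None marks a missing value.
values = [4.0, None, 6.0, None, 8.0]

observed_mean = mean(v for v in values if v is not None)

# Store each value together with a flag recording whether it was imputed.
marked = [
    {"value": observed_mean if v is None else v, "imputed": v is None}
    for v in values
]

# The flag makes it possible to recover the real measurements later on.
real_only = [m["value"] for m in marked if not m["imputed"]]
print(marked)
print(real_only)
```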

Last Update: 2012-10-08