Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Exercise - Create a Data Set with Outliers

Detecting outliers can be both important and cumbersome. To gain experience in detecting outliers, design two data sets with the following features:

 

Data set 1: 700 to 1500 data points, normally distributed, with no special measures taken against outliers (use the function "gauss" of the DataLab command Math/Transformation/Single Formula to create the data set).
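
Outside DataLab, a comparable data set can be sketched in Python with NumPy. This is only an illustration and not a replacement for the DataLab "gauss" function; the seed, the sample size and the standard normal parameters (mean 0, standard deviation 1) are assumptions chosen for the example.

```python
import numpy as np

# Assumption: a fixed seed, used only to make the sketch reproducible.
rng = np.random.default_rng(seed=1)

# Assumption: 1000 points; any size from 700 to 1500 fits the exercise.
n_points = 1000

# Draw normally distributed values without any special treatment of extremes.
data_set_1 = rng.normal(loc=0.0, scale=1.0, size=n_points)

# Summary statistics to check that the sample looks as intended.
print(len(data_set_1), data_set_1.mean(), data_set_1.std(ddof=1))
```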

 

Data set 2: approximately 1000 data points, skewed to the right (hint: square the values of a normal distribution with zero mean to create the skewed data). Change two values of the data set such that one of them falls outside the +/-2.5 sigma range but within the +/-4 sigma range, and the other falls outside the +/-4 sigma range.
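
The construction of the skewed data set can be sketched in Python with NumPy as follows. The seed, the sample size, the positions of the two changed values and the multipliers 3 and 6 are assumptions made for this illustration. Because changing two values also shifts the mean and the standard deviation slightly, the sketch recomputes the standardized distances afterwards.

```python
import numpy as np

# Assumption: a fixed seed, used only to make the sketch reproducible.
rng = np.random.default_rng(seed=2)
n_points = 1000

# Squaring zero-mean normal values yields a right-skewed distribution.
data_set_2 = rng.normal(loc=0.0, scale=1.0, size=n_points) ** 2

# Mean and standard deviation before the two values are changed.
mean = data_set_2.mean()
sigma = data_set_2.std(ddof=1)

# Assumption: positions 0 and 1 are overwritten; any two positions would do.
data_set_2[0] = mean + 3.0 * sigma  # intended: outside 2.5 sigma, inside 4 sigma
data_set_2[1] = mean + 6.0 * sigma  # intended: outside 4 sigma

# Recompute the standardized distances with the modified data.
z_scores = (data_set_2 - data_set_2.mean()) / data_set_2.std(ddof=1)
print(z_scores[0], z_scores[1])
```

Note that a right-skewed data set of this kind typically contains further values beyond +2.5 sigma even before any value is changed.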

 

Apply the variance/IQR outlier test of DataLab and report the list of outliers.
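
To make the two criteria concrete, here is a minimal Python sketch of a sigma-based test and an interquartile range test. It is not the DataLab implementation: the function names sigma_outliers and iqr_outliers are hypothetical, and the thresholds (2.5 standard deviations, and fences at 1.5 times the interquartile range beyond the quartiles) are assumptions that may differ from the settings DataLab uses.

```python
import numpy as np

def sigma_outliers(values, k=2.5):
    """Return the values lying more than k standard deviations from the mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    sigma = values.std(ddof=1)  # sample standard deviation
    return values[np.abs(values - mean) > k * sigma]

def iqr_outliers(values, factor=1.5):
    """Return the values lying beyond factor * IQR outside the quartiles."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - factor * iqr
    upper = q3 + factor * iqr
    return values[(values < lower) | (values > upper)]

# Small hand-made sample with one obviously extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(sigma_outliers(sample))
print(iqr_outliers(sample))
```

The sigma-based test relies on the mean and the standard deviation, which are themselves affected by extreme values, whereas the quartiles used by the second test are far less affected.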

Please answer the following questions:

  • How many values of data set 1 did you expect to fall outside the +/-2.5 sigma range, and how many values actually fall outside these limits?
  • Remove the "outliers" of data set 1 that fall outside the +/-2.5 sigma range, and repeat the test. What is the result? Is it acceptable to eliminate outliers by such a stepwise approach?
  • Compare the results of the 2.5 sigma test and the interquartile range test for both data sets. Explain the difference in sensitivity.
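
For the first question, the expected number of values outside the +/-2.5 sigma range can be estimated from the normal distribution. The following Python sketch assumes an ideal normal distribution with known mean and standard deviation; the function name expected_outside is hypothetical, and a real sample will deviate from this expectation by chance.

```python
import math

def expected_outside(n_points, k=2.5):
    """Expected count of normally distributed values beyond +/- k sigma."""
    # Two-sided tail probability of the standard normal distribution.
    tail_probability = math.erfc(k / math.sqrt(2.0))
    return n_points * tail_probability

# Sample sizes spanning the range allowed for data set 1.
for n in (700, 1000, 1500):
    print(n, round(expected_outside(n), 1))
```

For 1000 points this gives an expectation of about 12 values, so a handful of "outliers" in data set 1 is what the normal distribution itself predicts.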


You may now go directly to DataLab to experiment with the data.