Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics and data analysis.

Structure of Measured Data

In order to apply any method of data analysis, you have to be knowledgeable about the structure of your data. Depending on the kind of analysis (classification or calibration), different aspects of the data set should be examined.

In the case of classification problems, there are basically three cases to be distinguished:

  • data sets with linearly separable classes
  • data sets with non-linearly separable classes
  • classes which cannot be separated at all
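One practical way to tell the first case from the other two is to train a perceptron. By the perceptron convergence theorem, it reaches zero training errors after finitely many updates whenever the classes are linearly separable, and it can never do so when they are not. The following sketch, assuming NumPy, applies this check to small hypothetical toy data sets for each of the three cases. The function name `perceptron_separates` and the epoch limit are illustrative assumptions. On real data, failing within the limit only suggests non-separability, because the limit may simply be too small.

```python
import numpy as np

def perceptron_separates(X, y, max_epochs=1000):
    """Return True if the perceptron reaches zero training errors.

    X: (n, p) predictor matrix; y: class labels coded as -1/+1.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:   # misclassified or on the decision boundary
                w += yi * xi         # perceptron update rule
                errors += 1
        if errors == 0:              # full pass without errors: separating hyperplane found
            return True
    return False

# Case 1: two well-separated groups (linearly separable)
X_lin = np.array([[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]], dtype=float)
y_lin = np.array([-1, -1, -1, 1, 1, 1])

# Case 2: XOR pattern (not linearly separable in the original predictors) ...
X_xor = np.array([[0, 0], [1, 1], [0, 1], [1, 0]], dtype=float)
y_xor = np.array([-1, -1, 1, 1])
# ... but separable after adding the product x1*x2 as an extra predictor
X_xor_prod = np.column_stack([X_xor, X_xor[:, 0] * X_xor[:, 1]])

# Case 3: identical predictor values with different labels (not separable at all)
X_dup = np.array([[1, 1], [1, 1], [2, 2]], dtype=float)
y_dup = np.array([1, -1, 1])

for name, X, y in [("linear", X_lin, y_lin), ("XOR", X_xor, y_xor),
                   ("XOR + product", X_xor_prod, y_xor), ("duplicates", X_dup, y_dup)]:
    print(f"{name:14s} linearly separable: {perceptron_separates(X, y)}")
```

Appending the product of the two predictors turns the XOR data into a linearly separable problem, which is typical of the second case. For the duplicated point carrying two different labels, no transformation of the predictors can help.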

For all three cases, the user should be aware that in the multidimensional case it is not trivial to decide which type of problem is being addressed. Furthermore, the selection of the most suitable predictors strongly depends on these issues. In general, therefore, one has to experiment with the data before setting up a classifier.
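How strongly predictor selection depends on the data structure can be seen in an XOR-type pattern. Judged one at a time, neither predictor carries any class information, yet both together separate the classes perfectly. The sketch below, assuming NumPy, uses hypothetical points jittered symmetrically around the four corners of the unit square. It computes a simple univariate Fisher score per predictor and uses a leave-one-out 1-nearest-neighbour classifier as the multivariate check. All names and data are illustrative assumptions.

```python
import numpy as np

corners = {(0, 0): 0, (1, 1): 0, (0, 1): 1, (1, 0): 1}         # XOR-type class pattern
offsets = [(-0.1, -0.1), (0.1, 0.1), (-0.1, 0.1), (0.1, -0.1)]  # symmetric jitter per corner

X, y = [], []
for (cx, cy), label in corners.items():
    for dx, dy in offsets:
        X.append((cx + dx, cy + dy))
        y.append(label)
X = np.array(X)
y = np.array(y)

def fisher_score(x, y):
    """Univariate separability: squared difference of class means over summed variances."""
    a, b = x[y == 0], x[y == 1]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using all predictors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    np.fill_diagonal(d, np.inf)   # a point must not be its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] == y))

scores = [fisher_score(X[:, j], y) for j in range(X.shape[1])]
acc = loo_1nn_accuracy(X, y)
print("univariate scores:", scores)
print("LOO 1-NN accuracy with both predictors:", acc)
```

A univariate screening step would discard both predictors (scores of zero), although the leave-one-out accuracy obtained with both of them is 100 %.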

In the case of calibration problems, there are two aspects which should be considered prior to building a model:

  • Is a linear model sufficient for describing the relationship between predictors and target variables?
  • Is a non-linear model necessary to describe this relationship adequately?
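Whether a linear model is sufficient can be checked empirically by comparing the prediction errors of a linear and a non-linear model on data not used for fitting. The sketch below assumes NumPy and uses leave-one-out cross-validation, with a quadratic polynomial as the simplest non-linear candidate. The curved calibration data are simulated and purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 30)
# hypothetical curved relationship plus measurement noise (sd = 1)
y = 1.0 + 2.0 * x + 0.3 * x**2 + rng.normal(0.0, 1.0, x.size)

def loo_rmse(x, y, degree):
    """Leave-one-out root-mean-square prediction error of a polynomial model."""
    errors = []
    for i in range(x.size):
        mask = np.arange(x.size) != i            # leave sample i out
        coeffs = np.polyfit(x[mask], y[mask], degree)
        errors.append(y[i] - np.polyval(coeffs, x[i]))
    return float(np.sqrt(np.mean(np.square(errors))))

rmse_linear = loo_rmse(x, y, 1)
rmse_quadratic = loo_rmse(x, y, 2)
print(f"linear: {rmse_linear:.2f}  quadratic: {rmse_quadratic:.2f}")
```

If the non-linear model does not reduce the cross-validated error noticeably, the linear model is usually sufficient. Here the curvature built into the simulated data makes `rmse_quadratic` clearly smaller than `rmse_linear`.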

Again, this decision is not easy to make, and it is further complicated by noise in the data. If the noise is large, a possible non-linear relationship is often masked by it, making it impossible to establish a non-linear model.
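The masking effect of noise can be demonstrated by simulation. The same curved relationship is sampled many times, once with low and once with high noise. Each time, a partial F-test checks whether adding a quadratic term improves a straight-line fit significantly. This sketch assumes NumPy. The signal, the noise levels, and the hard-coded critical value F(1, 27) ≈ 4.21 at the 5 % level are illustrative choices. That critical value holds only for 30 points and a one-term extension of the model.

```python
import numpy as np

F_CRIT = 4.21  # tabulated 5 % critical value of F(1, 27); valid only for n = 30

def quadratic_term_detected(x, y):
    """Partial F-test: does adding x**2 reduce the residual sum of squares significantly?"""
    rss = []
    for degree in (1, 2):
        coeffs = np.polyfit(x, y, degree)
        rss.append(np.sum((y - np.polyval(coeffs, x)) ** 2))
    n = x.size
    f_stat = (rss[0] - rss[1]) / (rss[1] / (n - 3))  # 1 extra parameter, n - 3 residual df
    return bool(f_stat > F_CRIT)

def detection_rate(noise_sd, n_runs=200, seed=1):
    """Fraction of simulated data sets in which the curvature is detected."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 10.0, 30)
    y_true = 1.0 + 2.0 * x + 0.3 * x**2          # hypothetical non-linear signal
    hits = sum(quadratic_term_detected(x, y_true + rng.normal(0.0, noise_sd, x.size))
               for _ in range(n_runs))
    return hits / n_runs

rate_low_noise = detection_rate(1.0)
rate_high_noise = detection_rate(30.0)
print(f"detected with low noise: {rate_low_noise:.2f}, with high noise: {rate_high_noise:.2f}")
```

With low noise the quadratic term is detected in essentially every run. With high noise it is detected only rarely, close to the 5 % false-alarm rate of the test, although the underlying relationship is equally non-linear in both cases.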