Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Linear Discriminant Analysis - Introduction

Linear Discriminant Analysis (LDA) is a method to discriminate between two or more groups of samples. In order to develop a classifier based on LDA, you have to perform the following steps:
 

  • definition of groups
  • definition of discriminating function
  • estimation of discriminating function
  • test of discriminating function
  • application

Definition of groups:

The groups to be discriminated can be defined either naturally by the problem under investigation, or by some preceding analysis, such as a cluster analysis. The number of groups is not restricted to two, although discrimination between two groups is the most common case. Note that the number of groups must not exceed the number of variables describing the data set. Another prerequisite is that the groups have the same covariance structure (i.e. their covariance matrices must be comparable).
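The covariance prerequisite can be inspected by computing the covariance matrix of each group separately and comparing the results. The following is a minimal sketch in plain Python for two variables. The data values and the helper name cov_matrix are made up for illustration, and no formal test of equality (such as Box's M test) is carried out.

```python
def cov_matrix(rows):
    """Sample covariance matrix (2x2) of a list of (x1, x2) observations."""
    n = len(rows)
    means = [sum(r[j] for r in rows) / n for j in range(2)]
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        for i in range(2):
            for j in range(2):
                cov[i][j] += (r[i] - means[i]) * (r[j] - means[j])
    # Divide by n - 1 to obtain the unbiased sample estimate
    return [[c / (n - 1) for c in row] for row in cov]


# Hypothetical example data: two groups described by two variables
group1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group2 = [(5.0, 6.0), (6.0, 5.0), (7.0, 7.0), (6.0, 8.0)]

cov1 = cov_matrix(group1)
cov2 = cov_matrix(group2)
print("covariance matrix of group 1:", cov1)
print("covariance matrix of group 2:", cov2)
```

With only a few samples per group, such a comparison is no more than a rough plausibility check.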
 

Definition of discriminating function:

In principle, any mathematical function may be used as a discriminating function. In the case of LDA, a linear function of the form

y = a_0 + a_1 x_1 + a_2 x_2 + ... + a_n x_n

is used, with x_i being the variables describing the data set. The parameters a_i have to be determined in such a way that the discrimination between the groups is best. Note that this linear discriminating function is formally equivalent to multiple linear regression (MLR). In fact, for two groups one can directly use MLR if the response variable y is replaced by the weighted class numbers c_1 and c_2:

c_1 = n_2/(n_1 + n_2)    and    c_2 = -n_1/(n_1 + n_2)

where n_1 and n_2 are the numbers of samples in group 1 and group 2, respectively.
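The regression route can be tried out with a small sketch. It assumes two groups and two variables, codes the response with the weighted class numbers, and solves the least-squares normal equations by hand. The data values and all variable names are made up for illustration, and using the sign of the fitted y as the cutoff is an assumption of this sketch.

```python
# Hypothetical example data: two groups described by two variables
group1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group2 = [(5.0, 6.0), (6.0, 5.0), (7.0, 7.0), (6.0, 8.0)]

n1, n2 = len(group1), len(group2)
n = n1 + n2

# Weighted class numbers used as the response variable
c1 = n2 / (n1 + n2)
c2 = -n1 / (n1 + n2)

X = group1 + group2
y = [c1] * n1 + [c2] * n2

# Means of the predictors and of the response (the latter is 0 by construction)
xbar = [sum(r[j] for r in X) / n for j in range(2)]
ybar = sum(y) / n

# Sums of squares and cross products of the centered data
t11 = sum((r[0] - xbar[0]) ** 2 for r in X)
t12 = sum((r[0] - xbar[0]) * (r[1] - xbar[1]) for r in X)
t22 = sum((r[1] - xbar[1]) ** 2 for r in X)
q1 = sum((r[0] - xbar[0]) * (yi - ybar) for r, yi in zip(X, y))
q2 = sum((r[1] - xbar[1]) * (yi - ybar) for r, yi in zip(X, y))

# Solve the 2x2 normal equations for the slopes, then get the intercept
det = t11 * t22 - t12 * t12
a1 = (t22 * q1 - t12 * q2) / det
a2 = (t11 * q2 - t12 * q1) / det
a0 = ybar - a1 * xbar[0] - a2 * xbar[1]


def predict(x):
    """Value of the fitted function y = a0 + a1*x1 + a2*x2."""
    return a0 + a1 * x[0] + a2 * x[1]


for x in X:
    print(x, "group 1" if predict(x) > 0 else "group 2")
```

Because the weighted class numbers sum to zero over all samples, the mean of y is zero and the intercept depends only on the predictor means.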

In order to get a better understanding of the working of the discriminating function, start the following interactive example.
 

Estimation of the parameters of the discriminating function:

As you have seen in the interactive example above, there is only one direction of the discriminating line which yields the best separation. The coefficients of the discriminating function are straightforward to determine: in principle, they are chosen in such a way that the separation (= distance) between the groups is maximized while the scatter within the groups is minimized.
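One common way to put this principle into practice for two groups is Fisher's criterion, where the coefficient vector is obtained by multiplying the inverse of the pooled within-group scatter matrix with the difference of the group means. The sketch below assumes this criterion, two variables, and a cutoff halfway between the group means. The data and the function names (mean_vec, scatter, fisher_function, score) are made up for illustration.

```python
def mean_vec(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, m):
    """2x2 scatter matrix: sum of outer products of deviations from m."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_function(g1, g2):
    """Return (a0, a1, a2) of y = a0 + a1*x1 + a2*x2 for two groups."""
    m1, m2 = mean_vec(g1), mean_vec(g2)
    s1, s2 = scatter(g1, m1), scatter(g2, m2)
    # Pooled within-group scatter
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # (a1, a2) = inverse(sw) * (m1 - m2), written out for the 2x2 case
    a1 = (sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det
    a2 = (sw[0][0] * diff[1] - sw[1][0] * diff[0]) / det
    # Intercept chosen so that the midpoint between the group means gives y = 0
    a0 = -(a1 * (m1[0] + m2[0]) + a2 * (m1[1] + m2[1])) / 2.0
    return a0, a1, a2


# Hypothetical example data: two groups described by two variables
group1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group2 = [(5.0, 6.0), (6.0, 5.0), (7.0, 7.0), (6.0, 8.0)]

a0, a1, a2 = fisher_function(group1, group2)


def score(x):
    """Value of the discriminating function for one object."""
    return a0 + a1 * x[0] + a2 * x[1]


print("coefficients:", a0, a1, a2)
print("scores of group 1:", [score(x) for x in group1])
print("scores of group 2:", [score(x) for x in group2])
```

In this sketch, objects of group 1 receive positive scores and objects of group 2 negative scores. The inverse is written out explicitly only because there are two variables; for more variables a linear algebra library would be the natural choice.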
 

Test of the discriminating function:

Once the parameters of the discriminating function have been estimated, the function has to be tested, either by using an independent set of test data or by performing cross-validation. In both cases, the classification results for the test data should be comparable to those for the training data.
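Cross-validation can be sketched as a leave-one-out loop: each sample is left out once, the function is estimated from the remaining samples, and the left-out sample is then classified. The sketch below assumes a two-group, two-variable function estimated with Fisher's criterion. The data and the names fit, classify, train_acc and loo_acc are made up for illustration.

```python
def fit(g1, g2):
    """Estimate (a0, a1, a2) for two groups and two variables (Fisher's criterion)."""
    def mean(g):
        return [sum(r[j] for r in g) / len(g) for j in (0, 1)]

    m1, m2 = mean(g1), mean(g2)
    s11 = s12 = s22 = 0.0
    # Accumulate the pooled within-group scatter
    for g, m in ((g1, m1), (g2, m2)):
        for r in g:
            d0, d1 = r[0] - m[0], r[1] - m[1]
            s11 += d0 * d0
            s12 += d0 * d1
            s22 += d1 * d1
    det = s11 * s22 - s12 * s12
    d0, d1 = m1[0] - m2[0], m1[1] - m2[1]
    a1 = (s22 * d0 - s12 * d1) / det
    a2 = (s11 * d1 - s12 * d0) / det
    # Cutoff halfway between the two group means
    a0 = -(a1 * (m1[0] + m2[0]) + a2 * (m1[1] + m2[1])) / 2.0
    return a0, a1, a2


def classify(model, x):
    """Assign group 1 for positive scores, group 2 otherwise."""
    a0, a1, a2 = model
    return 1 if a0 + a1 * x[0] + a2 * x[1] > 0 else 2


# Hypothetical example data: (object, group label)
group1 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
group2 = [(5.0, 6.0), (6.0, 5.0), (7.0, 7.0), (6.0, 8.0)]
samples = [(x, 1) for x in group1] + [(x, 2) for x in group2]

# Fraction of correctly classified training samples
full_model = fit(group1, group2)
train_acc = sum(classify(full_model, x) == g for x, g in samples) / len(samples)

# Leave-one-out cross-validation
hits = 0
for i, (x, g) in enumerate(samples):
    rest = samples[:i] + samples[i + 1:]
    t1 = [p for p, label in rest if label == 1]
    t2 = [p for p, label in rest if label == 2]
    hits += classify(fit(t1, t2), x) == g
loo_acc = hits / len(samples)

print("training accuracy:", train_acc)
print("leave-one-out accuracy:", loo_acc)
```

A leave-one-out accuracy that is clearly lower than the training accuracy indicates that the function is fitted too closely to the training data.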
 

Application:

Discriminant analysis can be used to perform either analysis or classification:
 

  • Analysis: How can the material be interpreted? Which variables contribute most to the difference?
  • Classification: Given that a discriminating function can be found which provides satisfactory separation, this function can be used to classify unknown objects.
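Both uses can be sketched in a few lines. The coefficients and the pooled within-group standard deviations below are hypothetical values, not results from this text. Scaling each coefficient by the standard deviation of its variable is one common way to compare the contributions of variables measured on different scales, and is used here as an assumption.

```python
# Hypothetical coefficients of an already estimated discriminating function
a0, a1, a2 = 4.79, -0.833, -0.333

# Hypothetical pooled within-group standard deviations of x1 and x2
pooled_sd = (0.816, 1.137)


def classify(x):
    """Assign group 1 for positive values of the function, group 2 otherwise."""
    y = a0 + a1 * x[0] + a2 * x[1]
    return 1 if y > 0 else 2


# Analysis: standardized coefficients show which variable contributes more
standardized = [a1 * pooled_sd[0], a2 * pooled_sd[1]]
print("standardized coefficients:", standardized)

# Classification: assign unknown objects to one of the two groups
unknowns = [(2.0, 2.0), (6.5, 6.0)]
labels = [classify(x) for x in unknowns]
print("assigned groups:", labels)
```

With these illustrative numbers, the first variable has the larger standardized coefficient in absolute value and therefore contributes more to the separation.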