Fundamentals of Statistics contains material from various lectures and courses by H. Lohninger on statistics and data analysis.

Distance Matrix

Given a data matrix A with n objects and p variables, we can define a distance matrix D by calculating the distance between each pair of objects and entering it into the matrix. The distance matrix is a symmetric square matrix of size n × n with zeros along the main diagonal (the distance between an object and a replica of itself is zero).
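The construction described above can be sketched in a few lines of Python. This is only a minimal illustration, assuming Euclidean distance and a data matrix stored as a list of rows (one row per object); the function name `distance_matrix` and the sample data are hypothetical.

```python
import math

def distance_matrix(data):
    """Return the n x n matrix of Euclidean distances between the rows of data."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]      # main diagonal stays zero
    for i in range(n):
        for j in range(i + 1, n):          # compute each pair only once
            dist = math.dist(data[i], data[j])
            d[i][j] = dist                 # fill both halves: D is symmetric
            d[j][i] = dist
    return d

# Hypothetical data matrix A: n = 3 objects, p = 2 variables
A = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = distance_matrix(A)
```

Because the matrix is symmetric, only the n(n-1)/2 pairs above the diagonal need to be computed; the loop mirrors each value into the lower half.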

The distances can be calculated using various distance measures, so the distance matrix may contain not only the well-known Euclidean distances (e.g. in meters) but also, for example, topological distances or decorrelated distances.
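To illustrate how the choice of measure changes the matrix, the following sketch takes the distance function as a parameter. The Euclidean and city-block measures are used here purely as simple examples (the city-block measure is an assumption of this sketch and is not mentioned in the text above); all function names are hypothetical.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two objects."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Sum of absolute coordinate differences (illustrative alternative measure)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pairwise_distances(data, metric):
    """Build the n x n distance matrix for any symmetric distance function."""
    n = len(data)
    return [[metric(data[i], data[j]) for j in range(n)] for i in range(n)]

# Hypothetical data: two objects described by two variables
A = [[0.0, 0.0], [3.0, 4.0]]
D_euclid = pairwise_distances(A, euclidean)
D_city = pairwise_distances(A, city_block)
```

The same pair of objects is 5 units apart in the Euclidean matrix and 7 units apart in the city-block matrix, so any result derived from a distance matrix depends on the measure behind it.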

Distance matrices form a convenient basis for many calculations. However, they consume a lot of memory, since the number of entries grows with n² (a problem especially if the data matrix contains many objects, i.e. if n is large), so in some applications only sub-matrices are calculated in order to save memory.
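A rough sketch of the memory argument follows, assuming each distance is stored as an 8-byte floating-point number and Euclidean distance is used. Both helper functions are hypothetical and only show one way to compute a sub-matrix on demand instead of the full matrix.

```python
import math

def full_matrix_bytes(n, bytes_per_value=8):
    """Memory needed to hold all n * n distances (assumes 8 bytes per value)."""
    return n * n * bytes_per_value

def sub_distance_matrix(data, rows, cols):
    """Distances between the objects listed in rows and those listed in cols."""
    return [[math.dist(data[i], data[j]) for j in cols] for i in rows]

# For 100,000 objects the full matrix would need 80,000,000,000 bytes (80 GB)
needed = full_matrix_bytes(100_000)

# Hypothetical data: compute only the 2 x 2 block that is actually required
A = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]
block = sub_distance_matrix(A, rows=[0, 1], cols=[2, 3])
```

Only `len(rows) * len(cols)` values are held in memory at a time, at the cost of recomputing distances whenever another block is needed.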

Distance matrices are used in many fields, including the following:

  • the search for the shortest connection between two points in a network
  • cluster analysis
  • the calculation of quantitative structure-property and structure-activity relationships (QSPR, QSAR)
  • topological indices
  • the search for chemical sub-structures
  • bioinformatics
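
As an illustration of the first application, the sketch below starts from a matrix of direct link lengths in a small network and derives the matrix of shortest connections between all pairs of points. It uses the Floyd-Warshall algorithm, which is chosen here only as one well-known option and is not prescribed by the text above; the network and the function name are hypothetical.

```python
import math

INF = math.inf  # marks pairs of points without a direct link

def shortest_connections(links):
    """Floyd-Warshall: turn direct link lengths into shortest path lengths."""
    n = len(links)
    d = [row[:] for row in links]          # work on a copy of the input
    for k in range(n):                     # allow point k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Hypothetical network of 4 points; entry [i][j] is the length of the direct link
links = [
    [0,   1,   5,   INF],
    [1,   0,   2,   INF],
    [5,   2,   0,   1],
    [INF, INF, 1,   0],
]
shortest = shortest_connections(links)
```

In this example the direct link between points 0 and 2 has length 5, but the route via point 1 has length 3, so the resulting matrix records 3 for that pair.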