Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.

## Agglomerative Clustering

Agglomerative clustering is based on the following principle: find the two objects (or clusters) that are closest to each other, merge them into a single new cluster, and repeat this process until everything has been merged into a single cluster. During the merging process the distance at which each merge takes place must be recorded in order to construct a dendrogram. The type of clustering is determined by the parameters of the Lance-Williams equation:

$$d'_{qi} = s\,d_{pi} + t\,d_{qi} + u\,d_{pq} + v\,\left|d_{pi} - d_{qi}\right|$$

with

$s$, $t$, $u$, and $v$ being the system parameters,
$d_{pi}$, $d_{qi}$, $d_{pq}$ the distances between the clusters (or objects), and
$d'_{qi}$ being the new distance between the merged cluster, which keeps the label $q$, and every other object or cluster $i$. $d'_{qi}$ replaces $d_{qi}$ during the merging process.
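
As a minimal sketch of this update rule, the following Python function evaluates the equation for one merged pair. The function name `lance_williams_update` and the example distances are illustrative assumptions, not part of the original material. Because min(a, b) = (a + b)/2 - |a - b|/2 and max(a, b) = (a + b)/2 + |a - b|/2, setting s = t = 0.5, u = 0 with v = -0.5 returns the smaller of the two old distances, and with v = +0.5 the larger one.

```python
def lance_williams_update(d_pi, d_qi, d_pq, s, t, u, v):
    """Distance between the cluster merged from p and q and another cluster i."""
    return s * d_pi + t * d_qi + u * d_pq + v * abs(d_pi - d_qi)


# Hypothetical distances, chosen only for illustration.
d_pi, d_qi, d_pq = 2.0, 5.0, 3.0

# v = -0.5: the new distance is the smaller of d_pi and d_qi.
single = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, -0.5)

# v = +0.5: the new distance is the larger of d_pi and d_qi.
complete = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, 0.5)

print(single, complete)  # 2.0 5.0
```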

Listed below are the parameters of the most commonly used clustering techniques.

| Type of clustering | s | t | u | v | Comment |
|---|---|---|---|---|---|
| single linkage | 0.5 | 0.5 | 0 | -0.5 | contracting |
| complete linkage | 0.5 | 0.5 | 0 | 0.5 | dilating |
| average linkage | 0.5 | 0.5 | 0 | 0 | compromise |
| median | 0.5 | 0.5 | -0.25 | 0 | not monotonic |
| centroid (1) | $n_p/n$ | $n_q/n$ | $-n_p n_q/n^2$ | 0 | not monotonic |
| Ward (1) | $(n_p+n_i)/(n+n_i)$ | $(n_q+n_i)/(n+n_i)$ | $-n_i/(n+n_i)$ | 0 | "best" approach |
| flexible strategy | $a$ | $a$ | $1-2a$ | 0 | parameter $a$ determines behavior |

$n$ ... number of objects in the merged cluster ($n = n_p + n_q$)
$n_p$ ... number of objects in cluster $p$
$n_q$ ... number of objects in cluster $q$
$n_i$ ... number of objects in cluster $i$

 (1) Both Ward's procedure and the centroid procedure require the distance matrix to contain squared Euclidean distances.
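
The merging procedure described at the top of this section can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the names `PARAMS` and `agglomerate` and the example distance matrix are hypothetical, and only the table rows with constant parameters are covered. The centroid and Ward rows would additionally require tracking cluster sizes.

```python
import itertools

# (s, t, u, v) for the table rows whose parameters are constants.
PARAMS = {
    "single": (0.5, 0.5, 0.0, -0.5),
    "complete": (0.5, 0.5, 0.0, 0.5),
    "average": (0.5, 0.5, 0.0, 0.0),
}


def agglomerate(dist, method="single"):
    """Merge clusters until one remains; return a list of (p, q, distance).

    dist is a symmetric matrix (list of lists) of pairwise distances.
    After each merge the combined cluster keeps the label q.
    """
    s, t, u, v = PARAMS[method]
    n = len(dist)
    # Copy the upper triangle into a dict so the input is left untouched.
    d = {(a, b): dist[a][b] for a, b in itertools.combinations(range(n), 2)}
    active = list(range(n))
    merges = []

    def key(a, b):
        return (a, b) if a < b else (b, a)

    while len(active) > 1:
        # Find the closest pair among the clusters that are still active.
        p, q = min(itertools.combinations(active, 2), key=lambda pr: d[key(*pr)])
        d_pq = d[key(p, q)]
        merges.append((p, q, d_pq))  # recorded for the dendrogram
        # Lance-Williams update: the new distance replaces the old d_qi.
        for i in active:
            if i not in (p, q):
                d_pi, d_qi = d[key(p, i)], d[key(q, i)]
                d[key(q, i)] = s * d_pi + t * d_qi + u * d_pq + v * abs(d_pi - d_qi)
        active.remove(p)
    return merges


# Hypothetical example: four points on a line at positions 0, 1, 3 and 7.
D = [
    [0.0, 1.0, 3.0, 7.0],
    [1.0, 0.0, 2.0, 6.0],
    [3.0, 2.0, 0.0, 4.0],
    [7.0, 6.0, 4.0, 0.0],
]
print(agglomerate(D, "single"))    # [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 4.0)]
print(agglomerate(D, "complete"))  # [(0, 1, 1.0), (1, 2, 3.0), (2, 3, 7.0)]
```

The recorded merge distances are the heights at which the branches of the dendrogram join. In this example the order of the merges is the same for both methods, but complete linkage reports larger merge distances than single linkage, in line with the "dilating" and "contracting" comments in the table.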