Fundamentals of Statistics contains material of various lectures and courses of H. Lohninger on statistics, data analysis and chemometrics.


Agglomerative Clustering

Agglomerative clustering is based on the following principle: find the two objects which are closest to each other, merge them into a single new cluster, and repeat this process until all objects and clusters are merged into a single one. During the merging process it is necessary to record the distances of the merged objects in order to construct a dendrogram. The type of clustering can be influenced by the parameters of the Lance-Williams equation:

dqi' = s dpi + t dqi + u dpq + v |dpi - dqi|

with

s, t, u, and v being the system parameters,
dpi, dqi, dpq the distances between the clusters (or objects) p, q, and i before the merge, and
dqi' being the distance between the new cluster (the merger of p and q, which keeps the label q) and any other object or cluster i. dqi' replaces dqi during the merging process.
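
The update rule can be sketched in a few lines of Python. This is only an illustration: the function name and the example distances are made up, and the absolute-difference term is taken as |dpi - dqi|, the usual form of the Lance-Williams equation.

```python
def lance_williams_update(d_pi, d_qi, d_pq, s, t, u, v):
    """Distance from the merged cluster (p and q) to another cluster i."""
    return s * d_pi + t * d_qi + u * d_pq + v * abs(d_pi - d_qi)


# Made-up example distances: p to i, q to i, and p to q.
d_pi, d_qi, d_pq = 2.0, 5.0, 1.0

single = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, -0.5)
complete = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, 0.5)
average = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, 0.0)

print(single, complete, average)  # 2.0 5.0 3.5
```

With v = -0.5 the result is the smaller of dpi and dqi, with v = 0.5 it is the larger, and with v = 0 it is their mean, which is why the first of these choices contracts the space and the second dilates it.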


Listed below are the parameters of the most commonly used clustering techniques.
 
type of clustering           s                 t                 u              v      comment
single linkage               0.5               0.5               0              -0.5   contracting
complete linkage             0.5               0.5               0              0.5    dilating
average linkage (weighted)   0.5               0.5               0              0      compromise
median                       0.5               0.5               -0.25          0      not monotonic
centroid (1)                 np/n              nq/n              -np*nq/n^2     0      not monotonic
Ward (1)                     (np+ni)/(n+ni)    (nq+ni)/(n+ni)    -ni/(n+ni)     0      "best" approach
flexible strategy            a                 a                 1-2a           0      parameter a determines behavior

n ... number of objects in the merged cluster (n = np + nq)
np ... number of objects in cluster p
nq ... number of objects in cluster q
ni ... number of objects in cluster i
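
The merging procedure and the parameter table can be combined into a small, naive implementation. The sketch below is not an optimized or authoritative version. The names `coefficients` and `agglomerate`, the default a = 0.25 and the four example points are hypothetical. It assumes the textbook form of the size-dependent parameters, with n = np + nq and the denominator n + ni for Ward's method.

```python
def coefficients(method, n_p, n_q, n_i, a=0.25):
    """Return (s, t, u, v) for one update step; assumes n = n_p + n_q."""
    n = n_p + n_q
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":
        return 0.5, 0.5, 0.0, 0.0
    if method == "median":
        return 0.5, 0.5, -0.25, 0.0
    if method == "centroid":
        return n_p / n, n_q / n, -n_p * n_q / n**2, 0.0
    if method == "ward":
        return ((n_p + n_i) / (n + n_i), (n_q + n_i) / (n + n_i),
                -n_i / (n + n_i), 0.0)
    if method == "flexible":
        return a, a, 1 - 2 * a, 0.0
    raise ValueError(f"unknown method: {method}")


def agglomerate(dist, method="single"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    Returns the merge history as (label_p, label_q, distance) tuples,
    which is the information needed to draw a dendrogram.
    """
    size = {i: 1 for i in range(len(dist))}          # cluster label -> size
    d = {frozenset((i, j)): dist[i][j]
         for i in size for j in size if i < j}       # pairwise distances
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)                     # closest pair
        p, q = sorted(pair)
        d_pq = d.pop(pair)
        merges.append((p, q, d_pq))
        for i in size:
            if i in (p, q):
                continue
            s, t, u, v = coefficients(method, size[p], size[q], size[i])
            d_pi = d.pop(frozenset((p, i)))
            d_qi = d[frozenset((q, i))]
            # The merged cluster keeps the label q; d_qi is overwritten.
            d[frozenset((q, i))] = (s * d_pi + t * d_qi + u * d_pq
                                    + v * abs(d_pi - d_qi))
        size[q] += size.pop(p)
    return merges


# Four points on a line at 0, 1, 3 and 7 (plain distances).
dist = [[0.0, 1.0, 3.0, 7.0],
        [1.0, 0.0, 2.0, 6.0],
        [3.0, 2.0, 0.0, 4.0],
        [7.0, 6.0, 4.0, 0.0]]

print(agglomerate(dist, "single"))
# [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 4.0)]
```

Running the same matrix with "complete" yields merge distances 1.0, 3.0 and 7.0, which shows the dilating effect compared with single linkage. For "centroid" and "ward" the input matrix would have to hold squared Euclidean distances.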



(1) Both Ward's procedure and the centroid procedure require the distance matrix to contain squared Euclidean distances.
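
The reason for this requirement can be checked numerically for the centroid procedure. The following sketch uses made-up points and hypothetical helper names, and assumes n = np + nq. It compares the Lance-Williams update with the directly computed squared distance between the centroid of the merged cluster and the centroid of cluster i.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


# Hypothetical clusters p, q and i in the plane.
p = [(0.0, 0.0), (2.0, 0.0)]
q = [(4.0, 3.0)]
i = [(1.0, 5.0), (3.0, 5.0)]

cp, cq, ci = centroid(p), centroid(q), centroid(i)
n_p, n_q = len(p), len(q)
n = n_p + n_q  # assumption: n is the size of the merged cluster


def centroid_update(d_pi, d_qi, d_pq):
    """Lance-Williams update with the centroid parameters (v = 0)."""
    return n_p / n * d_pi + n_q / n * d_qi - n_p * n_q / n**2 * d_pq


# Squared distance from the centroid of the merged cluster to cluster i.
direct = sq_dist(centroid(p + q), ci)

# Update applied to squared distances, and to plain distances.
updated_sq = centroid_update(sq_dist(cp, ci), sq_dist(cq, ci), sq_dist(cp, cq))
updated_plain = centroid_update(math.sqrt(sq_dist(cp, ci)),
                                math.sqrt(sq_dist(cq, ci)),
                                math.sqrt(sq_dist(cp, cq)))

print(math.isclose(direct, updated_sq))                # True
print(math.isclose(math.sqrt(direct), updated_plain))  # False
```

Only the update on squared distances reproduces the geometric centroid distance; with plain distances the recursion no longer corresponds to distances between centroids.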