Agglomerative Clustering
Agglomerative clustering is based on the following principle: find the
two objects (or clusters) that are closest to each other, merge them into
a single new cluster, and repeat this process until all objects have been
merged into one cluster. During merging, the distance at which each pair
is merged must be recorded so that a dendrogram can be constructed.
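The merge loop described above can be sketched in Python as follows. This is a minimal, brute-force sketch intended only for small inputs. It assumes a symmetric distance matrix given as nested lists. The names `agglomerate` and `single_link` are hypothetical, and the rule used for the distance between two clusters (the smallest member-to-member distance, i.e. single linkage) is an assumed choice for illustration. The recorded (cluster, cluster, distance) triples are the merge heights from which a dendrogram is drawn.

```python
from itertools import combinations


def single_link(dist, a, b):
    # Cluster-to-cluster distance: the closest pair of members (assumed rule).
    return min(dist[i][j] for i in a for j in b)


def agglomerate(dist):
    """Merge the two closest clusters until only one remains.

    dist: symmetric matrix (nested lists) of object-to-object distances.
    Returns the merge history as (members_a, members_b, distance) tuples.
    """
    clusters = [frozenset([i]) for i in range(len(dist))]
    history = []
    while len(clusters) > 1:
        # Brute-force search for the closest pair of current clusters.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_link(dist, *pair))
        height = single_link(dist, a, b)
        # Replace the two clusters by their union.
        clusters = [c for c in clusters if c != a and c != b]
        clusters.append(a | b)
        # Record the merge distance: this is the dendrogram height.
        history.append((sorted(a), sorted(b), height))
    return history


points = [0.0, 1.0, 5.0, 6.5, 20.0]
dist = [[abs(x - y) for y in points] for x in points]
for a, b, height in agglomerate(dist):
    print(a, b, height)
```

For these five points the sketch reports four merges at heights 1.0, 1.5, 4.0 and 13.5, the last one joining object 4 to the cluster [0, 1, 2, 3].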
The type of clustering is determined by the parameters of the Lance-Williams equation, which gives the distance from a newly merged cluster to every other cluster or object:
$$d_{qi}' = s\,d_{pi} + t\,d_{qi} + u\,d_{pq} + v\,\lvert d_{pi} - d_{qi} \rvert$$
where
s, t, u and v are the parameters that select the clustering method,
dpi, dqi and dpq are the distances between the clusters (or objects) p, q and i
before the merge, and
dqi' is the new distance between the merged cluster q (formed from p and q) and
every other cluster or object i. During merging, dqi' replaces dqi.
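The update formula can be transcribed directly, as a small sketch; `lance_williams` is a hypothetical helper name and the distances in the example are made-up numbers. With s = t = 0.5, u = 0 and v = -0.5 (or v = +0.5) the update reduces to min(dpi, dqi) (or max(dpi, dqi)), because 0.5(x + y) - 0.5|x - y| is the smaller of x and y and 0.5(x + y) + 0.5|x - y| is the larger.

```python
def lance_williams(d_pi, d_qi, d_pq, s, t, u, v):
    """Distance d'_qi from the merged cluster q (p joined to q) to cluster i."""
    return s * d_pi + t * d_qi + u * d_pq + v * abs(d_pi - d_qi)


# Illustrative distances before the merge (made-up numbers).
d_pi, d_qi, d_pq = 3.0, 5.0, 2.0

single = lance_williams(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, -0.5)
complete = lance_williams(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, 0.5)
print(single, complete)  # 3.0 5.0
```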
Listed below are the parameters of the most commonly used clustering
techniques.
| type of clustering | s | t | u | v | comment |
|---|---|---|---|---|---|
| single linkage | 0.5 | 0.5 | 0 | -0.5 | space-contracting |
| complete linkage | 0.5 | 0.5 | 0 | 0.5 | space-dilating |
| average linkage (weighted) | 0.5 | 0.5 | 0 | 0 | compromise |
| median | 0.5 | 0.5 | -0.25 | 0 | not monotonic (dendrogram inversions possible) |
| centroid | np/n | nq/n | -np·nq/n² | 0 | not monotonic (dendrogram inversions possible) |
| Ward | (np+ni)/(n+ni) | (nq+ni)/(n+ni) | -ni/(n+ni) | 0 | "best" approach |
| flexible strategy | a | a | 1-2a | 0 | parameter a determines behavior |

n ... number of objects in the merged cluster (n = np + nq)
np ... number of objects in cluster p
nq ... number of objects in cluster q
ni ... number of objects in cluster i
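The table can be turned into a small Python sketch. `lw_coefficients` and `cluster` are hypothetical names. `lw_coefficients` returns (s, t, u, v) for a merge of p and q as seen from cluster i, with n taken as np + nq, the size of the merged cluster. The Ward row is coded in its standard form, with denominator np + nq + ni, and "average" is the weighted form with s = t = 0.5. `cluster` runs the whole agglomeration using nothing but the Lance-Williams update and keeps the merged cluster under the label q, as in the text. It assumes a symmetric distance matrix as nested lists and breaks ties by taking the first closest pair found. The median, centroid and Ward updates are conventionally applied to squared Euclidean distances; the example below uses plain absolute differences and therefore sticks to single and complete linkage.

```python
from itertools import combinations


def lw_coefficients(method, n_p, n_q, n_i, a=None):
    """(s, t, u, v) for merging clusters p and q, seen from cluster i."""
    n = n_p + n_q  # number of objects in the merged cluster
    if method == "single":
        return 0.5, 0.5, 0.0, -0.5
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "average":  # weighted average linkage (s = t = 0.5)
        return 0.5, 0.5, 0.0, 0.0
    if method == "median":
        return 0.5, 0.5, -0.25, 0.0
    if method == "centroid":
        return n_p / n, n_q / n, -n_p * n_q / n**2, 0.0
    if method == "ward":
        m = n + n_i  # n_p + n_q + n_i
        return (n_p + n_i) / m, (n_q + n_i) / m, -n_i / m, 0.0
    if method == "flexible":
        if a is None:
            raise ValueError("the flexible strategy needs a value for a")
        return a, a, 1 - 2 * a, 0.0
    raise ValueError(f"unknown method: {method}")


def cluster(dist, method, a=None):
    """Agglomerate using only the Lance-Williams update.

    dist: symmetric matrix (nested lists) of object-to-object distances.
    Returns (p, q, distance) per merge; the merged cluster keeps label q.
    """
    # Distances between current clusters, keyed by the unordered label pair.
    d = {frozenset((i, j)): dist[i][j]
         for i, j in combinations(range(len(dist)), 2)}
    size = {i: 1 for i in range(len(dist))}  # objects per cluster label
    history = []
    while len(size) > 1:
        pair = min(d, key=d.get)  # closest pair (first one found on ties)
        p, q = sorted(pair)
        d_pq = d.pop(pair)
        for i in size:
            if i == p or i == q:
                continue
            s, t, u, v = lw_coefficients(method, size[p], size[q], size[i], a)
            d_pi = d.pop(frozenset((p, i)))  # cluster p disappears
            d_qi = d[frozenset((q, i))]
            # d'_qi replaces d_qi
            d[frozenset((q, i))] = (s * d_pi + t * d_qi + u * d_pq
                                    + v * abs(d_pi - d_qi))
        size[q] += size.pop(p)
        history.append((p, q, d_pq))
    return history


points = [0.0, 1.0, 5.0, 6.5, 20.0]
dist = [[abs(x - y) for y in points] for x in points]
print([h for _, _, h in cluster(dist, "single")])    # [1.0, 1.5, 4.0, 13.5]
print([h for _, _, h in cluster(dist, "complete")])  # [1.0, 1.5, 6.5, 20.0]
```

With single and complete linkage the merge heights equal the smallest and largest member-to-member distances between the merged groups. Trying another row of the table only requires changing the method string (and supplying `a` for the flexible strategy).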