K-means clustering is an algorithm that is used in two different BoundarySeer techniques, spatially-constrained clustering and fuzzy classification.
Both techniques require grouping the data into classes, or clusters. In fuzzy classification, the classes are based on variable values, irrespective of spatial location. In spatially-constrained clustering, as the name suggests, group membership is constrained by the spatial location, i.e. distant locations with similar values will not be grouped together.
For both methods, k-means clustering begins and ends with a fixed number of classes (or clusters). Class memberships are rearranged through an iterative process in order to optimize the classification, using the following criteria:
|
Where, M = (mic) is a matrix of class memberships,
R = (rcv) is a matrix of class means, rcv, denoting the mean of class c for variable v,
xi = (xi1,...,xip)T is the vector representing values of the p variables at location i,
rc = (rc1,...,rcp)T is the vector representing the center of class c in terms of means of the p variables,
d2(xi,rc) is the square distance between xi and rc, also expressed as dic2.
f is the fuzziness criterion ( f= 1 gives hard clusters and is required for spatially constrained clustering; f= 2 is a good minimum value for fuzzy clustering McBratney and de Gruijter 1993).
e is the stopping criterion, which determines the level of convergence necessary before quitting (McBratney and de Gruijter 1993 recommend e = 0.001).
See also: