Choosing fuzzy classification parameters

To perform a fuzzy classification, you must choose values for the number of classes (k), the fuzziness of the classification (phi), and the stopping criterion (epsilon). BoundarySeer provides preset defaults for these settings, so you may classify your data without entering any values. You may wish to test the influence of these parameters on the classification by repeating the analysis with different parameter values.

How many classes? Choosing a value for k

Choosing an appropriate number of classes is the eternal classification problem. Classification techniques will produce the number of clusters specified, whether or not those clusters reflect meaningful distinctions in the data. The k-means technique for fuzzy classification maximizes between-cluster variation for a set number of clusters (k). You may wish to check how the chosen value of k influences the clustering by comparing the outcomes for a range of k values.

If you have a sense of the number of clusters that is appropriate for your data, use that value. For a first pass, you might try a "rule-of-thumb" from hard clustering: k = √n (McBratney and Moore 1985), where n is the number of objects in the data set.
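This rule of thumb is easy to compute directly. A minimal sketch (the function name and the clamp to a minimum of two classes are illustrative assumptions, not part of BoundarySeer):

```python
import math

def default_class_count(n):
    """Rule-of-thumb starting value for k: the square root of the number
    of objects in the data set (McBratney and Moore 1985), rounded to an
    integer. Clamped to at least 2, since one class is not a classification."""
    return max(2, round(math.sqrt(n)))

# A data set of 100 objects suggests trying k = 10 on a first pass.
print(default_class_count(100))  # → 10
```

You would then compare classifications for a range of k around this starting value.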

How fuzzy? Choosing a value for phi

Phi determines the fuzziness of the classification. When phi is set to one (not possible in BoundarySeer), the clustering is hard clustering, with binary class membership (yes/no). Phi values for fuzzy clustering can range from just above 1 to infinity, but at very high phi values the classification may be so fuzzy that it distinguishes no classes at all. The choice of phi balances the need for structure (distinguishable classes) against continuity (fuzziness). A common starting place is phi = 2 (McBratney and de Gruijter 1992). Because clustering becomes more difficult as phi approaches one (McBratney and Moore 1985), values below 1.1 may not produce good results.

How optimal? Choosing a value for epsilon

BoundarySeer repeatedly reallocates class membership values among the classes, minimizing the within-class least-squared error term, until it arrives at an optimal arrangement. The cutoff for this optimization is epsilon. Once an iteration changes the matrix of membership values only by very small amounts, it is time to stop. BoundarySeer compares successive matrices of membership values by the largest proportional difference between membership values (e.g. if a membership value of 0.75 changes by 0.03, the proportional difference is 0.03/0.75 = 0.04). The proportional difference is calculated for every class membership value at every location, and optimization stops when the largest is less than epsilon. McBratney and de Gruijter (1992) recommend epsilon = 0.001, which for a membership value of 0.75 corresponds to a change of 0.00075.
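The stopping test described above can be sketched as follows (a hedged illustration of the criterion, assuming membership matrices stored as nested lists with rows for locations and columns for classes; not BoundarySeer's actual code):

```python
def has_converged(prev, curr, epsilon=0.001):
    """Stop optimizing when the largest proportional change in any class
    membership value, at any location, is below epsilon (McBratney and
    de Gruijter 1992 recommend epsilon = 0.001)."""
    largest = max(
        abs(c - p) / p
        for prev_row, curr_row in zip(prev, curr)
        for p, c in zip(prev_row, curr_row)
        if p > 0  # skip zero memberships to avoid division by zero
    )
    return largest < epsilon

# A membership of 0.75 changing by 0.03 is a proportional change of 0.04,
# far above epsilon = 0.001, so iteration continues:
print(has_converged([[0.75, 0.25]], [[0.72, 0.28]]))  # → False
```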


See also: