Assessing goodness of fit

The goodness of fit index is the ratio of the between-cluster sum of squared errors (SSE) to the within-cluster SSE. You want the differences between clusters to be larger than the differences within them, so higher values of goodness of fit are better. However, the index generally keeps rising as more clusters are added, so rather than looking for peaks, look for jumps: a number of clusters at which the goodness of fit improves sharply is a good candidate.
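The index described above can be sketched in a few lines. This is a minimal illustration, assuming the plain ratio of between-cluster SSE to within-cluster SSE with Euclidean distances; the tool used in this tutorial may normalize the index differently. The function name `goodness_of_fit` and the input layout (a list of points plus one cluster label per point) are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence

Point = Sequence[float]


def _mean(points: List[Point]) -> List[float]:
    """Component-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def goodness_of_fit(points: List[Point], labels: List[Hashable]) -> float:
    """Hypothetical helper: between-cluster SSE divided by within-cluster SSE."""
    if len(points) != len(labels) or not points:
        raise ValueError("need one label per point and at least one point")
    grand_mean = _mean(points)

    # Group the points by their cluster label.
    clusters: Dict[Hashable, List[Point]] = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    within = 0.0   # spread of points around their own cluster centroid
    between = 0.0  # spread of cluster centroids around the grand mean, weighted by size
    for members in clusters.values():
        centroid = _mean(members)
        within += sum(_sq_dist(p, centroid) for p in members)
        between += len(members) * _sq_dist(centroid, grand_mean)

    # Every point sitting exactly on its centroid means a perfect fit.
    return float("inf") if within == 0 else between / within


# Two well-separated pairs of points: between SSE = 100, within SSE = 4.
print(goodness_of_fit([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1]))
```

Putting every point in a single cluster gives a between-cluster SSE of zero, so the index is 0; splitting the data into more clusters moves SSE from the within term to the between term, which is why the index tends to grow with the number of clusters.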

When you view the goodness of fit graph, you will see a couple of possible jumps: one at 4 clusters and one between 199 and 224 clusters. Often a very small number of clusters groups most of the data together and leaves only a few outliers in clusters of their own, so we will focus on the higher number of clusters rather than on 4 clusters.

  1. Zoom the graph in on the later jump.

    1. Right-click on the plot and choose Change Formatting.

    2. Set the X axis minimum to 199 and the maximum to 224.

    3. If you continue to zoom in, you will find that the jump occurs at 205 clusters.

  2. Next: Find 205 clusters in the dataset.
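Locating the jump by zooming the X axis can also be expressed programmatically. The sketch below assumes you already have the goodness of fit index for each number of clusters in a dictionary (the name `largest_jump`, its parameters, and the toy values in the usage line are hypothetical, not output from this dataset). It treats a "jump" as the largest absolute increase between consecutive numbers of clusters, optionally restricted to a window such as 199 to 224, which mirrors setting the X axis minimum and maximum.

```python
from typing import Dict, Optional, Tuple


def largest_jump(
    index_by_k: Dict[int, float],
    k_min: Optional[int] = None,
    k_max: Optional[int] = None,
) -> Tuple[Optional[int], float]:
    """Hypothetical helper: return (k, gain) where the index rises most from the
    previous k to this k, considering only steps that lie inside [k_min, k_max]."""
    ks = sorted(index_by_k)
    best_k: Optional[int] = None
    best_gain = float("-inf")
    for prev, cur in zip(ks, ks[1:]):
        # Skip steps that fall outside the zoom window on the X axis.
        if (k_min is not None and prev < k_min) or (k_max is not None and cur > k_max):
            continue
        gain = index_by_k[cur] - index_by_k[prev]
        if gain > best_gain:
            best_k, best_gain = cur, gain
    return best_k, best_gain


# Toy index values (made up for illustration): the sharpest rise is from 3 to 4 clusters.
toy = {1: 0.0, 2: 1.0, 3: 1.2, 4: 3.0, 5: 3.1}
print(largest_jump(toy))                   # whole range
print(largest_jump(toy, k_min=1, k_max=3))  # zoomed-in window
```

Using an absolute difference is a design choice; if the index spans several orders of magnitude across the range of cluster counts, a relative increase (gain divided by the previous value) may highlight jumps more reliably.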
