How does MPCluster find clusters?
Mathematically complete clustering (i.e. one that checks all possibilities and will always return the optimum result) is an NP-hard problem. No efficient exact algorithm is known, and the running time of an exhaustive search grows exponentially with the size of the input, so even modest datasets could take many years to compute. Practical clustering algorithms therefore use a range of techniques to produce a 'good' result in a reasonable amount of time. MPCluster gives you the option of using two of these algorithms.
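To give a feel for why checking every possibility is impractical, the short sketch below counts the ways to split n labelled points into k non-empty clusters (the Stirling numbers of the second kind). It is only an illustration of the size of the search space; the function name is made up for this example and is not part of MPCluster.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to split n labelled points into k non-empty clusters."""
    if n == k:
        return 1  # every point in its own cluster (also covers n == k == 0)
    if k == 0 or k > n:
        return 0  # impossible: no clusters for some points, or too many clusters
    # The n-th point either joins one of the k existing clusters,
    # or forms a new cluster on its own.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)
```

Ten points can already be split into three clusters in stirling2(10, 3) = 9330 ways. For 100 points and 5 clusters the count is a number with more than 60 digits, far beyond what any exhaustive search could visit.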
The K-Means option actually implements the k-means++ algorithm. This chooses an initial set of cluster positions and then iterates through the data, improving the cluster positions incrementally. Ideally the iteration stops when the clusters stop moving. MPCluster repeats the whole process a number of times and chooses the best result. The user can also stop the process after a few minutes of processing.
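The general shape of this approach can be sketched in a few lines of Python. This is a minimal illustration of k-means++ seeding followed by the standard refinement loop and a best-of-several-runs wrapper, not MPCluster's own code. The function names, the flat (x, y) distance, the restart count and the use of summed squared distance as the measure of 'best' are all assumptions made for the example.

```python
import math
import random


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centres favour points far from existing centres."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
        if sum(d2) == 0:  # every point already coincides with a centre
            centres.append(rng.choice(points))
        else:
            centres.append(rng.choices(points, weights=d2, k=1)[0])
    return centres


def assign(points, centres):
    """Label each point with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: math.dist(p, centres[j]))
            for p in points]


def refine(points, centres, max_iter=100):
    """Move centres to the mean of their members until they stop moving."""
    centres = list(centres)
    for _ in range(max_iter):
        labels = assign(points, centres)
        new_centres = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centres.append(old)  # empty cluster: leave the centre in place
        if new_centres == centres:
            break
        centres = new_centres
    labels = assign(points, centres)
    cost = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
    return centres, labels, cost


def kmeans(points, k, restarts=10, seed=0):
    """Repeat seeding and refinement, keeping the run with the lowest cost."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        result = refine(points, kmeans_pp_init(points, k, rng))
        if best is None or result[2] < best[2]:
            best = result
    return best
```

Calling kmeans(points, 2) on a list of (x, y) tuples returns the centres, a cluster label for each point, and the total squared distance of the best run. Real geographic data would need a great-circle distance rather than the flat distance used here.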
The Hierarchical algorithm works by 'joining' data points that are close together to form new clusters. These then iteratively acquire neighboring points until a cluster that meets the required parameters is found. In theory a hierarchical algorithm can produce one large cluster with an internal representation of the relationships of the component data points and sub-groups. MPCluster could be expanded to perform this if a suitable real-world geographic application can be found.
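The joining process can be illustrated with a small agglomerative sketch. It starts with every point as its own cluster and repeatedly merges the two clusters whose centroids are closest. The stopping rule (a target number of clusters), the centroid-based distance and the function name are assumptions chosen for the example; MPCluster's own joining rule and stopping parameters may differ.

```python
import math


def agglomerate(points, target_clusters):
    """Merge the two closest clusters until only target_clusters remain."""
    target = max(1, target_clusters)
    clusters = [[p] for p in points]  # every point starts as its own cluster

    def centroid(cluster):
        return tuple(sum(x) / len(cluster) for x in zip(*cluster))

    while len(clusters) > target:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # join cluster j into cluster i
        del clusters[j]  # j > i, so index i is unaffected
    return clusters
```

Letting the loop run down to a single cluster, and recording each merge along the way, would give the full hierarchy of points and sub-groups described above.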
Why is MPCluster not listing or plotting all the clusters that I ask for?
MPCluster may not always allocate all clusters. Clusters that end up containing zero data locations are treated as 'null' (i.e. empty) clusters. These are not reported to the Excel workbook or plotted on the map.
Empty clusters can occur if you request more clusters than the data can accept. For the K-Means algorithm, they can also occur if MPCluster has placed clusters poorly, so that they cannot attract sufficient pushpins. Try setting the Re-allocate Empty Clusters option to reduce this effect.
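One common way to re-allocate an empty cluster is to move it onto the data point that lies farthest from its current cluster centre. The sketch below shows that idea only; it is an assumption about how such an option could work, not a description of what Re-allocate Empty Clusters does internally, and the function name is hypothetical.

```python
import math


def reallocate_empty(points, centres, labels):
    """Give each empty cluster the point farthest from its current centre."""
    centres = list(centres)
    labels = list(labels)
    for j in range(len(centres)):
        if j in labels:
            continue  # cluster j already has at least one point
        counts = {lab: labels.count(lab) for lab in set(labels)}
        # Only take points from clusters that would not become empty themselves.
        candidates = [i for i, lab in enumerate(labels) if counts[lab] > 1]
        if not candidates:
            break  # more clusters requested than the data can fill
        far = max(candidates,
                  key=lambda i: math.dist(points[i], centres[labels[i]]))
        centres[j] = points[far]
        labels[far] = j
    return centres, labels
```

The centres of the clusters that gave up a point are not recomputed here; a further k-means iteration would normally follow to update them.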
Why does MPCluster produce clusters that 'tile' all input data points?
This will occur if you do not give MPCluster enough constraints. Without a size limit, the clusters can be of any size, and they grow to incorporate all available data points. The effect can look like a territory tiling. This can be useful for some applications, but it does not represent true clustering.
This is easily solved by setting the maximum size of the clusters. Do this by using the Maximum number of points per cluster and/or Maximum cluster radius options.