Is there any approach for selecting the best clustering?
Hi. The k-means algorithm is non-deterministic and can deliver very different results depending on how the centroids are initialized. So, in real-world scenarios, it is not easy to choose the best k-means output. I would like to know whether there is some approach for selecting the best k-means output automatically. Best regards.
Re: Is there any approach for selecting the best clustering?
For a fixed number of clusters k in SimpleKMeans, you can run it with several different random seeds and simply pick the solution that gives the smallest sum of squared errors on the training data (shown in the output as "Within cluster sum of squared errors").
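To illustrate the restart-and-pick-lowest-SSE idea outside Weka, here is a minimal sketch in plain Python. It is not SimpleKMeans code: it runs a simple Lloyd's k-means once per random seed, initializing with randomly chosen data points (an assumption; SimpleKMeans offers its own initialization options), and keeps the run with the smallest within-cluster SSE. The function names `kmeans` and `best_of_restarts` are hypothetical. The sketch also uses raw attribute values, whereas Weka's reported figure may be computed on normalized attributes, so the numbers need not match Weka's.

```python
import random
from typing import Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, seed: int, max_iter: int = 100):
    """Hypothetical plain Lloyd's k-means (not Weka's implementation).

    Returns (centroids, assignments, sse), where sse is the within-cluster
    sum of squared errors of the final partition.
    """
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new == assignments:
            break  # converged: no point changed cluster
        assignments = new
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignments))
    return centroids, assignments, sse


def best_of_restarts(points: List[Point], k: int, seeds: Iterable[int]):
    """Run k-means once per seed; return (seed, centroids, assignments, sse)
    of the run with the lowest within-cluster SSE."""
    best = None
    for seed in seeds:
        centroids, assignments, sse = kmeans(points, k, seed)
        if best is None or sse < best[3]:
            best = (seed, centroids, assignments, sse)
    return best


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    seed, centroids, labels, sse = best_of_restarts(data, k=2, seeds=range(10))
    print(f"best seed={seed}, SSE={sse:.3f}, labels={labels}")
```

In Weka itself, the analogous loop would vary the random seed of SimpleKMeans (its `-S` option) and compare the reported SSE across runs. As the answer says, the comparison only makes sense for a fixed k: adding clusters generally lowers the SSE, so picking the lowest SSE across different values of k would simply favor more clusters.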
Incidentally, on the iris data, there is a nice correspondence between that measure and the classification error from a classes-to-clusters evaluation: