How to fix the number of clusters in clustering


How to fix the number of clusters in clustering

Jianye GE
Hi all,

I am trying to choose the best number of clusters using the Silhouette coefficient. I set the number of clusters to 2, 3, ..., 10 in turn; for each value, I run the clustering and use ClusterEvaluation to calculate the Silhouette coefficient. The best number of clusters is the one with the highest Silhouette coefficient.

However, I found that when I set the number of clusters in the WEKA clustering algorithms, the number of clusters may be different after clustering. For example, I set the number of clusters to 2, but it may change to 1, 3, or some other number after clustering. My code is below.

    EM model = new EM();
    model.setNumClusters(i); // i is the number of clusters, from 2 to 10
    model.buildClusterer(data);

So my question is: how can I fix the number of clusters so that the clustering algorithm does not try other numbers of clusters?

Thanks!
JG
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: How to fix the number of clusters in clustering

Michael Hall


On Nov 11, 2020, at 7:16 PM, [hidden email] wrote:

For example, I set the number of clusters to 2, but it may change to 1, 3, or some other number after clustering. My code is below.

    EM model = new EM();
    model.setNumClusters(i); // i is the number of clusters, from 2 to 10
    model.buildClusterer(data);

I can’t tell from these three lines of code, but I think EM should honor setNumClusters. The i there looks like a loop variable; how do you know that, for a given iteration, the resulting number of clusters doesn’t match it?
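
One way to check this is to compare the requested number with the number of clusters that actually received instances. The following is only a minimal sketch, not WEKA code: the class ClusterCountCheck, its method, and the sample assignments are hypothetical. It assumes the per-instance cluster indices are available as a double[] (for example from ClusterEvaluation.getClusterAssignments() after evaluateClusterer(data)) and that negative entries mean "not clustered". If the non-empty count is lower than the requested number, the clusterer may still have built the requested number of clusters, with some of them left empty.

```java
import java.util.BitSet;

public class ClusterCountCheck {

    /**
     * Counts the distinct cluster indices that received at least one instance.
     * Negative entries are treated as "unclustered" and skipped (assumption).
     */
    public static int countNonEmptyClusters(double[] assignments) {
        BitSet seen = new BitSet();
        for (double a : assignments) {
            int idx = (int) a;
            if (idx >= 0) {
                seen.set(idx);
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        // Hypothetical value that was passed to setNumClusters.
        int requested = 3;
        // Hypothetical assignments: cluster 1 never receives an instance.
        double[] assignments = {0, 0, 2, 2, 0, 2};

        int nonEmpty = countNonEmptyClusters(assignments);
        System.out.println("requested=" + requested + ", non-empty=" + nonEmpty);
        // prints "requested=3, non-empty=2"
    }
}
```

Printing model.numberOfClusters() next to this count for each i in the loop should show whether the mismatch comes from the clusterer itself or from clusters that end up empty.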
