SimpleKMeans: Lloyd, Hartigan-Wong or something else?

Chivany van der Werff
Dear members of this mailing list,

At the moment I am testing whether the results of the SimpleKMeans clustering algorithm can be reproduced in Python.
In Weka I have used the default parameters for everything except the number of clusters. I am now trying to fill in the parameters for the KMeans algorithm in Python, and one of them is 'algorithm'. So my question is: is SimpleKMeans Lloyd's algorithm, Hartigan-Wong's algorithm, or some other algorithm? I was not able to find a definitive answer online, so I hope someone here knows. I'm doing this little test as part of my master's thesis, so I'd be very grateful if someone could enlighten me.

Kind regards,

Chivany van der Werff
Leiden Institute of Computer Science

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: SimpleKMeans: Lloyd, Hartigan-Wong or something else?

Eibe Frank-2
Administrator

SimpleKMeans implements Lloyd’s algorithm.

 

You will probably need to look at EuclideanDistance as well to make sure that you are configuring the implementations as similarly as possible. For example, EuclideanDistance normalises all numeric attributes to the range [0,1] by default.
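
To illustrate the normalisation point, here is a minimal sketch of min-max scaling, which maps each numeric column to [0,1] using that column's minimum and maximum. The function name `min_max_normalise` and the treatment of constant columns are assumptions made for this example; it is not Weka's code. On the Python side, scikit-learn's `KMeans` does not rescale its input, so a comparable step (for instance `MinMaxScaler` from `sklearn.preprocessing`) would presumably have to be applied by hand before clustering.

```python
def min_max_normalise(rows):
    """Scale every numeric column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # Assumption: a constant column (hi == lo) is mapped to 0.0.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Two attributes on very different scales.
data = [[1.0, 200.0], [2.0, 400.0], [3.0, 1000.0]]
normalised = min_max_normalise(data)
print(normalised)  # [[0.0, 0.0], [0.5, 0.25], [1.0, 1.0]]
```

Without such a step, the second attribute would dominate the Euclidean distances, and the two tools could produce different clusterings even when running the same algorithm.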

 

Anyway, given that k-means converges only to a local optimum, which depends on the initial cluster centres, establishing equivalence of two implementations is tricky.
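
The local-optimum issue can be seen on a toy data set. Below is a minimal sketch of the Lloyd iteration (assign each point to its nearest centroid, recompute each centroid as the mean of its cluster, repeat until the assignments stop changing) in plain Python. The helper names, the toy points and the rule of leaving an empty cluster's centroid unchanged are assumptions for illustration; this is not the SimpleKMeans source.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, initial_centroids, max_iter=100):
    """Lloyd iteration from the given starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Within-cluster sum of squared distances."""
    return sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))


def partition(labels):
    """Clustering as a set of index groups, ignoring cluster numbering."""
    groups = {}
    for index, label in enumerate(labels):
        groups.setdefault(label, set()).add(index)
    return {frozenset(g) for g in groups.values()}


# Corners of a wide rectangle; the natural split is left versus right.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

labels_good, cents_good = lloyd_kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
labels_poor, cents_poor = lloyd_kmeans(points, [(0.0, 0.0), (0.0, 1.0)])

print(labels_good, sse(points, labels_good, cents_good))  # [0, 0, 1, 1] 1.0
print(labels_poor, sse(points, labels_poor, cents_poor))  # [0, 1, 0, 1] 16.0
```

Both runs stop at a fixed point of the same algorithm, yet the within-cluster sums of squares differ (1.0 versus 16.0). When comparing two implementations it probably helps to give both the same starting centroids where the tools allow it, and to compare partitions (as `partition` does) rather than raw label numbers, since cluster numbering is arbitrary.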

 

Cheers,

Eibe

 

In reply to the post by Chivany van der Werff sent Monday, 15 July 2019 1:54 PM

Re: SimpleKMeans: Lloyd, Hartigan-Wong or something else?

Chivany van der Werff
Dear Eibe,

Thank you so much for your quick response!
Your answer and the additional information are saving me a lot of experimenting.
Is there any place online where it is stated that SimpleKMeans implements Lloyd's algorithm, so that I can cite it? Or is there another way to cite the information you have given me?

I'll try my best to find a similar implementation in Python which is the experiment I am about to perform. If I somewhat succeed I'll share the results if anyone's interested. I also planned on trying it out in R but I'm starting off with Python first.

Anyway, thanks again! I immensely appreciate it!

Kind regards,

Chivany van der Werff
Leiden Institute of Computer Science


In reply to the post by Eibe Frank sent Monday, 15 July 2019 11:54

Re: SimpleKMeans: Lloyd, Hartigan-Wong or something else?

Eibe Frank-3
You could refer to the source code, e.g., the main trunk version:


Cheers,
Eibe

In reply to the post by Chivany van der Werff sent Tue, Jul 16, 2019 at 9:15 AM