Hierarchical clustering in WEKA

Ens SADEG Souhila
Hi,

I would like to use Weka to perform hierarchical agglomerative clustering, but I don't know whether the HierarchicalClusterer uses the agglomerative or the divisive approach.

My second question is about nominal attributes. Does the hierarchical clusterer work well with nominal attributes? I have attributes with 2 values (binary) and 3 values (ternary).

Thank you very much.

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Hierarchical clustering in WEKA

Eibe Frank-2
Administrator
HierarchicalClusterer uses agglomerative clustering.
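To illustrate what "agglomerative" means here, below is a minimal sketch in plain Java. It is not WEKA's code: the class name AgglomerativeSketch, the restriction to one-dimensional points, and the choice of single linkage are all assumptions made to keep the example short. It starts with one cluster per point and repeatedly merges the two closest clusters until the requested number remains (it assumes 1 <= k).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative bottom-up (agglomerative) clustering; not WEKA's implementation. */
final class AgglomerativeSketch {

    /** Single-link distance: the smallest gap between any two members. */
    static double singleLink(List<Double> a, List<Double> b) {
        double best = Double.POSITIVE_INFINITY;
        for (double x : a) {
            for (double y : b) {
                best = Math.min(best, Math.abs(x - y));
            }
        }
        return best;
    }

    /** Start with one cluster per point and merge the closest pair until k remain. */
    static List<List<Double>> cluster(double[] points, int k) {
        List<List<Double>> clusters = new ArrayList<>();
        for (double p : points) {
            List<Double> single = new ArrayList<>();
            single.add(p);
            clusters.add(single);
        }
        while (clusters.size() > k) {
            int bestI = 0;
            int bestJ = 1;
            double best = Double.POSITIVE_INFINITY;
            for (int i = 0; i < clusters.size(); i++) {
                for (int j = i + 1; j < clusters.size(); j++) {
                    double d = singleLink(clusters.get(i), clusters.get(j));
                    if (d < best) {
                        best = d;
                        bestI = i;
                        bestJ = j;
                    }
                }
            }
            // Merge cluster j into cluster i; j > i, so removing j keeps index i valid.
            clusters.get(bestI).addAll(clusters.remove(bestJ));
        }
        return clusters;
    }

    public static void main(String[] args) {
        System.out.println(cluster(new double[] {1.0, 1.2, 5.0, 5.1, 9.0}, 3));
    }
}
```

A divisive method would go the other way, starting from a single cluster that contains everything and splitting it.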

Nominal attributes are dealt with by the distance function that is selected as an option. By default, it’s Euclidean distance. This, by default, normalises numeric attributes to the [0,1] range. For nominal attributes, if the two values are the same, the distance is taken to be zero, otherwise it is taken to be one. Manhattan distance in WEKA works the same way.
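As a rough illustration of the rule described above, here is a small self-contained Java sketch. It does not use WEKA's classes. The name MixedDistance, the single numeric attribute, and the explicit min/max arguments are assumptions made for the example. Numeric values are min-max scaled to [0,1], and each nominal attribute contributes 0 when the two values match and 1 when they differ.

```java
/** Illustrative only: mirrors the rule described in the post, not WEKA's code. */
final class MixedDistance {

    /** Min-max scale a numeric value into [0,1]; a constant attribute maps to 0. */
    static double normalise(double value, double min, double max) {
        return max == min ? 0.0 : (value - min) / (max - min);
    }

    /**
     * Euclidean distance over one numeric attribute (given with its observed
     * min and max) and a set of nominal attributes of equal length.
     */
    static double distance(double numA, double numB, double min, double max,
                           String[] nomA, String[] nomB) {
        double diff = normalise(numA, min, max) - normalise(numB, min, max);
        double sum = diff * diff;
        for (int i = 0; i < nomA.length; i++) {
            // Nominal attributes contribute 0 when equal and 1 when different.
            double d = nomA[i].equals(nomB[i]) ? 0.0 : 1.0;
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        String[] a = {"yes", "low"};
        String[] b = {"yes", "high"};
        System.out.println(distance(2.0, 8.0, 0.0, 10.0, a, b));
    }
}
```

Under this rule a binary and a ternary attribute are treated alike: any mismatch adds the same amount as the full range of a normalised numeric attribute.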

Cheers,
Eibe

Re: Hierarchical clustering in WEKA

Ens SADEG Souhila
Thank you Eibe for your answer.

I have another problem with hierarchical clustering: for some big datasets, it causes a stack overflow.

My question is the following: if I increase the number of clusters, could this solve the problem, since the algorithm will stop earlier?

Maybe hierarchical clustering is not a suitable algorithm for big datasets. Could you tell me about an algorithm for which I can specify the number of clusters? SimpleKMeans generally gives one cluster, but I need at least two clusters.

Thank you for your assistance.


Re: Hierarchical clustering in WEKA

Eibe Frank-2
Administrator
Have you tried increasing the stack size for the Java virtual machine? The -Xss option can be used for this.

The number of clusters is a parameter for SimpleKMeans. By default, SimpleKMeans should give you two clusters.

Cheers,
Eibe

Re: Hierarchical clustering in WEKA

Ens SADEG Souhila
Thank you for your answer,

I can't increase the stack size because I run my programs on a remote cluster, and as an ordinary user I don't have the rights to change anything.

I tried to use SimpleKMeans, but whenever I request two clusters it gives me only one cluster (maybe because the second is empty), unlike the hierarchical clusterer, which gives me the exact number I ask for.

cheers,
Souhila

Re: Hierarchical clustering in WEKA

Eibe Frank-2
Administrator

> On 8 Jun 2017, at 15:24, Ens SADEG Souhila <[hidden email]> wrote:
>
> I can't increase the stack size because I run my programs on a remote cluster, and as an ordinary user I don't have the rights to change anything.

You don't need admin rights. You can use the environment variable _JAVA_OPTIONS. On Linux, you'd use

export _JAVA_OPTIONS=-Xss500m

to set the stack size to 500MB.
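If setting JVM options is inconvenient, another possibility (not mentioned elsewhere in this thread) is to run the deep computation on a thread that requests a larger stack through the standard four-argument Thread constructor. The sketch below is only an illustration: BigStackRunner, depth, and runWithStack are hypothetical names, and the recursive method merely stands in for a deeply recursive clustering step. The Java documentation describes the stackSize argument as a hint that some platforms may ignore.

```java
/** Illustrative only: runs a deeply recursive task on a thread with a larger stack. */
final class BigStackRunner {

    /** Deliberately deep recursion, standing in for a recursive clustering step. */
    static int depth(int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    /** Run depth(n) on a worker thread that requests stackBytes of stack. */
    static int runWithStack(int n, long stackBytes) {
        int[] result = new int[1];
        // The last constructor argument is the requested stack size in bytes.
        Thread worker = new Thread(null, () -> result[0] = depth(n), "big-stack", stackBytes);
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        System.out.println(runWithStack(100_000, 256L * 1024 * 1024));
    }
}
```

In a real program the lambda would call the clusterer's build method instead of depth(n).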

Cheers,
Eibe

