Evaluating clusters with unlabeled instances

haytham.salhi
Hello Weka, 

I am wondering whether it is feasible to evaluate clusters using "classes to clusters evaluation" when some instances are unlabeled, so that these unlabeled instances are not taken into account in the evaluation. That is, we want to include the unlabeled instances when generating the clusters, yet exclude them from the clustering evaluation. To be precise, consider the following example:

Assume we have the following instances:
@attribute theClass {A,B}
@attribute a1 numeric
@attribute a2 numeric
@attribute a3 numeric
@attribute a4 numeric
@attribute a5 numeric
@data
{0 B,2 1,3 1,5 1}
{0 ?,2 1,5 1}
{1 1,4 2}
{2 1,4 2}
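As we can see, the first instance has class B, the second instance's class is missing (we don't care about it), and the third and fourth instances have class A (in sparse ARFF format an omitted value is 0, which for a nominal attribute is its first declared value, here A).
Now, this is what k-means (with k = 2) generates: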

#############################
Clustered Instances

0      2 ( 50%)
1      2 ( 50%)

Class attribute: theClass
Classes to Clusters:

 0 1  <-- assigned to cluster
 2 1 | A
 0 1 | B
Cluster 0 <-- A
Cluster 1 <-- B
Incorrectly clustered instances :	1.0	 25      %
##############################
We see that we have two clusters, each with two instances. However, in the evaluation, the unlabeled instance, which was assigned to cluster 1 (prediction), is counted as class A (as its actual class, even though it has no label); thus it is reported as an incorrectly clustered instance. What I want is to exclude (ignore) this instance from the evaluation, giving 0% incorrectly clustered instances.
How can I achieve this? Moreover, how can I calculate other measures (such as precision and recall) using ClassificationViaClustering, which does not work when a class value is missing?

Best,
Haytham

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Evaluating clusters with unlabeled instances

Alexander Osherenko
If you are interested in evaluating clusterings, Section 16.3 (p. 356) of the following book covers it, with measures such as purity, normalized mutual information, the Rand index, and the F measure:

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Best, Alexander

--
Alexander Osherenko, Dr. rer. nat.
Senior HCI architect
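
As a rough illustration of two of the measures named above, here is a minimal Python sketch that computes purity and the Rand index on the labeled instances only. The function names, the use of `None` for a missing class, and the cluster assignments (taken from the example output in the first post) are assumptions for illustration; this is not code from Weka or from the book.

```python
from collections import Counter
from itertools import combinations


def labeled_pairs(labels, clusters):
    """Keep (class, cluster) pairs only for instances that have a class label."""
    return [(label, k) for label, k in zip(labels, clusters) if label is not None]


def purity(labels, clusters):
    """Fraction of labeled instances that belong to the majority class of their cluster."""
    pairs = labeled_pairs(labels, clusters)
    per_cluster = {}
    for label, k in pairs:
        per_cluster.setdefault(k, Counter())[label] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(pairs)


def rand_index(labels, clusters):
    """Fraction of labeled instance pairs on which classes and clusters agree.

    A pair agrees when both instances share a class and a cluster, or
    differ in both. Assumes at least two labeled instances.
    """
    pairs = labeled_pairs(labels, clusters)
    agree = total = 0
    for (l1, k1), (l2, k2) in combinations(pairs, 2):
        total += 1
        if (l1 == l2) == (k1 == k2):
            agree += 1
    return agree / total


# The four instances from the first post; None marks the missing class.
labels = ["B", None, "A", "A"]
clusters = [1, 1, 0, 0]

print(purity(labels, clusters))      # 1.0
print(rand_index(labels, clusters))  # 1.0
```

If the unlabeled instance is instead counted as class A (labels `["B", "A", "A", "A"]`), the same functions give a purity of 0.75 and a Rand index of 0.5, which shows how much a single mislabeled instance can distort these measures on a small data set.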


2017-04-26 0:55 GMT+01:00 Haytham Salhi <[hidden email]>:

Re: Evaluating clusters with unlabeled instances

Eibe Frank-2
Administrator
In reply to this post by haytham.salhi
As you say, the unlabeled instance should be excluded from the evaluation. I've just committed a bug fix for this into the main trunk (3.9.2-SNAPSHOT). The next nightly snapshot should have this fix. Thanks for the bug report.

It sounds like ClassificationViaClustering should be fixed as well.

Cheers,
Eibe
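
To make concrete what excluding the unlabeled instance means for the example in this thread, here is a minimal Python sketch. It is not Weka's code: the function name `classes_to_clusters_error`, the use of `None` for a missing class, and the brute-force one-to-one assignment of classes to clusters are all assumptions made for illustration.

```python
from itertools import permutations


def classes_to_clusters_error(labels, clusters, class_names, num_clusters):
    """Return (errors, n_labeled, mapping), ignoring instances whose label is None."""
    # counts[c][k] = number of labeled instances of class c assigned to cluster k
    counts = {c: [0] * num_clusters for c in class_names}
    n_labeled = 0
    for label, k in zip(labels, clusters):
        if label is None:  # unlabeled: took part in clustering, skipped in evaluation
            continue
        counts[label][k] += 1
        n_labeled += 1

    # Try every assignment in which each class is used by at most one cluster;
    # None means "this cluster gets no class". Only practical for small problems.
    slots = list(class_names) + [None] * num_clusters
    best_correct, best_assignment = -1, None
    for assignment in permutations(slots, num_clusters):
        correct = sum(counts[c][k] for k, c in enumerate(assignment) if c is not None)
        if correct > best_correct:
            best_correct, best_assignment = correct, assignment

    mapping = dict(enumerate(best_assignment))
    return n_labeled - best_correct, n_labeled, mapping


# The four instances from the first post, with their cluster assignments.
clusters = [1, 1, 0, 0]

# Unlabeled instance excluded: 0 errors out of 3 labeled instances.
print(classes_to_clusters_error(["B", None, "A", "A"], clusters, ["A", "B"], 2))
# (0, 3, {0: 'A', 1: 'B'})

# Unlabeled instance counted as class A: 1 error out of 4, i.e. the reported 25%.
print(classes_to_clusters_error(["B", "A", "A", "A"], clusters, ["A", "B"], 2))
# (1, 4, {0: 'A', 1: 'B'})
```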

> On 26 Apr 2017, at 11:55, Haytham Salhi <[hidden email]> wrote:

Re: Evaluating clusters with unlabeled instances

haytham.salhi
Thanks a lot, Eibe. Could you please let me know when the ClassificationViaClustering component will be fixed? Can we have it today or tomorrow?

On Wed, Apr 26, 2017 at 11:43 AM, Eibe Frank <[hidden email]> wrote:

Re: Evaluating clusters with unlabeled instances

Mark Hall
I've released a new version of the classificationViaClustering package that incorporates the fix. You can get it via the package manager.

Cheers,
Mark.

On 27/04/17, 2:23 AM, "Haytham Salhi" <[hidden email] on behalf of [hidden email]> wrote:

Re: Evaluating clusters with unlabeled instances

haytham.salhi
Thanks Mark - appreciated. Can we have a Maven release for the latest version (1.0.5) of the classificationViaClustering package? Also, how can I get the nightly snapshot (3.9.2-SNAPSHOT) of Weka? The latest dev version on Maven is 3.9.1.



On Thu, Apr 27, 2017 at 6:12 AM, Mark Hall <[hidden email]> wrote:

Re: Evaluating clusters with unlabeled instances

Peter Reutemann-3
On April 28, 2017 2:37:14 AM GMT+12:00, Haytham Salhi <[hidden email]> wrote:
Thanks Mark - appreciated. Can we have a maven release for the latest version (1.0.5) of classificationViaClustering package? Also, I am wondering, how can I get the nightly snapshot (3.9.2-SNAPSHOT) of weka? The latest dev version on maven is 3.9.1. 



Only releases of Weka / packages get published on Maven Central. For all other cases, you have to compile the artifacts yourself at this stage.

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz

Re: Evaluating clusters with unlabeled instances

haytham.salhi
Hello again,

Also, please note that the default capabilities of the classificationViaClustering classifier should be updated accordingly. The classifier must enable the NO_CLASS capability; otherwise it raises an exception.

Best,
Haytham

On Thu, Apr 27, 2017 at 9:05 PM, Peter Reutemann <[hidden email]> wrote:

Re: Evaluating clusters with unlabeled instances

Eibe Frank-2
Administrator
I don’t think that would be appropriate: the classifier needs to have access to class values so that the classes-to-clusters mapping can be established.

Cheers,
Eibe
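
To make the point about the classes-to-clusters mapping concrete, here is a minimal, self-contained sketch in plain Java. It is not Weka code: the class and method names are hypothetical, and the mapping is a simple per-cluster majority vote, which is an assumption and not necessarily how Weka searches for its mapping. The data mirrors the four instances from the original post, with null standing in for the missing class value and a cluster assignment chosen to be consistent with the posted output.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy classes-to-clusters evaluation that skips unlabeled instances (not Weka code). */
class LabeledOnlyClusterEval {

    /**
     * Maps each cluster to the majority class among its labeled members.
     * A null entry in classes marks an instance whose class value is missing.
     */
    static Map<Integer, String> mapClustersToClasses(int[] clusters, String[] classes) {
        Map<Integer, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < clusters.length; i++) {
            if (classes[i] == null) {
                continue; // unlabeled: still clustered, but ignored for the mapping
            }
            counts.computeIfAbsent(clusters[i], k -> new HashMap<>())
                  .merge(classes[i], 1, Integer::sum);
        }
        Map<Integer, String> mapping = new HashMap<>();
        for (Map.Entry<Integer, Map<String, Integer>> e : counts.entrySet()) {
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> c : e.getValue().entrySet()) {
                if (c.getValue() > bestCount) {
                    best = c.getKey();
                    bestCount = c.getValue();
                }
            }
            mapping.put(e.getKey(), best);
        }
        return mapping;
    }

    /** Fraction of labeled instances whose cluster maps to a class other than their own. */
    static double errorRate(int[] clusters, String[] classes) {
        Map<Integer, String> mapping = mapClustersToClasses(clusters, classes);
        int labeled = 0;
        int wrong = 0;
        for (int i = 0; i < clusters.length; i++) {
            if (classes[i] == null) {
                continue; // unlabeled instances do not count as right or wrong
            }
            labeled++;
            if (!classes[i].equals(mapping.get(clusters[i]))) {
                wrong++;
            }
        }
        return labeled == 0 ? 0.0 : (double) wrong / labeled;
    }

    public static void main(String[] args) {
        String[] classes = {"B", null, "A", "A"}; // second instance has no label
        int[] clusters = {1, 1, 0, 0};            // assumed k-means assignment
        System.out.println(mapClustersToClasses(clusters, classes));
        System.out.println(errorRate(clusters, classes));
    }
}
```

With the unlabeled instance skipped, all three labeled instances agree with their cluster's class. If the second instance is instead counted as class A, one of the four instances is misclassified, which matches the 25% reported in the original post.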

> On 18/05/2017, at 10:31 AM, Haytham Salhi <[hidden email]> wrote:
>
> Hello again,
>
> Further, kindly note that the default capabilities of the ClassificationViaClustering classifier should be updated accordingly. The class must enable the NO_CLASS capability; otherwise it will raise an exception.
>
> Best,
> Haytham
>

Re: Evaluating clusters with unlabeled instances

haytham.salhi
Leaving it as is (i.e., with the NO_CLASS capability disabled) would result in a capabilities exception when some instances have no class value, thus making the fix of no value. I think the classifier would still have access to class values even if we enable the NO_CLASS capability. Please correct me if I am wrong.

Re: Evaluating clusters with unlabeled instances

Peter Reutemann
> Leaving it as is (i.e., with the NO_CLASS capability disabled) would result
> in a capabilities exception when some instances have no class value, thus
> making the fix of no value. I think the classifier would still have access
> to class values even if we enable the NO_CLASS capability. Please correct
> me if I am wrong.

I think you mean "MISSING_CLASS_VALUES". "NO_CLASS" means that the
class attribute is not set at all (clustering algorithms use this).

http://weka.sourceforge.net/doc.dev/weka/core/Capabilities.Capability.html

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Evaluating clusters with unlabeled instances

haytham.salhi
Exactly, Peter. You are right. NO_CLASS means the class attribute is not set at all. 

Therefore, result.enable(Capability.MISSING_CLASS_VALUES); must be added to the getCapabilities() method of ClassificationViaClustering to avoid the exception.
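
The difference between the two capabilities can be illustrated with a small toy model in plain Java. This is only a sketch: the enum reuses three capability names from the thread, but the class, the test method and its exception messages are hypothetical stand-ins, and the real weka.core.Capabilities class may behave differently in detail.

```java
import java.util.EnumSet;
import java.util.Set;

/** Toy model of a capabilities check; a stand-in, not Weka's Capabilities class. */
class CapabilityCheckSketch {

    /** Names mirror the Weka capabilities discussed in this thread. */
    enum Capability { NO_CLASS, NOMINAL_CLASS, MISSING_CLASS_VALUES }

    /**
     * Hypothetical check. classIndex < 0 means no class attribute is set at all;
     * a null entry in classValues means that instance's class value is missing.
     */
    static void test(Set<Capability> enabled, int classIndex, String[] classValues) {
        if (classIndex < 0) {
            if (!enabled.contains(Capability.NO_CLASS)) {
                throw new IllegalArgumentException("Cannot handle data without a class attribute");
            }
            return;
        }
        if (!enabled.contains(Capability.NOMINAL_CLASS)) {
            throw new IllegalArgumentException("Cannot handle a nominal class");
        }
        for (String value : classValues) {
            if (value == null && !enabled.contains(Capability.MISSING_CLASS_VALUES)) {
                throw new IllegalArgumentException("Cannot handle missing class values");
            }
        }
    }

    /** Convenience wrapper: true if the check passes, false if it throws. */
    static boolean accepts(Set<Capability> enabled, int classIndex, String[] classValues) {
        try {
            test(enabled, classIndex, classValues);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] classValues = {"B", null, "A", "A"}; // class attribute set, one value missing
        Set<Capability> before = EnumSet.of(Capability.NOMINAL_CLASS);
        Set<Capability> withNoClass = EnumSet.of(Capability.NOMINAL_CLASS, Capability.NO_CLASS);
        Set<Capability> withMissing =
            EnumSet.of(Capability.NOMINAL_CLASS, Capability.MISSING_CLASS_VALUES);
        System.out.println(accepts(before, 0, classValues));
        System.out.println(accepts(withNoClass, 0, classValues));
        System.out.println(accepts(withMissing, 0, classValues));
    }
}
```

In this model, enabling NO_CLASS does nothing for data that has a class attribute with some missing values; only MISSING_CLASS_VALUES lets such data through, which is the distinction Peter points out.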
