Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Hello Weka, 

Now that ClusterEvaluation (and ClassificationViaClustering) ignores instances whose class value is missing when doing "classes-to-clusters" evaluation, I have tested ClusterEvaluation and it seems to work fine. However, ClassificationViaClustering used together with Evaluation still behaves strangely.

As an example, let's take the following simple case:

1- Assume we have the following data with a class attribute:

@relation 'example'

@attribute theClass {A,B}
@attribute I numeric
@attribute am numeric
@attribute are numeric
@attribute bebo numeric
@attribute different numeric
@attribute great numeric
@attribute haytham numeric
@attribute hello numeric
@attribute how numeric
@attribute man numeric
@attribute mazen numeric
@attribute movie numeric
@attribute samir numeric
@attribute sir numeric
@attribute totally numeric
@attribute you numeric

@data
{6 1,12 2}
{0 ?,11 2,13 1}
{0 B,7 1,11 1,13 1}
{4 1,6 2,12 2}
{0 ?,3 1,8 1,9 1,10 1,16 1}
{0 ?,3 1,8 1,9 1,16 1}
{0 ?,3 1,8 1,9 1,14 1,16 1}
{0 ?,1 1,2 1,5 3,15 1}

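2- As we can see, there are two instances with class A (the first and fourth) and one with class B (the third). In sparse ARFF an omitted value is 0, which for a nominal attribute means its first label, here A. The class value of the other five instances is missing.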
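
The labelling described in point 2 can be double-checked with a minimal sketch in plain Python (this is not Weka code, and the helper name `class_label` is made up for illustration). It relies on the sparse ARFF rule that an omitted value is 0, which for a nominal attribute corresponds to its first declared label:

```python
# Sparse ARFF rows copied from the @data section above.
# Attribute 0 is theClass {A,B}.
ROWS = [
    "{6 1,12 2}",
    "{0 ?,11 2,13 1}",
    "{0 B,7 1,11 1,13 1}",
    "{4 1,6 2,12 2}",
    "{0 ?,3 1,8 1,9 1,10 1,16 1}",
    "{0 ?,3 1,8 1,9 1,16 1}",
    "{0 ?,3 1,8 1,9 1,14 1,16 1}",
    "{0 ?,1 1,2 1,5 3,15 1}",
]
CLASS_VALUES = ["A", "B"]  # declared order of the nominal class attribute


def class_label(row):
    """Return 'A', 'B', or None (missing class) for one sparse ARFF row."""
    for pair in row.strip("{}").split(","):
        index, value = pair.split(" ", 1)
        if int(index) == 0:
            return None if value == "?" else value
    # Index 0 is omitted: the stored value is 0, i.e. the first label.
    return CLASS_VALUES[0]


labels = [class_label(row) for row in ROWS]
print(labels)
```

The result has A at positions 1 and 4, B at position 3, and five missing class values.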

3- Let's assume we want to do k-means clustering with k = 4 and k-means++ as the initialization method. The model output is:

Number of iterations: 2
Within cluster sum of squared errors: 2.5833333333333335

Initial starting points (k-means++):

Cluster 0: {2 1,7 1,8 1,15 1}
Cluster 1: {6 1,10 1,12 1}
Cluster 2: {3 1,5 2,11 2}
Cluster 3: {0 1,1 1,4 3,14 1}

Missing values globally replaced with mean/mode

Final cluster centroids:
                         Cluster#
Attribute    Full Data          0          1          2          3
                 (8.0)      (3.0)      (2.0)      (2.0)      (1.0)
==================================================================
I                0.125          0          0          0          1
am               0.125          0          0          0          1
are              0.375          1          0          0          0
bebo             0.125          0          0        0.5          0
different        0.375          0          0          0          3
great            0.375          0          0        1.5          0
haytham          0.125          0        0.5          0          0
hello            0.375          1          0          0          0
how              0.375          1          0          0          0
man              0.125     0.3333          0          0          0
mazen            0.375          0        1.5          0          0
movie              0.5          0          0          2          0
samir             0.25          0          1          0          0
sir              0.125     0.3333          0          0          0
totally          0.125          0          0          0          1
you              0.375          1          0          0          0 
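
As a sanity check on the centroid table, the sketch below (plain Python, not Weka; `dense` is a hypothetical helper) averages the two instances that end up in cluster 2 according to the cluster assignments reported in step 4. It assumes each centroid is simply the per-attribute mean of its members, which the numbers in the table are consistent with:

```python
ATTRS = ["theClass", "I", "am", "are", "bebo", "different", "great",
         "haytham", "hello", "how", "man", "mazen", "movie", "samir",
         "sir", "totally", "you"]

# Instances 1 and 4 of the data set, both assigned to cluster 2.
members = ["{6 1,12 2}", "{4 1,6 2,12 2}"]


def dense(row):
    """Expand one sparse ARFF row into a dense list (omitted values are 0)."""
    values = [0.0] * len(ATTRS)
    for pair in row.strip("{}").split(","):
        index, value = pair.split(" ")
        values[int(index)] = float(value)
    return values


vectors = [dense(row) for row in members]

# Per-attribute mean over the members, skipping the class attribute (index 0).
centroid = {
    name: sum(v[i] for v in vectors) / len(vectors)
    for i, name in enumerate(ATTRS)
    if i != 0
}
nonzero = {name: value for name, value in centroid.items() if value}
print(nonzero)
```

The non-zero entries (bebo 0.5, great 1.5, movie 2) agree with the column for cluster 2 above.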

4- After building the clusterer and doing the evaluation (using ClusterEvaluation), we have the following reasonable results: 

Clustered Instances

0      3 ( 38%)
1      2 ( 25%)
2      2 ( 25%)
3      1 ( 13%)


Class attribute: theClass
Classes to Clusters:

 1 2  <-- assigned to cluster
 0 2 | A
 1 0 | B

Cluster 1 <-- B
Cluster 2 <-- A

Incorrectly clustered instances : 0.0  0      %

Cluster assignments: [2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 3.0]

Here, clusters 0 and 3 are ignored in the "classes-to-clusters" evaluation because they contain only instances whose class is missing, and this makes total sense.
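
The classes-to-clusters matrix above can be reproduced by hand from the cluster assignments and the class labels. The following is a minimal sketch in plain Python (not the ClusterEvaluation implementation); the `labels` list encodes the class values from point 2, with `None` standing for a missing class:

```python
from collections import Counter

labels = ["A", None, "B", "A", None, None, None, None]
assignments = [2, 1, 1, 2, 0, 0, 0, 3]  # from "Cluster assignments" above

# Count (class, cluster) pairs, skipping instances with a missing class.
counts = Counter(
    (label, cluster)
    for label, cluster in zip(labels, assignments)
    if label is not None
)

# Only clusters that received at least one labelled instance get a column.
used_clusters = sorted({cluster for _, cluster in counts})
matrix = {
    label: [counts[(label, cluster)] for cluster in used_clusters]
    for label in ["A", "B"]
}

print(used_clusters)
for label, row in matrix.items():
    print(row, "|", label)
```

Only clusters 1 and 2 receive labelled instances, so only they appear as columns, with rows `0 2 | A` and `1 0 | B` as in the Weka output.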

5- However, when building the model using ClassificationViaClustering with the same clusterer settings, the model output is:

Number of iterations: 2
Within cluster sum of squared errors: 0.0

Initial starting points (k-means++):

Cluster 0: {5 1,11 2}
Cluster 1: {6 1,10 1,12 1}
Cluster 2: {3 1,5 2,11 2}

Missing values globally replaced with mean/mode

Final cluster centroids:
                         Cluster#
Attribute    Full Data          0          1          2
                 (3.0)      (1.0)      (1.0)      (1.0)
=======================================================
I                    0          0          0          0
am                   0          0          0          0
are                  0          0          0          0
bebo            0.3333          0          0          1
different            0          0          0          0
great                1          1          0          2
haytham         0.3333          0          1          0
hello                0          0          0          0
how                  0          0          0          0
man                  0          0          0          0
mazen           0.3333          0          1          0
movie           1.3333          2          0          2
samir           0.3333          0          1          0
sir                  0          0          0          0
totally              0          0          0          0
you                  0          0          0          0

What's strange here is that, even though we set the number of clusters to 4, the model outputs only three clusters (built from just three instances); thus, the evaluation of this model is not reasonable. Below is part of the model evaluation:

Clusters to classes mapping:
  1. Cluster: no class
  2. Cluster: B (2)
  3. Cluster: A (1)

Classes to clusters mapping:
  1. Class (A): 3. Cluster
  2. Class (B): 2. Cluster


=== Summary ===

Correctly Classified Instances           2               66.6667 %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1     
Mean absolute error                      0     
Root mean squared error                  0     
Relative absolute error                  0      %
Root relative squared error              0      %
UnClassified Instances                   1               33.3333 %
Total Number of Instances                3     
Ignored Class Unknown Instances                  5     
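
The figures in this summary are consistent with only the three labelled instances being evaluated. A small arithmetic sketch in plain Python (the variable names are illustrative and not taken from Weka):

```python
total_instances = 8   # rows in the ARFF file
missing_class = 5     # "Ignored Class Unknown Instances"
evaluated = total_instances - missing_class  # "Total Number of Instances"

correct = 2
incorrect = 0
unclassified = evaluated - correct - incorrect

# Percentages are taken over the evaluated instances, not over all eight.
pct_correct = round(100.0 * correct / evaluated, 4)
pct_unclassified = round(100.0 * unclassified / evaluated, 4)

print(evaluated, unclassified, pct_correct, pct_unclassified)
```

This gives 66.6667 % correct and 33.3333 % unclassified out of 3 instances, as in the summary.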

Note that I am using Weka 3.9.2-SNAPSHOT.

Best,
Haytham

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

Eibe Frank-2
Administrator
Could you perhaps send us your data?

Cheers,
Eibe

> On 26 May 2017, at 12:20, Haytham Salhi <[hidden email]> wrote:
> [...]

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
The dataset is included in the first point above in ARFF format. Please let me know if you want any further info.

Best,
Haytham

On Sat, May 27, 2017 at 7:58 AM, Eibe Frank <[hidden email]> wrote:
> Could you perhaps send us your data?
>
> Cheers,
> Eibe

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

Eibe Frank-2
Administrator
This will be fixed in the next release (1.0.6) of the classificationViaClustering package (it's already fixed in the SVN repository). Instances with missing class values were deleted in ClassificationViaClustering, which is why you got a different result. This is no longer the case in the new version.
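
To illustrate why deleting those instances changes the result, here is a minimal sketch in plain Python. It is not the package's code, and how Weka handles a requested k larger than the number of instances is not shown in this thread. It only relies on the fact that every non-empty cluster needs at least one instance:

```python
# Class labels of the eight instances; None marks a missing class value.
labels = ["A", None, "B", "A", None, None, None, None]
requested_k = 4

# Dropping instances with a missing class leaves only the labelled ones.
kept = [i for i, label in enumerate(labels) if label is not None]

# At most one non-empty cluster per remaining instance.
max_nonempty_clusters = min(requested_k, len(kept))

print(len(kept), max_nonempty_clusters)
```

With three instances left, at most three clusters can be non-empty, which matches the "Full Data (3.0)" column and the three clusters in the reported model.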

Thanks for reporting this bug.

Cheers,
Eibe

> On 28 May 2017, at 02:35, Haytham Salhi <[hidden email]> wrote:
>
> The dataset is included in the first point above in ARFF format. Please let me know if you want any further info.
>
> Best,
> Haytham

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Thanks a lot, Eibe. Are there nightly snapshots of Weka packages, or should I build it myself?

On Sun, May 28, 2017 at 10:13 AM, Eibe Frank <[hidden email]> wrote:
> This will be fixed in the next release (1.0.6) of the classificationViaClustering package (it's already fixed in the SVN repository). Instances with missing class values were deleted in ClassificationViaClustering, that's why you got a different result. This is no longer the case in the new version.
>
> Thanks for reporting this bug.
>
> Cheers,
> Eibe


Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

Eibe Frank-2
Administrator
Mark will hopefully have time to make the package release soon. You can check here whether it has been made: https://sourceforge.net/projects/weka/files/weka-packages/

Alternatively, yes, you can build the package yourself by checking it out from SVN.

Cheers,
Eibe

> On 29/05/2017, at 10:47 AM, Haytham Salhi <[hidden email]> wrote:
>
> Thanks a lot, Eibe. Are there nightly snapshots of Weka packages, or should I build it myself?

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Hello again, 

Many thanks, Eibe and Mark, for resolving the issue. I have tested the fix and it seems to be working fine.

However, there is another important case that I think is worth looking at:

In short:

Labeled instances that are placed in any cluster other than the correct one (as determined by the classes-to-clusters mapping), whether or not that cluster has a class, are counted as incorrectly clustered instances in ClusterEvaluation, which makes total sense. However, ClassificationViaClustering treats a labeled instance that lands in an "unlabeled" cluster as unclassified, so it is not taken into account when the Evaluation component computes measures such as precision, recall, and F-measure. The attachments dataset1.arff and dataset2.arff are examples.

More details:

Let's say we have a dataset similar to the one above (attached as dataset1.arff), with instances numbered 1 to 8 and class labels A and B. The algorithm outputs 4 clusters as follows (note: 3->B means instance 3 has actual class B, 8->? means instance 8 has no class, and so on):


Cluster 0: {2->?, 3->B}
Cluster 1: {5->?, 6->?, 7->B}
Cluster 2: {1->A, 4->A}
Cluster 3: {8->?}

As per classes-to-clusters evaluation, cluster 2 is predicted to be A, and cluster 1 is predicted to be B. All instances with a missing class are ignored. The output of ClusterEvaluation is:

 0 1 2  <-- assigned to cluster
 0 0 2 | A
 1 1 0 | B

Cluster 0 <-- No class
Cluster 1 <-- B
Cluster 2 <-- A

Incorrectly clustered instances : 1.0 12.5    %
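
As a rough illustration of how the table above can be tallied, the following sketch counts labeled instances per class and cluster, then counts the labeled instances that fall outside the cluster mapped to their class. It takes the mapping (cluster 1 <-- B, cluster 2 <-- A) as given from the output rather than searching for it, and it is not the ClusterEvaluation implementation; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not Weka code: tallies a classes-to-clusters table.
class ClassesToClustersTally {

    // Counts labeled instances per (class, cluster) pair; "?" labels match no class and are skipped.
    static int[][] contingency(String[] labels, int[] clusters, String[] classNames, int numClusters) {
        int[][] counts = new int[classNames.length][numClusters];
        for (int i = 0; i < labels.length; i++) {
            for (int c = 0; c < classNames.length; c++) {
                if (classNames[c].equals(labels[i])) {
                    counts[c][clusters[i]]++;
                }
            }
        }
        return counts;
    }

    // Counts labeled instances that sit outside the cluster mapped to their class.
    static int misplaced(String[] labels, int[] clusters, Map<String, Integer> classToCluster) {
        int errors = 0;
        for (int i = 0; i < labels.length; i++) {
            Integer expected = classToCluster.get(labels[i]); // null for "?"
            if (expected != null && expected != clusters[i]) {
                errors++;
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // Instances 1..8 of dataset1 as described in the cluster listing.
        String[] labels = {"A", "?", "B", "A", "?", "?", "B", "?"};
        int[] clusters = {2, 0, 0, 2, 1, 1, 1, 3};

        int[][] counts = contingency(labels, clusters, new String[] {"A", "B"}, 4);
        System.out.println("A in cluster 2: " + counts[0][2]);
        System.out.println("B in cluster 0: " + counts[1][0] + ", B in cluster 1: " + counts[1][1]);

        Map<String, Integer> mapping = new HashMap<>();
        mapping.put("A", 2);
        mapping.put("B", 1);
        System.out.println("Misplaced labeled instances: " + misplaced(labels, clusters, mapping));
    }
}
```

The single misplaced instance is instance 3; the reported 12.5 % corresponds to 1 out of all 8 instances.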

ClusterEvaluation's result makes total sense: it counts instance #3 as an incorrectly clustered instance. ClassificationViaClustering, however, treats it as an unclassified instance. ClassificationViaClustering's output:

=== Summary ===

Correctly Classified Instances           3               75      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1     
Mean absolute error                      0     
Root mean squared error                  0     
Relative absolute error                  0      %
Root relative squared error              0      %
UnClassified Instances                   1               25      %
Total Number of Instances                4     
Ignored Class Unknown Instances                  4     
Weighted precesion = 1.0
Weighted recall = 1.0
Weighted Macro F measure = 1.0
Averaged Macro F measure = 1.0
Averaged Micro F measure = 1.0

=== Confusion Matrix ===

 a b   <-- classified as
 2 0 | a = A
 0 1 | b = B

Now, here is the debate about instance #3: which is more correct, to treat it as an unclassified instance or as an incorrectly classified one? Since it has a class (B) and is placed in a cluster other than "the B cluster", I think it is reasonable to treat it as incorrectly classified, even though cluster 0 has no class. Additionally, although the accuracy is correct and takes the unclassified instance into account, the other measures do not. Look at the precision, recall, and F-measure: they are all 100%, and I think that should not be the case, because instance 3 has actual class B and is placed in an incorrect cluster! That is why it might be better to count it as an incorrectly classified instance.
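
To make the difference concrete, here is a small sketch of recall for one class under the two treatments. It assumes an unclassified instance is represented by a null prediction; this is an illustration with hypothetical names, not how Weka's Evaluation class is implemented.

```java
// Hypothetical helper, not Weka code: recall with and without unclassified instances.
class UnclassifiedRecall {

    // Recall for one class. A null prediction means the instance landed in a cluster with no class.
    // If countUnclassifiedAsMiss is false, such instances are left out of the metric entirely.
    static double recall(String[] actual, String[] predicted, String cls, boolean countUnclassifiedAsMiss) {
        int truePositives = 0;
        int relevant = 0;
        for (int i = 0; i < actual.length; i++) {
            if (!cls.equals(actual[i])) {
                continue; // not an instance of this class
            }
            if (predicted[i] == null && !countUnclassifiedAsMiss) {
                continue; // unclassified and excluded from the metric
            }
            relevant++;
            if (cls.equals(predicted[i])) {
                truePositives++;
            }
        }
        return relevant == 0 ? Double.NaN : (double) truePositives / relevant;
    }

    public static void main(String[] args) {
        // The four labeled instances of dataset1, in order: 1, 3, 4, 7.
        String[] actual = {"A", "B", "A", "B"};
        // Instance 3 sits in cluster 0, which has no class, so it gets no prediction.
        String[] predicted = {"A", null, "A", "B"};

        System.out.println("Recall(B), unclassified excluded: " + recall(actual, predicted, "B", false));
        System.out.println("Recall(B), unclassified counted as a miss: " + recall(actual, predicted, "B", true));
    }
}
```

Under the first treatment recall for B is 1/1, matching the reported 1.0; under the second it is 1/2, because instance 3 counts against class B.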


What do you think? 

Best,
Haytham

On Mon, May 29, 2017 at 1:10 AM, Eibe Frank <[hidden email]> wrote:
Mark will hopefully have time to make the package release soon. You can check here whether it has been made: https://sourceforge.net/projects/weka/files/weka-packages/

Alternatively, yes, you can build the package yourself by checking it out from SVN.

Cheers,
Eibe


dataset1.arff (870 bytes)
dataset2.arff (870 bytes)

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Just to correct this line from my last message:

As per classes to clusters evaluation, cluster 2 is predicted to be A, and cluster 1 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:

Sorry for the confusion.

On Sat, Jun 3, 2017 at 12:31 AM, Haytham Salhi <[hidden email]> wrote:
Hello again, 

Much thanks Ebie and Mark for resolving the issue. I have tested it and it seems to be working fine.

However, there is also an important case at which I guess it is worth looking: 

In short: 

labeled instances that are placed in clusters (regardless these clusters have classes or do not) other than the correct cluster (as per classes-to-clusters policy) are considered as incorrectly clustered instances in ClusterEvaluation, which makes total sense. However, ClassificationViaClustering considers instances that are placed in an "unlabeled" cluster as an unclassified instance; and thus they are not taken into account when calculating measures such as precesion/recall/F-measure by using Evaluation component. Attachements dataset1.arff and dataset2.arff are examples.

More details:

Let's say we have a dataset similar to the one above (attached as dataset1.arff), with instances numbered 1 to 8 and class labels A and B. The algorithm outputs 4 clusters as follows (note: 3->B means instance 3 has an actual class of B, 8->? means instance 8 has no class, and so on):


Cluster 0: {2->?, 3->B}
Cluster 1: {5->?, 6->?, 7->B}
Cluster 2: {1->A, 4->A}
Cluster 3: {8->?}

As per classes to clusters evaluation, cluster 4 is predicted to be A, and cluster 3 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:

 0 1 2  <-- assigned to cluster
 0 0 2 | A
 1 1 0 | B

Cluster 0 <-- No class
Cluster 1 <-- B
Cluster 2 <-- A

Incorrectly clustered instances : 1.0 12.5    %
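
The classes-to-clusters figures above can be checked by hand. The following is a minimal Python sketch, not Weka's actual implementation: it assumes the mapping is found by trying every one-to-one assignment of classes to clusters and keeping the one with the fewest errors, and that the error percentage is taken over all 8 instances (which is what 12.5% implies). The names `assignments`, `counts` and `best_mapping` are made up for illustration.

```python
from itertools import permutations

# instance number -> (cluster, actual class or None), as listed above
assignments = {
    1: (2, "A"), 2: (0, None), 3: (0, "B"), 4: (2, "A"),
    5: (1, None), 6: (1, None), 7: (1, "B"), 8: (3, None),
}
classes = ["A", "B"]
clusters = [0, 1, 2, 3]

# counts[class][cluster], built from labeled instances only
counts = {c: {k: 0 for k in clusters} for c in classes}
for cluster, label in assignments.values():
    if label is not None:
        counts[label][cluster] += 1


def best_mapping():
    """Brute force: give each class its own cluster, minimise the errors."""
    labeled = sum(sum(row.values()) for row in counts.values())
    best = None
    for chosen in permutations(clusters, len(classes)):
        correct = sum(counts[c][k] for c, k in zip(classes, chosen))
        errors = labeled - correct
        if best is None or errors < best[0]:
            best = (errors, dict(zip(classes, chosen)))
    return best


errors, mapping = best_mapping()
print(counts)   # A: two instances in cluster 2; B: one each in clusters 0 and 1
print(mapping, errors, 100.0 * errors / len(assignments))
```

Clusters 0 and 1 tie for class B, so which of them ends up labeled B depends on the search order (the output above picked cluster 1). Either way exactly one labeled instance is left in a wrong cluster, which gives the 12.5% reported.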

That ClusterEvaluation result makes total sense: it counts instance #3 as an incorrectly clustered instance. ClassificationViaClustering, however, treats it as an unclassified instance. ClassificationViaClustering's output:

=== Summary ===

Correctly Classified Instances           3               75      %
Incorrectly Classified Instances         0                0      %
Kappa statistic                          1     
Mean absolute error                      0     
Root mean squared error                  0     
Relative absolute error                  0      %
Root relative squared error              0      %
UnClassified Instances                   1               25      %
Total Number of Instances                4     
Ignored Class Unknown Instances                  4     
Weighted precesion = 1.0
Weighted recall = 1.0
Weighted Macro F measure = 1.0
Averaged Macro F measure = 1.0
Averaged Micro F measure = 1.0

=== Confusion Matrix ===

 a b   <-- classified as
 2 0 | a = A
 0 1 | b = B

Now, here is the debate about instance #3. Which is more correct: to consider it an unclassified instance or an incorrectly classified instance? Since it has a class (i.e., B) and is placed in a cluster other than "the cluster B", I think it is reasonable to count it as an incorrectly classified instance even though cluster 0 has no class. Additionally, although the accuracy is correct and takes the unclassified instance into account, the other measures do not. Look at the precision/recall/F-measure: they are all 100%, and I think that should not be the case, because instance 3 has actual class B and is placed in an incorrect cluster! That is why it might be better to count it as an incorrectly classified instance.
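
To make the difference concrete, here is a small Python sketch (not Weka code) that computes per-class precision, recall and F-measure for the four labeled instances under both policies. It assumes that, under the alternative policy, an instance falling into an unlabeled cluster is recorded as a prediction of "no class", which counts as a miss for its actual class. The function name `prf` and the `None` marker are illustrative only.

```python
# (actual, predicted) for the labeled instances 1, 3, 4, 7;
# None means the instance landed in a cluster with no class (instance 3).
pairs = [("A", "A"), ("B", None), ("A", "A"), ("B", "B")]


def prf(pairs, cls, skip_unclassified):
    """Precision, recall and F-measure for one class."""
    if skip_unclassified:
        # behaviour reported above: unclassified instances are left out
        pairs = [(a, p) for a, p in pairs if p is not None]
    tp = sum(1 for a, p in pairs if a == cls and p == cls)
    fp = sum(1 for a, p in pairs if a != cls and p == cls)
    fn = sum(1 for a, p in pairs if a == cls and p != cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f


for skip in (True, False):
    print("skip unclassified:", skip,
          {c: prf(pairs, c, skip) for c in ("A", "B")})
```

With unclassified instances skipped, every measure is 1.0 for both classes, matching the summary above. If instance 3 is counted as a miss instead, recall for B drops to 0.5 and its F-measure to about 0.67, while class A is unaffected.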


What do you think? 

Best,
Haytham

On Mon, May 29, 2017 at 1:10 AM, Eibe Frank <[hidden email]> wrote:
Mark will hopefully have time to make the package release soon. You can check here whether it has been made: https://sourceforge.net/projects/weka/files/weka-packages/

Alternatively, yes, you can build the package yourself by checking it out from SVN.

Cheers,
Eibe

> On 29/05/2017, at 10:47 AM, Haytham Salhi <[hidden email]> wrote:
>
> Thanks a lot Eibe. Are there nightly snapshots of Weka packages, or should I build it myself?
>
> On Sun, May 28, 2017 at 10:13 AM, Eibe Frank <[hidden email]> wrote:
> This will be fixed in the next release (1.0.6) of the classificationViaClustering package (it's already fixed in the SVN repository). Instances with missing class values were deleted in ClassificationViaClustering, that's why you got a different result. This is no longer the case in the new version.
>
> Thanks for reporting this bug.
>
> Cheers,
> Eibe
>
> > On 28 May 2017, at 02:35, Haytham Salhi <[hidden email]> wrote:
> >
> > The dataset is included in the first point above in ARFF format. Please let me know if you want any further info.
> >
> > Best,
> > Haytham
> >
> > On Sat, May 27, 2017 at 7:58 AM, Eibe Frank <[hidden email]> wrote:
> > Could you perhaps send us your data?
> >
> > Cheers,
> > Eibe
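
Eibe's explanation quoted above (instances with missing class values were deleted before clustering) can be illustrated with the data from the first post. This is a Python sketch of the effect only, not of the package code; the assumption that a clusterer cannot produce more non-empty clusters than it has training instances is mine.

```python
# Class values of the 8 instances in the original ARFF data
# (sparse rows that omit attribute 0 take its first value, A).
class_values = ["A", None, "B", "A", None, None, None, None]
k = 4  # requested number of clusters

# Old behaviour described above: drop instances with a missing class first.
kept = [c for c in class_values if c is not None]
ignored = len(class_values) - len(kept)

# Assumption: no more non-empty clusters than training instances.
max_clusters_old = min(k, len(kept))
max_clusters_new = min(k, len(class_values))

print(len(kept), ignored, max_clusters_old, max_clusters_new)
```

Only 3 instances survive the deletion and 5 are ignored, which is consistent with the "Total Number of Instances 3", "Ignored Class Unknown Instances 5" and the three clusters reported in the first post even though k was set to 4.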

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Any idea on this issue?


Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

Eibe Frank-2
Administrator
This is a good idea. We should make the other behaviour (no unclassified instances) an option in ClassificationViaClustering.

Cheers,
Eibe

> On 5 Jun 2017, at 07:49, Haytham Salhi <[hidden email]> wrote:
>
> Any idea on this issue?
>
> > > > On 26 May 2017, at 12:20, Haytham Salhi <[hidden email]> wrote:
> > > >
> > > > Hello Weka,
> > > >
> > > > As we have now ClusterEvaluation (and ClassificationViaClustering) ignoring the instances whose class attribute is missing when doing "classes-to-clusters" evaluation, ClusterEvaluation is tested and seems to be working fine. However, ClassificationViaClustering along with Evaluation still behaves strangely.
> > > >
> > > > As an example, let's take the following simple case:
> > > >
> > > > 1- Assume we have the following data with a class attribute:
> > > >
> > > > @relation 'example'
> > > >
> > > > @attribute theClass {A,B}
> > > > @attribute I numeric
> > > > @attribute am numeric
> > > > @attribute are numeric
> > > > @attribute bebo numeric
> > > > @attribute different numeric
> > > > @attribute great numeric
> > > > @attribute haytham numeric
> > > > @attribute hello numeric
> > > > @attribute how numeric
> > > > @attribute man numeric
> > > > @attribute mazen numeric
> > > > @attribute movie numeric
> > > > @attribute samir numeric
> > > > @attribute sir numeric
> > > > @attribute totally numeric
> > > > @attribute you numeric
> > > >
> > > > @data
> > > > {6 1,12 2}
> > > > {0 ?,11 2,13 1}
> > > > {0 B,7 1,11 1,13 1}
> > > > {4 1,6 2,12 2}
> > > > {0 ?,3 1,8 1,9 1,10 1,16 1}
> > > > {0 ?,3 1,8 1,9 1,16 1}
> > > > {0 ?,3 1,8 1,9 1,14 1,16 1}
> > > > {0 ?,1 1,2 1,5 3,15 1}
> > > >
> > > > 2- As we can see, we have two instances with class A (first and fourth) and one with class B (third). Other instances' classes are missing.
> > > >
> > > > 3- Let's assume we want to do k-means clustering with k =4 and with kmeans++ as an initialization method. The model output is:
> > > >
> > > > Number of iterations: 2
> > > > Within cluster sum of squared errors: 2.5833333333333335
> > > >
> > > > Initial starting points (k-means++):
> > > >
> > > > Cluster 0: {2 1,7 1,8 1,15 1}
> > > > Cluster 1: {6 1,10 1,12 1}
> > > > Cluster 2: {3 1,5 2,11 2}
> > > > Cluster 3: {0 1,1 1,4 3,14 1}
> > > >
> > > > Missing values globally replaced with mean/mode
> > > >
> > > > Final cluster centroids:
> > > >                          Cluster#
> > > > Attribute    Full Data          0          1          2          3
> > > >                  (8.0)      (3.0)      (2.0)      (2.0)      (1.0)
> > > > ==================================================================
> > > > I                0.125          0          0          0          1
> > > > am               0.125          0          0          0          1
> > > > are              0.375          1          0          0          0
> > > > bebo             0.125          0          0        0.5          0
> > > > different        0.375          0          0          0          3
> > > > great            0.375          0          0        1.5          0
> > > > haytham          0.125          0        0.5          0          0
> > > > hello            0.375          1          0          0          0
> > > > how              0.375          1          0          0          0
> > > > man              0.125     0.3333          0          0          0
> > > > mazen            0.375          0        1.5          0          0
> > > > movie              0.5          0          0          2          0
> > > > samir             0.25          0          1          0          0
> > > > sir              0.125     0.3333          0          0          0
> > > > totally          0.125          0          0          0          1
> > > > you              0.375          1          0          0          0
> > > >
> > > > 4- After building the clusterer and doing the evaluation (using ClusterEvaluation), we have the following reasonable results:
> > > >
> > > > Clustered Instances
> > > >
> > > > 0      3 ( 38%)
> > > > 1      2 ( 25%)
> > > > 2      2 ( 25%)
> > > > 3      1 ( 13%)
> > > >
> > > >
> > > > Class attribute: theClass
> > > > Classes to Clusters:
> > > >
> > > >  1 2  <-- assigned to cluster
> > > >  0 2 | A
> > > >  1 0 | B
> > > >
> > > > Cluster 1 <-- B
> > > > Cluster 2 <-- A
> > > >
> > > > Incorrectly clustered instances :     0.0       0      %
> > > >
> > > > Cluster assignments: [2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 3.0]
> > > >
> > > > Here, class 3 and class 0 are ignored in "classses-to-clusters" evaluation and this makes total sense.
> > > >
> > > > 5- However, when buidling the model using ClassificationViaClustering with same clusterer settings, the model output is:
> > > >
> > > > Number of iterations: 2
> > > > Within cluster sum of squared errors: 0.0
> > > >
> > > > Initial starting points (k-means++):
> > > >
> > > > Cluster 0: {5 1,11 2}
> > > > Cluster 1: {6 1,10 1,12 1}
> > > > Cluster 2: {3 1,5 2,11 2}
> > > >
> > > > Missing values globally replaced with mean/mode
> > > >
> > > > Final cluster centroids:
> > > >                          Cluster#
> > > > Attribute    Full Data          0          1          2
> > > >                  (3.0)      (1.0)      (1.0)      (1.0)
> > > > =======================================================
> > > > I                    0          0          0          0
> > > > am                   0          0          0          0
> > > > are                  0          0          0          0
> > > > bebo            0.3333          0          0          1
> > > > different            0          0          0          0
> > > > great                1          1          0          2
> > > > haytham         0.3333          0          1          0
> > > > hello                0          0          0          0
> > > > how                  0          0          0          0
> > > > man                  0          0          0          0
> > > > mazen           0.3333          0          1          0
> > > > movie           1.3333          2          0          2
> > > > samir           0.3333          0          1          0
> > > > sir                  0          0          0          0
> > > > totally              0          0          0          0
> > > > you                  0          0          0          0
> > > >
> > > > What's strange here is, even though we set the number of clusters to 4, the model outputs only three clusters; thus, the evaluation of this model is not reasonable. Below is a part of model evaluation:
> > > >
> > > > Clusters to classes mapping:
> > > >   1. Cluster: no class
> > > >   2. Cluster: B (2)
> > > >   3. Cluster: A (1)
> > > >
> > > > Classes to clusters mapping:
> > > >   1. Class (A): 3. Cluster
> > > >   2. Class (B): 2. Cluster
> > > >
> > > >
> > > > === Summary ===
> > > >
> > > > Correctly Classified Instances           2               66.6667 %
> > > > Incorrectly Classified Instances         0                0      %
> > > > Kappa statistic                          1
> > > > Mean absolute error                      0
> > > > Root mean squared error                  0
> > > > Relative absolute error                  0      %
> > > > Root relative squared error              0      %
> > > > UnClassified Instances                   1               33.3333 %
> > > > Total Number of Instances                3
> > > > Ignored Class Unknown Instances                  5
> > > >
> > > > Note that I am using WEKA.3.9.2-SNAPSHOT.
> > > >
> > > > Best,
> > > > Haytham
> > > > _______________________________________________
> > > > Wekalist mailing list
> > > > Send posts to: [hidden email]
> > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > >
> > > _______________________________________________
> > > Wekalist mailing list
> > > Send posts to: [hidden email]
> > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > >
> > > _______________________________________________
> > > Wekalist mailing list
> > > Send posts to: [hidden email]
> > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> >
> > _______________________________________________
> > Wekalist mailing list
> > Send posts to: [hidden email]
> > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> >
> > _______________________________________________
> > Wekalist mailing list
> > Send posts to: [hidden email]
> > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
>
>
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Thanks a lot Eibe. When is it expected to be modified to support this behavior?

On Mon, Jun 5, 2017 at 6:17 AM, Eibe Frank <[hidden email]> wrote:
This is a good idea. We should make the other behaviour (no unclassified instances) an option in ClassificationViaClustering.

Cheers,
Eibe
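
The difference between the two behaviours can be illustrated with a small, self-contained Python sketch. It does not use Weka; the `recall_report` helper and the `predictions` list are hypothetical and are built only from the four labeled instances in the example discussed in this thread (two A and one B predicted correctly, one B placed in a cluster that has no class).

```python
from collections import Counter

# (actual, predicted) pairs for the four labeled instances in the example.
# None marks the instance that landed in a cluster with no class.
predictions = [("A", "A"), ("A", "A"), ("B", "B"), ("B", None)]


def recall_report(pairs, count_unclassified_as_miss):
    """Return (per-class recall, weighted recall) for the given pairs.

    If count_unclassified_as_miss is False, instances without a prediction
    are dropped before computing recall. If True, they stay in the
    denominator of their actual class and count as misses.
    """
    hits, totals = Counter(), Counter()
    for actual, predicted in pairs:
        if predicted is None and not count_unclassified_as_miss:
            continue  # unclassified instance is left out of the measure
        totals[actual] += 1
        if predicted == actual:
            hits[actual] += 1
    per_class = {c: hits[c] / totals[c] for c in totals}
    # Weight each class by the number of instances that entered its denominator.
    weighted = sum(per_class[c] * totals[c] for c in totals) / sum(totals.values())
    return per_class, weighted


if __name__ == "__main__":
    print(recall_report(predictions, count_unclassified_as_miss=False))
    print(recall_report(predictions, count_unclassified_as_miss=True))
```

With the flag off, recall is 1.0 for both classes, in line with the reported summary. With the flag on, recall for B drops to 0.5 and the weighted recall becomes 0.75, the same value as the 75% accuracy. Precision cannot be recomputed this way unless the unclassified instance is also given a predicted class, so the sketch leaves it out.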



Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Hello Eibe,

Any updates on this? If not yet, can I update the code locally? If so, could you please point me to where exactly I should make the change?

Thank you in advance.

Best,
Haytham
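
For anyone planning a local change, it may help to see where the "no class" cluster comes from. The following is a minimal Python sketch of a classes-to-clusters mapping, assuming the mapping is chosen to minimise the number of misplaced labeled instances and that there are at least as many clusters as classes. It is not the package's code, and `best_mapping` is a hypothetical helper. The count matrix is the one from the ClusterEvaluation output quoted below.

```python
from itertools import permutations

# Rows are classes, columns are clusters 0..2 (labeled instances only),
# copied from the classes-to-clusters matrix quoted in this thread.
classes = ["A", "B"]
counts = [
    [0, 0, 2],  # A
    [1, 1, 0],  # B
]


def best_mapping(counts):
    """Assign each class to a distinct cluster with the fewest errors.

    Returns (errors, clusters), where clusters[i] is the cluster chosen
    for class i. Assumes len(counts[0]) >= len(counts).
    """
    n_classes, n_clusters = len(counts), len(counts[0])
    labeled = sum(map(sum, counts))
    best = None
    for clusters in permutations(range(n_clusters), n_classes):
        correct = sum(counts[c][k] for c, k in enumerate(clusters))
        errors = labeled - correct
        if best is None or errors < best[0]:
            best = (errors, clusters)  # ties keep the first mapping found
    return best


if __name__ == "__main__":
    errors, clusters = best_mapping(counts)
    for name, k in zip(classes, clusters):
        print(f"Cluster {k} <-- {name}")
    leftover = sorted(set(range(len(counts[0]))) - set(clusters))
    print("Clusters without a class:", leftover)
    print("Misplaced labeled instances:", errors)
```

The search finds one misplaced labeled instance, and 1 out of 8 instances corresponds to the 12.5% in the quoted output. Class A goes to cluster 2, while B is tied between clusters 0 and 1; the quoted output reports cluster 1, and this sketch simply keeps the first minimum it meets. Whichever cluster is left over has no class, and a labeled instance assigned to it is what ClassificationViaClustering currently reports as unclassified.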

On Mon, Jun 5, 2017 at 7:33 PM, Haytham Salhi <[hidden email]> wrote:
Thanks a lot Eibe. When is it expected to be modified to support this behavior?

On Mon, Jun 5, 2017 at 6:17 AM, Eibe Frank <[hidden email]> wrote:
This is a good idea. We should make the other behaviour (no unclassified instances) an option in ClassificationViaClustering.

Cheers,
Eibe

> On 5 Jun 2017, at 07:49, Haytham Salhi <[hidden email]> wrote:
>
> Any idea on this issue?
>
> On Sat, Jun 3, 2017 at 12:37 AM, Haytham Salhi <[hidden email]> wrote:
> Just to correct this line for last message:
>
> As per classes to clusters evaluation, cluster 2 is predicted to be A, and cluster 1 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:
>
> Sorry for the confusion.
>
> On Sat, Jun 3, 2017 at 12:31 AM, Haytham Salhi <[hidden email]> wrote:
> Hello again,
>
> Many thanks, Eibe and Mark, for resolving the issue. I have tested it and it seems to be working fine.
>
> However, there is another important case that I think is worth looking at:
>
> In short:
>
> Labeled instances that are placed in a cluster other than the correct one (as per the classes-to-clusters policy), regardless of whether that cluster has a class, are counted as incorrectly clustered instances in ClusterEvaluation, which makes total sense. However, ClassificationViaClustering treats instances that are placed in an "unlabeled" cluster as unclassified, so they are not taken into account when the Evaluation component calculates measures such as precision/recall/F-measure. The attachments dataset1.arff and dataset2.arff are examples.
>
> More details:
>
> Let's say we have a dataset similar to the one above (attached as dataset1.arff), with instances numbered from 1 to 8 and class labels A and B. The algorithm outputs 4 clusters as follows (note: 3->B means instance 3 has actual class B, 8->? means instance 8 has no class, and so on):
>
>
> Cluster 0: {2->?, 3->B}
> Cluster 1: {5->?, 6->?, 7->B}
> Cluster 2: {1->A, 4->A}
> Cluster 3: {8->?}
>
> As per classes-to-clusters evaluation, cluster 2 is predicted to be A, and cluster 1 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:
>
>  0 1 2  <-- assigned to cluster
>  0 0 2 | A
>  1 1 0 | B
>
> Cluster 0 <-- No class
> Cluster 1 <-- B
> Cluster 2 <-- A
>
> Incorrectly clustered instances :     1.0      12.5    %
>
> Which makes total sense. It considers instance #3 as an incorrectly clustered instance. ClassificationViaClustering, however, considers it as an unclassified instance. ClassificationViaClustering's output:
>
> === Summary ===
>
> Correctly Classified Instances           3               75      %
> Incorrectly Classified Instances         0                0      %
> Kappa statistic                          1
> Mean absolute error                      0
> Root mean squared error                  0
> Relative absolute error                  0      %
> Root relative squared error              0      %
> UnClassified Instances                   1               25      %
> Total Number of Instances                4
> Ignored Class Unknown Instances                  4
> Weighted precesion = 1.0
> Weighted recall = 1.0
> Weighted Macro F measure = 1.0
> Averaged Macro F measure = 1.0
> Averaged Micro F measure = 1.0
>
> === Confusion Matrix ===
>
>  a b   <-- classified as
>  2 0 | a = A
>  0 1 | b = B
>
> Now, here is the debate about instance #3: is it more correct to treat it as an unclassified instance or as an incorrectly classified instance? Since it has a class (B) and is placed in a cluster other than "the cluster B", I think it is reasonable to count it as incorrectly classified, even though cluster 0 has no class. Additionally, although the accuracy is correct and takes the unclassified instance into account, the other measures do not. Look at precision/recall/F-measure: they are all 100%, and I think that should not be the case, because instance 3 has actual class B and is placed in an incorrect cluster. That is why it might be better to count it as an incorrectly classified instance.
>
>
> What do you think?
>
> Best,
> Haytham
>
> On Mon, May 29, 2017 at 1:10 AM, Eibe Frank <[hidden email]> wrote:
> Mark will hopefully have time to make the package release soon. You can check here whether it has been made: https://sourceforge.net/projects/weka/files/weka-packages/
>
> Alternatively, yes, you can build the package yourself by checking it out from SVN.
>
> Cheers,
> Eibe
>
> > On 29/05/2017, at 10:47 AM, Haytham Salhi <[hidden email]> wrote:
> >
> > Thanks a lot Eibe. Are there nightly snapshots of Weka packages, or should I build it myself?
> >
> > On Sun, May 28, 2017 at 10:13 AM, Eibe Frank <[hidden email]> wrote:
> > This will be fixed in the next release (1.0.6) of the classificationViaClustering package (it's already fixed in the SVN repository). Instances with missing class values were deleted in ClassificationViaClustering, that's why you got a different result. This is no longer the case in the new version.
> >
> > Thanks for reporting this bug.
> >
> > Cheers,
> > Eibe
> >
> > > On 28 May 2017, at 02:35, Haytham Salhi <[hidden email]> wrote:
> > >
> > > The dataset is included in the first point above in ARFF format. Please let me know if you want any further info.
> > >
> > > Best,
> > > Haytham
> > >
> > > On Sat, May 27, 2017 at 7:58 AM, Eibe Frank <[hidden email]> wrote:
> > > Could you perhaps send us your data?
> > >
> > > Cheers,
> > > Eibe
> > >
> > > > On 26 May 2017, at 12:20, Haytham Salhi <[hidden email]> wrote:
> > > >
> > > > Hello Weka,
> > > >
> > > > Now that ClusterEvaluation (and ClassificationViaClustering) ignores instances whose class attribute is missing when doing "classes-to-clusters" evaluation, ClusterEvaluation has been tested and seems to be working fine. However, ClassificationViaClustering along with Evaluation still behaves strangely.
> > > >
> > > > As an example, let's take the following simple case:
> > > >
> > > > 1- Assume we have the following data with a class attribute:
> > > >
> > > > @relation 'example'
> > > >
> > > > @attribute theClass {A,B}
> > > > @attribute I numeric
> > > > @attribute am numeric
> > > > @attribute are numeric
> > > > @attribute bebo numeric
> > > > @attribute different numeric
> > > > @attribute great numeric
> > > > @attribute haytham numeric
> > > > @attribute hello numeric
> > > > @attribute how numeric
> > > > @attribute man numeric
> > > > @attribute mazen numeric
> > > > @attribute movie numeric
> > > > @attribute samir numeric
> > > > @attribute sir numeric
> > > > @attribute totally numeric
> > > > @attribute you numeric
> > > >
> > > > @data
> > > > {6 1,12 2}
> > > > {0 ?,11 2,13 1}
> > > > {0 B,7 1,11 1,13 1}
> > > > {4 1,6 2,12 2}
> > > > {0 ?,3 1,8 1,9 1,10 1,16 1}
> > > > {0 ?,3 1,8 1,9 1,16 1}
> > > > {0 ?,3 1,8 1,9 1,14 1,16 1}
> > > > {0 ?,1 1,2 1,5 3,15 1}
> > > >
> > > > 2- As we can see, we have two instances with class A (first and fourth) and one with class B (third). Other instances' classes are missing.
> > > >
> > > > 3- Let's assume we want to do k-means clustering with k =4 and with kmeans++ as an initialization method. The model output is:
> > > >
> > > > Number of iterations: 2
> > > > Within cluster sum of squared errors: 2.5833333333333335
> > > >
> > > > Initial starting points (k-means++):
> > > >
> > > > Cluster 0: {2 1,7 1,8 1,15 1}
> > > > Cluster 1: {6 1,10 1,12 1}
> > > > Cluster 2: {3 1,5 2,11 2}
> > > > Cluster 3: {0 1,1 1,4 3,14 1}
> > > >
> > > > Missing values globally replaced with mean/mode
> > > >
> > > > Final cluster centroids:
> > > >                          Cluster#
> > > > Attribute    Full Data          0          1          2          3
> > > >                  (8.0)      (3.0)      (2.0)      (2.0)      (1.0)
> > > > ==================================================================
> > > > I                0.125          0          0          0          1
> > > > am               0.125          0          0          0          1
> > > > are              0.375          1          0          0          0
> > > > bebo             0.125          0          0        0.5          0
> > > > different        0.375          0          0          0          3
> > > > great            0.375          0          0        1.5          0
> > > > haytham          0.125          0        0.5          0          0
> > > > hello            0.375          1          0          0          0
> > > > how              0.375          1          0          0          0
> > > > man              0.125     0.3333          0          0          0
> > > > mazen            0.375          0        1.5          0          0
> > > > movie              0.5          0          0          2          0
> > > > samir             0.25          0          1          0          0
> > > > sir              0.125     0.3333          0          0          0
> > > > totally          0.125          0          0          0          1
> > > > you              0.375          1          0          0          0
> > > >
> > > > 4- After building the clusterer and doing the evaluation (using ClusterEvaluation), we have the following reasonable results:
> > > >
> > > > Clustered Instances
> > > >
> > > > 0      3 ( 38%)
> > > > 1      2 ( 25%)
> > > > 2      2 ( 25%)
> > > > 3      1 ( 13%)
> > > >
> > > >
> > > > Class attribute: theClass
> > > > Classes to Clusters:
> > > >
> > > >  1 2  <-- assigned to cluster
> > > >  0 2 | A
> > > >  1 0 | B
> > > >
> > > > Cluster 1 <-- B
> > > > Cluster 2 <-- A
> > > >
> > > > Incorrectly clustered instances :     0.0       0      %
> > > >
> > > > Cluster assignments: [2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 3.0]
> > > >
> > > > Here, class 3 and class 0 are ignored in "classses-to-clusters" evaluation and this makes total sense.
> > > >
> > > > 5- However, when buidling the model using ClassificationViaClustering with same clusterer settings, the model output is:
> > > >
> > > > Number of iterations: 2
> > > > Within cluster sum of squared errors: 0.0
> > > >
> > > > Initial starting points (k-means++):
> > > >
> > > > Cluster 0: {5 1,11 2}
> > > > Cluster 1: {6 1,10 1,12 1}
> > > > Cluster 2: {3 1,5 2,11 2}
> > > >
> > > > Missing values globally replaced with mean/mode
> > > >
> > > > Final cluster centroids:
> > > >                          Cluster#
> > > > Attribute    Full Data          0          1          2
> > > >                  (3.0)      (1.0)      (1.0)      (1.0)
> > > > =======================================================
> > > > I                    0          0          0          0
> > > > am                   0          0          0          0
> > > > are                  0          0          0          0
> > > > bebo            0.3333          0          0          1
> > > > different            0          0          0          0
> > > > great                1          1          0          2
> > > > haytham         0.3333          0          1          0
> > > > hello                0          0          0          0
> > > > how                  0          0          0          0
> > > > man                  0          0          0          0
> > > > mazen           0.3333          0          1          0
> > > > movie           1.3333          2          0          2
> > > > samir           0.3333          0          1          0
> > > > sir                  0          0          0          0
> > > > totally              0          0          0          0
> > > > you                  0          0          0          0
> > > >
> > > > What's strange here is, even though we set the number of clusters to 4, the model outputs only three clusters; thus, the evaluation of this model is not reasonable. Below is a part of model evaluation:
> > > >
> > > > Clusters to classes mapping:
> > > >   1. Cluster: no class
> > > >   2. Cluster: B (2)
> > > >   3. Cluster: A (1)
> > > >
> > > > Classes to clusters mapping:
> > > >   1. Class (A): 3. Cluster
> > > >   2. Class (B): 2. Cluster
> > > >
> > > >
> > > > === Summary ===
> > > >
> > > > Correctly Classified Instances           2               66.6667 %
> > > > Incorrectly Classified Instances         0                0      %
> > > > Kappa statistic                          1
> > > > Mean absolute error                      0
> > > > Root mean squared error                  0
> > > > Relative absolute error                  0      %
> > > > Root relative squared error              0      %
> > > > UnClassified Instances                   1               33.3333 %
> > > > Total Number of Instances                3
> > > > Ignored Class Unknown Instances                  5
> > > >
> > > > Note that I am using WEKA.3.9.2-SNAPSHOT.
> > > >
> > > > Best,
> > > > Haytham

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html




Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

Eibe Frank-2
Administrator
I looked into this a bit. It turns out that this cannot really be done by modifying ClassificationViaClustering. The best way to get the desired information is to implement a new custom evaluation metric that counts an “unclassified” case as an incorrect prediction. This will not help you with the information in the confusion matrix though.
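
To make the idea concrete, here is a minimal, self-contained Java sketch of the counting such a metric would do. It is not Weka code: the class name UnclassifiedAsError and the method errorRate are hypothetical, and it assumes that a prediction the classifier declines to make is represented as NaN (which, as far as I know, is how Weka encodes a missing value). The sample data are the four labeled instances from the dataset1.arff example quoted in this thread, with instance 3 left unclassified.

```java
public class UnclassifiedAsError {

    /**
     * Error rate over labeled instances, where a NaN prediction
     * (an "unclassified" instance) is counted as incorrect.
     * Hypothetical helper for illustration; not part of Weka.
     */
    public static double errorRate(double[] actual, double[] predicted) {
        int wrong = 0;
        for (int i = 0; i < actual.length; i++) {
            // NaN never equals anything, but the explicit check states the intent.
            if (Double.isNaN(predicted[i]) || predicted[i] != actual[i]) {
                wrong++;
            }
        }
        return (double) wrong / actual.length;
    }

    public static void main(String[] args) {
        // Class indices: A = 0, B = 1. Instances 1, 3, 4 and 7 are the labeled ones.
        double[] actual    = {0, 1, 0, 1};
        double[] predicted = {0, Double.NaN, 0, 1}; // instance 3 falls in a cluster with no class
        System.out.println("Error rate counting unclassified as incorrect: "
                + errorRate(actual, predicted));
    }
}
```

With these inputs one of the four labeled instances counts as wrong, so the sketch reports 0.25, whereas the standard summary reports 0% incorrect and 25% unclassified. A real pluggable metric would wrap the same counting in the plugin classes described at the link below.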

For info on custom evaluation metrics, take a look here:

  http://weka.wikispaces.com/Pluggable+evaluation+metrics

For some example metrics, take a look at the source code for the following packages:

  http://weka.sourceforge.net/packageMetaData/RankCorrelation/index.html
  http://weka.sourceforge.net/packageMetaData/logarithmicErrorMetrics/index.html
  http://weka.sourceforge.net/packageMetaData/percentageErrorMetrics/index.html

Cheers,
Eibe

> On 9/06/2017, at 3:55 PM, Haytham Salhi <[hidden email]> wrote:
>
> Hello Eibe,
>
> Any updates on this? If not yet, can I update the code locally? If so, could you please point out exactly where I should modify it?
>
> Thank you in advance.
>
> Best,
> Haytham
>
> On Mon, Jun 5, 2017 at 7:33 PM, Haytham Salhi <[hidden email]> wrote:
> Thanks a lot Eibe. When is it expected to be modified to support this behavior?
>
> On Mon, Jun 5, 2017 at 6:17 AM, Eibe Frank <[hidden email]> wrote:
> This is a good idea. We should make the other behaviour (no unclassified instances) an option in ClassificationViaClustering.
>
> Cheers,
> Eibe
>
> > On 5 Jun 2017, at 07:49, Haytham Salhi <[hidden email]> wrote:
> >
> > Any idea on this issue?
> >
> > On Sat, Jun 3, 2017 at 12:37 AM, Haytham Salhi <[hidden email]> wrote:
> > Just to correct this line for last message:
> >
> > As per classes to clusters evaluation, cluster 2 is predicted to be A, and cluster 1 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:
> >
> > Sorry for the confusion.

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Thanks a lot, Eibe.

Since this will not help with the information in the confusion matrix, I guess the precision/recall/F-measure would not be affected either, because as far as I can see they are calculated from the confusion matrix. What do you think?
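
As a rough illustration of why those measures stay at 1.0, the sketch below derives precision and recall directly from the confusion matrix reported for dataset1.arff (rows 2 0 and 0 1). The class and method names are made up for this example and this is not Weka's implementation; it only assumes the usual definitions, with rows as actual classes and columns as predicted classes.

```java
public class ConfusionMatrixMetrics {

    /** Precision for class c: diagonal entry divided by the column sum. */
    public static double precision(int[][] m, int c) {
        int predictedAsC = 0;
        for (int[] row : m) {
            predictedAsC += row[c];
        }
        return predictedAsC == 0 ? 0.0 : (double) m[c][c] / predictedAsC;
    }

    /** Recall for class c: diagonal entry divided by the row sum. */
    public static double recall(int[][] m, int c) {
        int actualC = 0;
        for (int v : m[c]) {
            actualC += v;
        }
        return actualC == 0 ? 0.0 : (double) m[c][c] / actualC;
    }

    public static void main(String[] args) {
        // Confusion matrix from the ClassificationViaClustering output: A = 0, B = 1.
        int[][] m = {{2, 0}, {0, 1}};
        System.out.println("precision(B) = " + precision(m, 1));
        System.out.println("recall(B)    = " + recall(m, 1));
    }
}
```

Instance 3 (actual class B) is unclassified, so it never enters the matrix: the row for B sums to 1 rather than 2, and recall for B comes out as 1/1 instead of 1/2. Any metric computed only from this matrix cannot see the unclassified instance.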

On Fri, Jun 9, 2017 at 6:27 AM, Eibe Frank <[hidden email]> wrote:
I looked into this a bit. It turns out that this cannot really be done by modifying ClassificationViaClustering. The best way to get the desired information is to implement a new custom evaluation metric that counts an “unclassified” case as an incorrect prediction. This will not help you with the information in the confusion matrix though.

For info on custom evaluation metrics, take a look here:

  http://weka.wikispaces.com/Pluggable+evaluation+metrics

For some example metrics, take a look at the source code for the following packages:

  http://weka.sourceforge.net/packageMetaData/RankCorrelation/index.html
  http://weka.sourceforge.net/packageMetaData/logarithmicErrorMetrics/index.html
  http://weka.sourceforge.net/packageMetaData/percentageErrorMetrics/index.html

Cheers,
Eibe

> On 9/06/2017, at 3:55 PM, Haytham Salhi <[hidden email]> wrote:
>
> Hello Eibe,
>
> Any updates on this? If not yet, can I update the code locally? If so, could you please point out exactly where I should modify it?
>
> Thank you in advance.
>
> Best,
> Haytham
>
> On Mon, Jun 5, 2017 at 7:33 PM, Haytham Salhi <[hidden email]> wrote:
> Thanks a lot Eibe. When is it expected to be modified to support this behavior?
>
> On Mon, Jun 5, 2017 at 6:17 AM, Eibe Frank <[hidden email]> wrote:
> This is a good idea. We should make the other behaviour (no unclassified instances) an option in ClassificationViaClustering.
>
> Cheers,
> Eibe
>
> > On 5 Jun 2017, at 07:49, Haytham Salhi <[hidden email]> wrote:
> >
> > Any idea on this issue?
> >
> > On Sat, Jun 3, 2017 at 12:37 AM, Haytham Salhi <[hidden email]> wrote:
> > Just to correct this line for last message:
> >
> > As per classes to clusters evaluation, cluster 2 is predicted to be A, and cluster 1 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:
> >
> > Sorry for the confusion.
> >
> > On Sat, Jun 3, 2017 at 12:31 AM, Haytham Salhi <[hidden email]> wrote:
> > Hello again,
> >
> > Many thanks, Eibe and Mark, for resolving the issue. I have tested it and it seems to be working fine.
> >
> > However, there is another important case that I think is worth looking at:
> >
> > In short:
> >
> > Labeled instances that are placed in a cluster other than the correct one (as per the classes-to-clusters policy), regardless of whether that cluster has a class, are counted as incorrectly clustered instances in ClusterEvaluation, which makes total sense. However, ClassificationViaClustering treats instances placed in an "unlabeled" cluster as unclassified instances, so they are not taken into account when the Evaluation component calculates measures such as precision/recall/F-measure. The attachments dataset1.arff and dataset2.arff are examples.
> >
> > More details:
> >
> > Let's say we have a dataset similar to the one above (attached as dataset1.arff), with instances numbered 1 to 8 and class labels A and B. The algorithm outputs 4 clusters as follows (note: 3->B means instance 3 has actual class B, 8->? means instance 8 has no class, and so on):
> >
> >
> > Cluster 0: {2->?, 3->B}
> > Cluster 1: {5->?, 6->?, 7->B}
> > Cluster 2: {1->A, 4->A}
> > Cluster 3: {8->?}
> >
> > As per classes to clusters evaluation, cluster 4 is predicted to be A, and cluster 3 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:
> >
> >  0 1 2  <-- assigned to cluster
> >  0 0 2 | A
> >  1 1 0 | B
> >
> > Cluster 0 <-- No class
> > Cluster 1 <-- B
> > Cluster 2 <-- A
> >
> > Incorrectly clustered instances :     1.0      12.5    %
> >
> > Which makes total sense. It considers instance #3 as an incorrectly clustered instance. ClassificationViaClustering, however, considers it as an unclassified instance. ClassificationViaClustering's output:
> >
> > === Summary ===
> >
> > Correctly Classified Instances           3               75      %
> > Incorrectly Classified Instances         0                0      %
> > Kappa statistic                          1
> > Mean absolute error                      0
> > Root mean squared error                  0
> > Relative absolute error                  0      %
> > Root relative squared error              0      %
> > UnClassified Instances                   1               25      %
> > Total Number of Instances                4
> > Ignored Class Unknown Instances                  4
> > Weighted precesion = 1.0
> > Weighted recall = 1.0
> > Weighted Macro F measure = 1.0
> > Averaged Macro F measure = 1.0
> > Averaged Micro F measure = 1.0
> >
> > === Confusion Matrix ===
> >
> >  a b   <-- classified as
> >  2 0 | a = A
> >  0 1 | b = B
> >
> > Now, here is the debate about instance #3. Which is more correct: to consider it an unclassified instance or an incorrectly classified instance? Since it has a class (B) and is placed in a cluster other than "the cluster B", I think it is reasonable to consider it an incorrectly classified instance even though cluster 0 has no class. Additionally, although the accuracy is correct and takes the unclassified instances into account, the other measures do not. Look at the precision/recall/F-measure: they are all 100%, and I think that should not be the case, because instance 3 has actual class B and is placed in an incorrect cluster. That is why it might be better to count it as an incorrectly classified instance.
> >
> >
> > What do you think?
> >
> > Best,
> > Haytham
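
For readers following the numbers in the message above, here is a minimal Java sketch of how the classes-to-clusters error count can be reproduced. It assumes the policy is "each cluster gets at most one class, each class is used at most once, and the mapping with the fewest errors wins". That assumption is consistent with the quoted output (cluster 0 ends up with no class although it holds a B instance), but it is not taken from Weka's source. The class name `ClassesToClustersSketch` and the `counts[cluster][class]` layout are hypothetical.

```java
/** Sketch only: brute-force classes-to-clusters mapping, not Weka's implementation. */
final class ClassesToClustersSketch {

    /**
     * counts[cluster][cls] holds the number of labeled instances of class cls
     * in that cluster. Instances with a missing class are simply not counted.
     * Returns the smallest number of incorrectly clustered labeled instances.
     */
    static int minErrors(int[][] counts) {
        int numClasses = counts[0].length;
        int labeled = 0;
        for (int[] row : counts) {
            for (int v : row) {
                labeled += v;
            }
        }
        return labeled - bestCorrect(counts, 0, new boolean[numClasses]);
    }

    /** Tries every assignment of unused classes (or no class) to each cluster. */
    private static int bestCorrect(int[][] counts, int cluster, boolean[] used) {
        if (cluster == counts.length) {
            return 0;
        }
        // Option 1: this cluster gets no class.
        int best = bestCorrect(counts, cluster + 1, used);
        // Option 2: this cluster gets one of the classes not used so far.
        for (int c = 0; c < used.length; c++) {
            if (used[c]) {
                continue;
            }
            used[c] = true;
            best = Math.max(best, counts[cluster][c] + bestCorrect(counts, cluster + 1, used));
            used[c] = false;
        }
        return best;
    }

    public static void main(String[] args) {
        // dataset1 example from the message: columns are {A, B}, rows are clusters 0..3.
        int[][] counts = {{0, 1}, {0, 1}, {2, 0}, {0, 0}};
        System.out.println(minErrors(counts)); // prints 1
    }
}
```

Under these assumptions the example yields one incorrectly clustered instance, which matches the quoted ClusterEvaluation output (1 of 8 instances is 12.5 %).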
> >
> > On Mon, May 29, 2017 at 1:10 AM, Eibe Frank <[hidden email]> wrote:
> > Mark will hopefully have time to make the package release soon. You can check here whether it has been made: https://sourceforge.net/projects/weka/files/weka-packages/
> >
> > Alternatively, yes, you can build the package yourself by checking it out from SVN.
> >
> > Cheers,
> > Eibe
> >
> > > On 29/05/2017, at 10:47 AM, Haytham Salhi <[hidden email]> wrote:
> > >
> > > Thanks a lot Eibe. Are there nightly snapshots of Weka packages, or should I build it myself?
> > >
> > > On Sun, May 28, 2017 at 10:13 AM, Eibe Frank <[hidden email]> wrote:
> > > This will be fixed in the next release (1.0.6) of the classificationViaClustering package (it's already fixed in the SVN repository). Instances with missing class values were deleted in ClassificationViaClustering, that's why you got a different result. This is no longer the case in the new version.
> > >
> > > Thanks for reporting this bug.
> > >
> > > Cheers,
> > > Eibe
> > >
> > > > On 28 May 2017, at 02:35, Haytham Salhi <[hidden email]> wrote:
> > > >
> > > > The dataset is included in the first point above in ARFF format. Please let me know if you want any further info.
> > > >
> > > > Best,
> > > > Haytham
> > > >
> > > > On Sat, May 27, 2017 at 7:58 AM, Eibe Frank <[hidden email]> wrote:
> > > > Could you perhaps send us your data?
> > > >
> > > > Cheers,
> > > > Eibe
> > > >
> > > > > _______________________________________________
> > > > > Wekalist mailing list
> > > > > Send posts to: [hidden email]
> > > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

Eibe Frank-2
Administrator
What I meant is that you can’t change the output shown as the confusion matrix. I think it should be possible to implement custom versions of precision, recall, etc., that take the number of unclassified instances into account. This number is available through the evaluation object, just like other statistics such as the number of true positives.
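
As a rough illustration of that idea, the sketch below recomputes per-class precision, recall and F-measure from a confusion matrix while counting unclassified instances as misses for their actual class. It is plain Java and does not call Weka. In Weka the matrix would come from `Evaluation.confusionMatrix()`, and `Evaluation.unclassified()` gives only a total, so the per-class array `unclassified` used here is an assumption: the caller would have to tally unclassified instances by actual class. The class and method names are hypothetical.

```java
/** Sketch only: metrics that count unclassified instances as misses. */
final class UnclassifiedAwareMetrics {

    /**
     * Precision for class c: TP / (TP + FP). Unclassified instances received
     * no prediction, so they add nothing to the predicted-as-c column.
     */
    static double precision(double[][] cm, int c) {
        double predicted = 0;
        for (double[] row : cm) {
            predicted += row[c];
        }
        return predicted == 0 ? 0 : cm[c][c] / predicted;
    }

    /**
     * Recall for class c: TP / (TP + FN + U_c), where U_c is the number of
     * unclassified instances whose actual class is c (assumed to be supplied).
     */
    static double recall(double[][] cm, double[] unclassified, int c) {
        double actual = unclassified[c];
        for (double v : cm[c]) {
            actual += v;
        }
        return actual == 0 ? 0 : cm[c][c] / actual;
    }

    /** Harmonic mean of the two values above. */
    static double fMeasure(double[][] cm, double[] unclassified, int c) {
        double p = precision(cm, c);
        double r = recall(cm, unclassified, c);
        return (p + r) == 0 ? 0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Confusion matrix from the thread: rows are actual A, B; columns are predicted A, B.
        double[][] cm = {{2, 0}, {0, 1}};
        // One unclassified instance (instance 3), whose actual class is B.
        double[] unclassified = {0, 1};
        System.out.println(recall(cm, unclassified, 1)); // prints 0.5
        System.out.println(fMeasure(cm, unclassified, 1));
    }
}
```

With the numbers from the thread, recall for class B drops from 1.0 to 0.5 while precision stays at 1.0, which reflects the concern raised about instance 3.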

Cheers,
Eibe

> On 8/07/2017, at 2:21 PM, Haytham Salhi <[hidden email]> wrote:
>
> Thanks a lot Eibe.
>
> Since this will not help with the information in the confusion matrix, I guess the precision/recall/F-measure would not be affected either, as I see they are calculated from the confusion matrix. What do you think?
>
> On Fri, Jun 9, 2017 at 6:27 AM, Eibe Frank <[hidden email]> wrote:
> I looked into this a bit. It turns out that this cannot really be done by modifying ClassificationViaClustering. The best way to get the desired information is to implement a new custom evaluation metric that counts an “unclassified” case as an incorrect prediction. This will not help you with the information in the confusion matrix though.
>
> For info on custom evaluation metrics, take a look here:
>
>   http://weka.wikispaces.com/Pluggable+evaluation+metrics
>
> For some example metrics, take a look at the source code for the following packages:
>
>   http://weka.sourceforge.net/packageMetaData/RankCorrelation/index.html
>   http://weka.sourceforge.net/packageMetaData/logarithmicErrorMetrics/index.html
>   http://weka.sourceforge.net/packageMetaData/percentageErrorMetrics/index.html
>
> Cheers,
> Eibe
>
> > On 9/06/2017, at 3:55 PM, Haytham Salhi <[hidden email]> wrote:
> >
> > Hello Eibe,
> >
> > Any updates on this? If not yet, can I update the code locally? If yes, could you please point out exactly where I should make the change?
> >
> > Thank you in advance.
> >
> > Best,
> > Haytham
> >
> > On Mon, Jun 5, 2017 at 7:33 PM, Haytham Salhi <[hidden email]> wrote:
> > Thanks a lot Eibe. When is it expected to be modified to support this behavior?
> >
> > On Mon, Jun 5, 2017 at 6:17 AM, Eibe Frank <[hidden email]> wrote:
> > This is a good idea. We should make the other behaviour (no unclassified instances) an option in ClassificationViaClustering.
> >
> > Cheers,
> > Eibe
> >
> > > On 5 Jun 2017, at 07:49, Haytham Salhi <[hidden email]> wrote:
> > >
> > > Any idea on this issue?
> > >
> > > On Sat, Jun 3, 2017 at 12:37 AM, Haytham Salhi <[hidden email]> wrote:
> > > Just to correct this line from my last message:
> > >
> > > As per classes to clusters evaluation, cluster 2 is predicted to be A, and cluster 1 is predicted to be B. All instances with missing class are ignored. The output of ClusterEvaluation is:
> > >
> > > Sorry for the confusion.
> > >

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

haytham.salhi
Thanks a lot!

I see. I agree that I should implement custom versions of those measures that take the number of unclassified instances into account.

As I am using the Weka API programmatically, I am not sure how my custom metric class would be injected into the Evaluation class when I build the Evaluation object and evaluate. Should I invoke a specific method? What should I do programmatically to integrate my custom evaluation metric with Evaluation?

Best,
Haytham
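
For reference, the adjustment discussed above can be sketched in plain Java, independent of the Weka API. This is only an illustrative sketch: the class and method names are hypothetical, and it assumes that an unclassified instance counts as a miss for its actual class (lowering recall) while leaving precision unchanged, since no label was predicted for it. The counts are taken from the dataset1 example earlier in the thread (class A: 2 correct; class B: 1 correct, 1 unclassified).

```java
import java.util.Locale;

/** Hypothetical helper: per-class metrics that treat unclassified instances as errors. */
final class UnclassifiedAwareMetrics {

    /** Precision = TP / (TP + FP). Unclassified instances get no predicted label,
     *  so under the assumption above they do not enter the denominator. */
    static double precision(int tp, int fp) {
        int denom = tp + fp;
        return denom == 0 ? 0.0 : (double) tp / denom;
    }

    /** Recall = TP / (TP + FN + unclassified), where "unclassified" is the number
     *  of instances of this class that received no prediction. */
    static double recall(int tp, int fn, int unclassified) {
        int denom = tp + fn + unclassified;
        return denom == 0 ? 0.0 : (double) tp / denom;
    }

    /** Harmonic mean of precision and recall. */
    static double fMeasure(double p, double r) {
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Class A: 2 true positives, nothing unclassified.
        double pA = precision(2, 0);
        double rA = recall(2, 0, 0);
        // Class B: 1 true positive, 1 unclassified instance (instance #3).
        double pB = precision(1, 0);
        double rB = recall(1, 0, 1);

        System.out.println(String.format(Locale.ROOT,
                "A: P=%.4f R=%.4f F=%.4f", pA, rA, fMeasure(pA, rA)));
        System.out.println(String.format(Locale.ROOT,
                "B: P=%.4f R=%.4f F=%.4f", pB, rB, fMeasure(pB, rB)));
    }
}
```

With these counts, recall for class B drops from 1.0 to 0.5, so the F-measure for B is no longer 100%. Whether an unclassified instance should also count against precision is a design choice this sketch does not settle.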



Re: Inconsistent results between ClusterEvaluation and ClassificationViaClustering

Eibe Frank-2
Administrator
Just put the evaluation metric into a separate package and load all packages from your program using the WEKA package manager.

Cheers,
Eibe

> > > > > > > On 26 May 2017, at 12:20, Haytham Salhi <[hidden email]> wrote:
> > > > > > >
> > > > > > > Hello Weka,
> > > > > > >
> > > > > > > As we have now ClusterEvaluation (and ClassificationViaClustering) ignoring the instances whose class attribute is missing when doing "classes-to-clusters" evaluation, ClusterEvaluation is tested and seems to be working fine. However, ClassificationViaClustering along with Evaluation still behaves strangely.
> > > > > > >
> > > > > > > As an example, let's take the following simple case:
> > > > > > >
> > > > > > > 1- Assume we have the following data with a class attribute:
> > > > > > >
> > > > > > > @relation 'example'
> > > > > > >
> > > > > > > @attribute theClass {A,B}
> > > > > > > @attribute I numeric
> > > > > > > @attribute am numeric
> > > > > > > @attribute are numeric
> > > > > > > @attribute bebo numeric
> > > > > > > @attribute different numeric
> > > > > > > @attribute great numeric
> > > > > > > @attribute haytham numeric
> > > > > > > @attribute hello numeric
> > > > > > > @attribute how numeric
> > > > > > > @attribute man numeric
> > > > > > > @attribute mazen numeric
> > > > > > > @attribute movie numeric
> > > > > > > @attribute samir numeric
> > > > > > > @attribute sir numeric
> > > > > > > @attribute totally numeric
> > > > > > > @attribute you numeric
> > > > > > >
> > > > > > > @data
> > > > > > > {6 1,12 2}
> > > > > > > {0 ?,11 2,13 1}
> > > > > > > {0 B,7 1,11 1,13 1}
> > > > > > > {4 1,6 2,12 2}
> > > > > > > {0 ?,3 1,8 1,9 1,10 1,16 1}
> > > > > > > {0 ?,3 1,8 1,9 1,16 1}
> > > > > > > {0 ?,3 1,8 1,9 1,14 1,16 1}
> > > > > > > {0 ?,1 1,2 1,5 3,15 1}
> > > > > > >
> > > > > > > 2- As we can see, we have two instances with class A (first and fourth) and one with class B (third). Other instances' classes are missing.
> > > > > > >
> > > > > > > > 3- Let's assume we want to do k-means clustering with k = 4 and with k-means++ as the initialization method. The model output is:
> > > > > > >
> > > > > > > Number of iterations: 2
> > > > > > > Within cluster sum of squared errors: 2.5833333333333335
> > > > > > >
> > > > > > > Initial starting points (k-means++):
> > > > > > >
> > > > > > > Cluster 0: {2 1,7 1,8 1,15 1}
> > > > > > > Cluster 1: {6 1,10 1,12 1}
> > > > > > > Cluster 2: {3 1,5 2,11 2}
> > > > > > > Cluster 3: {0 1,1 1,4 3,14 1}
> > > > > > >
> > > > > > > Missing values globally replaced with mean/mode
> > > > > > >
> > > > > > > Final cluster centroids:
> > > > > > >                          Cluster#
> > > > > > > Attribute    Full Data          0          1          2          3
> > > > > > >                  (8.0)      (3.0)      (2.0)      (2.0)      (1.0)
> > > > > > > ==================================================================
> > > > > > > I                0.125          0          0          0          1
> > > > > > > am               0.125          0          0          0          1
> > > > > > > are              0.375          1          0          0          0
> > > > > > > bebo             0.125          0          0        0.5          0
> > > > > > > different        0.375          0          0          0          3
> > > > > > > great            0.375          0          0        1.5          0
> > > > > > > haytham          0.125          0        0.5          0          0
> > > > > > > hello            0.375          1          0          0          0
> > > > > > > how              0.375          1          0          0          0
> > > > > > > man              0.125     0.3333          0          0          0
> > > > > > > mazen            0.375          0        1.5          0          0
> > > > > > > movie              0.5          0          0          2          0
> > > > > > > samir             0.25          0          1          0          0
> > > > > > > sir              0.125     0.3333          0          0          0
> > > > > > > totally          0.125          0          0          0          1
> > > > > > > you              0.375          1          0          0          0
> > > > > > >
> > > > > > > 4- After building the clusterer and doing the evaluation (using ClusterEvaluation), we have the following reasonable results:
> > > > > > >
> > > > > > > Clustered Instances
> > > > > > >
> > > > > > > 0      3 ( 38%)
> > > > > > > 1      2 ( 25%)
> > > > > > > 2      2 ( 25%)
> > > > > > > 3      1 ( 13%)
> > > > > > >
> > > > > > >
> > > > > > > Class attribute: theClass
> > > > > > > Classes to Clusters:
> > > > > > >
> > > > > > >  1 2  <-- assigned to cluster
> > > > > > >  0 2 | A
> > > > > > >  1 0 | B
> > > > > > >
> > > > > > > Cluster 1 <-- B
> > > > > > > Cluster 2 <-- A
> > > > > > >
> > > > > > > Incorrectly clustered instances :     0.0       0      %
> > > > > > >
> > > > > > > Cluster assignments: [2.0, 1.0, 1.0, 2.0, 0.0, 0.0, 0.0, 3.0]
> > > > > > >
> > > > > > > > Here, cluster 3 and cluster 0 are ignored in the "classes-to-clusters" evaluation because they contain only instances with a missing class, and this makes total sense.
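
The classes-to-clusters matrix quoted above can be reproduced by hand from the cluster assignments and the class column of the @data section. The following is only a minimal sketch in plain Java, not Weka's ClusterEvaluation code; the class name, method name and the MISSING sentinel are made up for illustration, and the class encoding (A = 0, B = 1) is an assumption.

```java
import java.util.Arrays;

public class ClassesToClustersSketch {
    /** Hypothetical sentinel for a missing class value ('?' in ARFF). */
    static final int MISSING = -1;

    /**
     * Counts how many instances of each class fall into each cluster,
     * skipping instances whose class value is missing.
     */
    static int[][] countClassesPerCluster(int[] clusters, int[] classes,
                                          int numClasses, int numClusters) {
        int[][] counts = new int[numClasses][numClusters];
        for (int i = 0; i < clusters.length; i++) {
            if (classes[i] == MISSING) {
                continue; // unlabelled instance: clustered, but not evaluated
            }
            counts[classes[i]][clusters[i]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Cluster assignments reported in the post, one per instance.
        int[] clusters = {2, 1, 1, 2, 0, 0, 0, 3};
        // Class column of the @data section: A = 0, B = 1, '?' = MISSING.
        // Sparse rows that omit attribute 0 take the first label, A.
        int[] classes = {0, MISSING, 1, 0, MISSING, MISSING, MISSING, MISSING};

        int[][] counts = countClassesPerCluster(clusters, classes, 2, 4);
        System.out.println("A: " + Arrays.toString(counts[0])); // A: [0, 0, 2, 0]
        System.out.println("B: " + Arrays.toString(counts[1])); // B: [0, 1, 0, 0]
    }
}
```

Clusters 0 and 3 receive no labelled instance, so their columns stay at zero, which is consistent with the report listing only clusters 1 and 2.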
> > > > > > >
> > > > > > > > 5- However, when building the model using ClassificationViaClustering with the same clusterer settings, the model output is:
> > > > > > >
> > > > > > > Number of iterations: 2
> > > > > > > Within cluster sum of squared errors: 0.0
> > > > > > >
> > > > > > > Initial starting points (k-means++):
> > > > > > >
> > > > > > > Cluster 0: {5 1,11 2}
> > > > > > > Cluster 1: {6 1,10 1,12 1}
> > > > > > > Cluster 2: {3 1,5 2,11 2}
> > > > > > >
> > > > > > > Missing values globally replaced with mean/mode
> > > > > > >
> > > > > > > Final cluster centroids:
> > > > > > >                          Cluster#
> > > > > > > Attribute    Full Data          0          1          2
> > > > > > >                  (3.0)      (1.0)      (1.0)      (1.0)
> > > > > > > =======================================================
> > > > > > > I                    0          0          0          0
> > > > > > > am                   0          0          0          0
> > > > > > > are                  0          0          0          0
> > > > > > > bebo            0.3333          0          0          1
> > > > > > > different            0          0          0          0
> > > > > > > great                1          1          0          2
> > > > > > > haytham         0.3333          0          1          0
> > > > > > > hello                0          0          0          0
> > > > > > > how                  0          0          0          0
> > > > > > > man                  0          0          0          0
> > > > > > > mazen           0.3333          0          1          0
> > > > > > > movie           1.3333          2          0          2
> > > > > > > samir           0.3333          0          1          0
> > > > > > > sir                  0          0          0          0
> > > > > > > totally              0          0          0          0
> > > > > > > you                  0          0          0          0
> > > > > > >
> > > > > > > > What's strange here is that, even though we set the number of clusters to 4, the model outputs only three clusters; thus, the evaluation of this model is not reasonable. Below is part of the model evaluation:
> > > > > > >
> > > > > > > Clusters to classes mapping:
> > > > > > >   1. Cluster: no class
> > > > > > >   2. Cluster: B (2)
> > > > > > >   3. Cluster: A (1)
> > > > > > >
> > > > > > > Classes to clusters mapping:
> > > > > > >   1. Class (A): 3. Cluster
> > > > > > >   2. Class (B): 2. Cluster
> > > > > > >
> > > > > > >
> > > > > > > === Summary ===
> > > > > > >
> > > > > > > Correctly Classified Instances           2               66.6667 %
> > > > > > > Incorrectly Classified Instances         0                0      %
> > > > > > > Kappa statistic                          1
> > > > > > > Mean absolute error                      0
> > > > > > > Root mean squared error                  0
> > > > > > > Relative absolute error                  0      %
> > > > > > > Root relative squared error              0      %
> > > > > > > UnClassified Instances                   1               33.3333 %
> > > > > > > Total Number of Instances                3
> > > > > > > Ignored Class Unknown Instances                  5
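
The numbers in this summary are consistent with Eibe's explanation that instances with missing class values were deleted before clustering: 8 instances minus 5 with an unknown class leaves 3, and 3 instances can fill at most 3 clusters regardless of the requested k. The sketch below only checks that arithmetic in plain Java; it is not Weka code, and the class and method names are made up for illustration.

```java
import java.util.Locale;

public class SummaryCheck {
    /** Percentage of part in total. */
    static double percent(int part, int total) {
        return 100.0 * part / total;
    }

    /** A clustering of n instances has at most min(k, n) non-empty clusters. */
    static int maxNonEmptyClusters(int requestedK, int numInstances) {
        return Math.min(requestedK, numInstances);
    }

    public static void main(String[] args) {
        int total = 8;          // instances in the ARFF data
        int unknownClass = 5;   // "Ignored Class Unknown Instances"
        int labelled = total - unknownClass;

        System.out.println(labelled);                          // 3
        System.out.println(maxNonEmptyClusters(4, labelled));  // 3
        // Correctly classified: 2 of 3; unclassified: 1 of 3.
        System.out.println(String.format(Locale.ROOT, "%.4f %%", percent(2, labelled))); // 66.6667 %
        System.out.println(String.format(Locale.ROOT, "%.4f %%", percent(1, labelled))); // 33.3333 %
    }
}
```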
> > > > > > >
> > > > > > > Note that I am using WEKA.3.9.2-SNAPSHOT.
> > > > > > >
> > > > > > > Best,
> > > > > > > Haytham
> > > > > > > _______________________________________________
> > > > > > > Wekalist mailing list
> > > > > > > Send posts to: [hidden email]
> > > > > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > > > > >
> > > > > > _______________________________________________
> > > > > > Wekalist mailing list
> > > > > > Send posts to: [hidden email]
> > > > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > > > > >
> > > > > > _______________________________________________
> > > > > > Wekalist mailing list
> > > > > > Send posts to: [hidden email]
> > > > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > > > >
> > > > > _______________________________________________
> > > > > Wekalist mailing list
> > > > > Send posts to: [hidden email]
> > > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > > > >
> > > > > _______________________________________________
> > > > > Wekalist mailing list
> > > > > Send posts to: [hidden email]
> > > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > > >
> > > > _______________________________________________
> > > > Wekalist mailing list
> > > > Send posts to: [hidden email]
> > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Wekalist mailing list
> > > > Send posts to: [hidden email]
> > > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > >
> > > _______________________________________________
> > > Wekalist mailing list
> > > Send posts to: [hidden email]
> > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> > >
> > >
> > > _______________________________________________
> > > Wekalist mailing list
> > > Send posts to: [hidden email]
> > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> >
> > _______________________________________________
> > Wekalist mailing list
> > Send posts to: [hidden email]
> > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
> >
> > _______________________________________________
> > Wekalist mailing list
> > Send posts to: [hidden email]
> > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html