CfsSubsetEval between numeric and nominal


Abdrahman0x
Hi all,

I am not sure if this is an old question, but I searched and couldn't
find anything similar.

How does CfsSubsetEval calculate the correlation between numeric and
nominal attributes? Suppose I have the following data with 100
attributes and a two-valued nominal class:

attr1  attr2  attr3  attr4  attr5  ......  attr99  attr100  class
15     23     57     84     3              18      74       white
27     0      55     68     97             48      91       white
81     58     75     34     17             27      9        black

I know that CfsSubsetEval uses Pearson correlation behind the scenes,
but how does it compute the correlation between an attribute and the
class (Rfc)?

I hope the question is clear.

Many thanks



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: CfsSubsetEval between numeric and nominal

Mark Hall
In this case, the nominal attribute in question is effectively binarized (via one-hot encoding) and the Pearson's correlation is computed between the numeric attribute and each of the new indicator attributes. The final score is a weighted (by frequency of occurrence of each of the nominal values) sum of these individual Pearson's scores.
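
As a rough illustration of the computation described above, here is a minimal Python sketch. It is not Weka's code: the function names and the `colour` attribute are made up for the example, and taking the absolute value of each Pearson score before weighting is an assumption, made so that indicators with opposite signs do not cancel each other out.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def numeric_nominal_correlation(numeric, nominal):
    """Frequency-weighted sum of |Pearson| between a numeric attribute
    and each one-hot indicator of a nominal attribute."""
    n = len(nominal)
    score = 0.0
    for value in sorted(set(nominal)):
        # One-hot indicator column for this nominal value.
        indicator = [1.0 if v == value else 0.0 for v in nominal]
        weight = sum(indicator) / n  # relative frequency of this value
        # Assumption: absolute value of each individual correlation.
        score += weight * abs(pearson(numeric, indicator))
    return score

# Hypothetical example: a numeric attribute and a two-valued nominal attribute.
attr1 = [15, 27, 81]
colour = ["white", "white", "black"]
print(numeric_nominal_correlation(attr1, colour))
```

With a two-valued nominal attribute the two indicators are complements of each other, so both have the same absolute correlation with the numeric attribute and the weighted sum reduces to that single value.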

Cheers,
Mark.


Re: CfsSubsetEval between numeric and nominal

Abdrahman0x
Hi Mark,

The nominal attribute here is the class attribute. I tried the
NominalToBinary filter in Weka, but it didn't work for the class
attribute; it only converts the other attributes, not the class
attribute, which in my case is nominal.

Is there another method to convert only the class attribute into binary?

Thanks




Re: CfsSubsetEval between numeric and nominal

Peter Reutemann
> The nominal attribute here is the class attribute. I tried the
> NominalToBinary filter in Weka, but it didn't work for the class
> attribute; it only converts the other attributes, not the class
> attribute, which in my case is nominal.

If in the Explorer, "unselect" the class attribute in the combobox in
the Preprocess panel and then apply the filter.

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: CfsSubsetEval between numeric and nominal

Abdrahman0x
Hi Peter,

I followed what you said, but nothing happened. The class
attribute is still nominal!
Please advise.

Thanks




Re: CfsSubsetEval between numeric and nominal

Peter Reutemann
> I followed what you said, but nothing happened. The class
> attribute is still nominal!

Here's what I did:
- loaded iris dataset
- unselected class (see screenshot)
- applied NominalToBinary (first-last)
- resulting dataset is attached

There are now three binary attributes, one for each label (in the
"Selected attribute" box, you can now see Min/Max/Mean/Stdev instead
of the labels).
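
To show what the binarized class looks like, here is a small Python sketch of one-hot encoding. It only mimics the effect of the steps above and is not the NominalToBinary filter itself; the `one_hot` helper and the `class=...` column naming are assumptions for illustration.

```python
def one_hot(labels):
    """Return {label: indicator column}, one 0/1 column per distinct label."""
    return {value: [1 if v == value else 0 for v in labels]
            for value in sorted(set(labels))}

# A few iris class labels, used here only as sample input.
iris_class = ["Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa"]
for name, column in one_hot(iris_class).items():
    print(f"class={name}", column)
```

Each label becomes its own numeric 0/1 column, which is why summary statistics such as Min/Max/Mean/Stdev can be shown for it.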

Please note, I didn't read the whole thread, just commented on your
use of NominalToBinary.

Cheers, Peter

Attachments:
unselect_class.png (44K)
iris_binarized_class.arff (4K)

Re: CfsSubsetEval between numeric and nominal

Abdrahman0x
Hi Peter,

Thank you so much. It worked :)

Your clarification is highly appreciated :)

Cheers,
Abdrahman




Re: CfsSubsetEval between numeric and nominal

Abdrahman0x
In reply to this post by Mark Hall
Mark Hall wrote
> In this case, the nominal attribute in question is effectively binarized
> (via one-hot encoding) and the Pearson's correlation is computed between
> the numeric attribute and each of the new indicator attributes. The final
> score is a weighted (by frequency of occurrence of each of the nominal
> values) sum of these individual Pearson's scores.

Hi Mark,

Can you help with the following points, please?

(1) By the new indicator attributes, do you mean the binarized class
attribute?

(2) Is the Pearson correlation computed between all the numeric
attributes and the binarized class attribute? Can you please elaborate?

In my situation, the data after binarization will be:

attr1  attr2  attr3  attr4  attr5  ......  attr99  attr100  class
15     23     57     84     3              18      74       0
27     0      55     68     97             48      91       0
81     58     75     34     17             27      9        1

(3) I want to know how to compute the correlation between the
attributes and the class now.
(4) Is Pearson correlation used internally for that?

Thank you,
Abdrahman




Re: CfsSubsetEval between numeric and nominal

Mark Hall
The class is never binarized. CFS has two modes of operation:

1. When the class is *numeric*, nominal attributes are binarized and Pearson's correlation is used for computing all correlations between attributes/class.
2. When the class is *nominal*, numeric attributes are discretized using the MDL-based method of Fayyad and Irani. Symmetrical uncertainty is then used as the correlation metric between attributes/class.
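
For the nominal-class mode, symmetrical uncertainty between two discrete variables can be sketched as below. This is a minimal Python illustration of the standard definition, SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), and not Weka's implementation. The attribute is assumed to be discretized already, and the bin labels and data are invented for the example.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(xs, ys)))  # joint entropy over value pairs
    return 2.0 * (hx + hy - hxy) / (hx + hy)

# Hypothetical, already-discretized attribute (bin labels) and nominal class.
attr_bins = ["low", "low", "high", "high"]
klass = ["white", "white", "black", "black"]
print(symmetrical_uncertainty(attr_bins, klass))
```

The measure is symmetric in its two arguments, so the same function can be used for attribute/class and attribute/attribute pairs.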

Cheers,
Mark.


Re: CfsSubsetEval between numeric and nominal

Abdrahman0x
Thank you Mark,

My data falls under this case:
2. When the class is *nominal*, numeric attributes are discretized using the
MDL-based method of Fayyad and Irani. Symmetrical uncertainty is then used as
the correlation metric between attributes/class.

I believe my questions were not clear, so I will try to clarify them.

(1) I want to manually compute the correlation between the *nominal* class and
the *numeric* attributes. That is, I want to know the *exact mathematical
formula* behind the calculation of the CFS merit.

(2) When I applied the MDL-based discretization method and then
symmetrical uncertainty attribute selection, I didn't get the same
results as CFS on my data. Why? Is this expected?

Thank you for your understanding,
Abdrahman




Re: CfsSubsetEval between numeric and nominal

Eibe Frank-2
Administrator
(1) There is no single formula. The MDL-based discretisation is performed using Fayyad & Irani’s greedy discretisation algorithm. Check their paper for details.

(2) Symmetrical uncertainty based attribute ranking in WEKA evaluates each attribute individually and in isolation (with respect to the class). The point of CFS is that it evaluates subsets of attributes using the CFS formula (which uses symmetrical uncertainty to measure correlation when the class is nominal).
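
To make the difference in (2) concrete, here is a minimal Python sketch of the subset merit from Hall's CFS work: merit = k * mean(r_cf) / sqrt(k + k * (k - 1) * mean(r_ff)), where k is the subset size, r_cf the attribute/class correlations and r_ff the attribute/attribute correlations. The function name, the dictionary layout and all correlation values are hypothetical; WEKA's implementation details are not reproduced here.

```python
import math
from itertools import combinations

def cfs_merit(subset, r_cf, r_ff):
    """CFS merit of a subset of attribute names.

    r_cf[a] is the attribute/class correlation of attribute a;
    r_ff[frozenset({a, b})] is the correlation between attributes a and b.
    """
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[a] for a in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[frozenset(p)] for p in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)

# Hypothetical correlations (e.g. symmetrical uncertainty values).
r_cf = {"attr1": 0.8, "attr2": 0.8, "attr3": 0.6}
r_ff = {
    frozenset({"attr1", "attr2"}): 1.0,  # attr2 is redundant with attr1
    frozenset({"attr1", "attr3"}): 0.0,
    frozenset({"attr2", "attr3"}): 0.0,
}
print(cfs_merit(["attr1", "attr2"], r_cf, r_ff))
print(cfs_merit(["attr1", "attr3"], r_cf, r_ff))
```

In this made-up example, attr2 ranks above attr3 on attribute/class correlation alone, yet the subset {attr1, attr3} gets a higher merit than {attr1, attr2} because attr2 is fully redundant with attr1. That is why a per-attribute ranking and CFS can select different attributes.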

Cheers,
Eibe


Re: CfsSubsetEval between numeric and nominal

Abdrahman0x
Thank you Eibe and sorry for the late reply.

Do you mean that there is no single formula for computing the CFS merit, as
in my question, or for the MDL discretization? For MDL, I know that it
follows the formulas in Fayyad and Irani's paper, but what about CFS? How
do I compute it manually between numeric attributes and a nominal class?

Thank you


