Doubt about numeric class

Jovani Souza
Hello,

I am trying to find which indicators have the most influence on the number of ISO 9001 certifications. How can I do this in Weka, given that the ISO 9001 attribute (my class) is numeric?

I am using an Excel spreadsheet, which is attached to this message.
Thank you!

Iso_9001.xlsx

Re: Doubt about numeric class

Eibe Frank-2
Administrator
A possible approach is to use ClassifierAttributeEval:

  http://weka.sourceforge.net/doc.packages/classifierBasedAttributeSelection/weka/attributeSelection/ClassifierAttributeEval.html

Cheers,
Eibe

> On 10/08/2017, at 7:59 AM, Jovani Souza <[hidden email]> wrote:
>
> Hello,
>
> I am trying to find indicators that have more influence on the number of
> certifications ISO 9001. How can I do this on Weka, considering that the ISO
> 9001 is numeric?
>
> I am using an excel spreadsheet that is attached in this message.
> Thank you!
>
> Iso_9001.xlsx <http://weka.8497.n7.nabble.com/file/n41454/Iso_9001.xlsx>  
>
>
>
> --
> View this message in context: http://weka.8497.n7.nabble.com/Doubt-about-numeric-class-tp41454.html
> Sent from the WEKA mailing list archive at Nabble.com.
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Doubt about numeric class

Jovani Souza
Thank you very much for your answer!

I still have some doubts.

1) I did what you said, using ClassifierAttributeEval in the Select attributes panel, but the top-ranked attribute was Indicator 7, which has 100% missing values, and I don't understand why.

2) How can I decide which classifier to use in ClassifierAttributeEval, considering that I have a numeric class?

3) I found another attribute evaluator named CfsSubsetEval, which seems to give the worth of each attribute in correlation with the class. Is that right, or should I use another one for this purpose?

Thank you!

Re: Doubt about numeric class

Eibe Frank-2
Administrator

> On 11/08/2017, at 3:18 AM, Jovani T. de Souza <[hidden email]> wrote:
>
> 1) I did what you said, using ClassifierAttributeEval in the Select attributes panel, but the top-ranked attribute was Indicator 7, which has 100% missing values, and I don't understand why.

Did you change the default classifier from ZeroR? ZeroR ignores the attribute values, so it will give every attribute the same worth, and the resulting order is arbitrary.
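
A minimal sketch of why this happens, in plain Python rather than WEKA code. It assumes a ZeroR-style regressor that always predicts the mean of the training targets; `zero_r_predict`, the indicator names, and the toy values are hypothetical and only serve the illustration.

```python
# Illustrative sketch only (plain Python, not WEKA code).
# The function names and the toy data below are hypothetical.

def zero_r_predict(train_targets):
    """Mimic a ZeroR-style regressor: always predict the training mean."""
    mean = sum(train_targets) / len(train_targets)
    return lambda attribute_value: mean  # the attribute value is ignored


def mean_absolute_error(predict, values, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(predict(v) - t) for v, t in zip(values, targets)) / len(targets)


targets = [3.0, 5.0, 8.0, 12.0]
attributes = {
    "indicator_1": [1.0, 2.0, 3.0, 4.0],      # tracks the target closely
    "indicator_2": [9.0, 1.0, 7.0, 2.0],      # unrelated to the target
    "indicator_7": [None, None, None, None],  # 100% missing values
}

predict = zero_r_predict(targets)
errors = {
    name: mean_absolute_error(predict, values, targets)
    for name, values in attributes.items()
}
print(errors)  # every attribute gets the same error
```

Because the error is identical for every attribute, any evaluator built on such a predictor cannot separate them, and an attribute with only missing values can end up first simply through tie-breaking.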

> 2) How can I decide which classifier to use in ClassifierAttributeEval, considering that I have a numeric class?

Not ZeroR!  RandomForest is a reasonable option (if you are also considering the leave-one-attribute-out mode of ClassifierAttributeEval).

> 3) I found another attribute evaluator named CfsSubsetEval, which seems to give the worth of each attribute in correlation with the class. Is that right, or should I use another one for this purpose?

It gives the worth (merit) for a subset of attributes, not for individual attributes. You could take a look at the other *AttributeEval classes though.

Cheers,
Eibe

Re: Doubt about numeric class

Jovani Souza
Thank you Eibe for your attention!

Sorry to bother you, but I still have some questions.

1) I used the RandomForest classifier with leave-one-attribute-out set to true. However, the results made less sense than with the option set to false. Why?

2) I tested using the full training set and then 10-fold cross-validation. I noticed a small change in the order of the ranked attributes.
Ex.: Full training set: 4,5,3,2,7,6,1 
cross validation: 5,4,3,2,7,6,1
Which one would be the best?

3) In this study, would it be right to use the SMOreg classifier in ClassifierAttributeEval?

4) Since I want to find the indicators that are most relevant to my class, would it make sense to use the correlation coefficient as the evaluation measure, even with RandomForest, or should I leave the default measure?

Thank you very much!

Re: Doubt about numeric class

Eibe Frank-2
Administrator

> On 16/08/2017, at 8:37 AM, Jovani T. de Souza <[hidden email]> wrote:
>
> 1) I used the RandomForest classifier with leave-one-attribute-out set to true. However, the results made less sense than with the option set to false. Why?

The leave-one-attribute-out option measures the drop in accuracy if you leave a particular attribute out. It’s possible that an informative attribute is left out without a significant drop in accuracy if the predictive information in this attribute is subsumed by the other attributes.
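
A small sketch of this effect in plain Python, not WEKA code. It assumes a toy lookup-table learner evaluated on its own training data, with merit defined as a change in mean absolute error; the attribute names `a`, `b`, `c`, the data, and the helper `training_mae` are all hypothetical. Attribute `b` is an exact copy of `a`, and the target is `10*a + c`.

```python
from collections import defaultdict

# Illustrative sketch only (plain Python, not WEKA code).
# Toy data: b duplicates a, and the target is y = 10*a + c.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "y": 0.0},
    {"a": 0, "b": 0, "c": 1, "y": 1.0},
    {"a": 1, "b": 1, "c": 0, "y": 10.0},
    {"a": 1, "b": 1, "c": 1, "y": 11.0},
]
ATTRIBUTES = ["a", "b", "c"]


def training_mae(rows, used):
    """Fit a lookup table (mean of y per combination of the used attributes)
    and return its mean absolute error on the same rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[name] for name in used)].append(row["y"])
    means = {key: sum(ys) / len(ys) for key, ys in groups.items()}
    total = sum(abs(means[tuple(row[n] for n in used)] - row["y"]) for row in rows)
    return total / len(rows)


baseline = training_mae(ROWS, [])        # no attributes: predict the overall mean
full = training_mae(ROWS, ATTRIBUTES)    # all attributes

# Single-attribute mode: error reduction when using only that attribute.
single_merit = {n: baseline - training_mae(ROWS, [n]) for n in ATTRIBUTES}

# Leave-one-attribute-out mode: error increase when that attribute is removed.
loo_merit = {
    n: training_mae(ROWS, [m for m in ATTRIBUTES if m != n]) - full
    for n in ATTRIBUTES
}

print(single_merit)  # {'a': 4.5, 'b': 4.5, 'c': 0.0}
print(loo_merit)     # {'a': 0.0, 'b': 0.0, 'c': 0.5}
```

On its own, `a` is by far the most useful attribute, yet leaving it out costs nothing because `b` carries the same information. The two modes therefore produce opposite rankings on this toy data.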

> 2) I tested using the full training set and then 10-fold cross-validation. I noticed a small change in the order of the ranked attributes.
> Ex.: Full training set: 4,5,3,2,7,6,1
> cross validation: 5,4,3,2,7,6,1
> Which one would be the best?

Is this based on the two options in the attribute selection panel? The difference between your two rankings seems small. The ranking based on the "cross-validated" results is probably more robust because the ranking algorithm is run on multiple subsets of the data when you apply the cross-validation option. (Cross-validation is perhaps not really the best name for this option because there is no evaluation on test data in this case: the algorithm is simply run multiple times on the different training sets that occur in a cross-validation, and occurrence frequencies are calculated.) On the other hand, more data is available for generating the ranking when it is based on the full training set.
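
The idea of running the ranker on each training set of a cross-validation can be sketched as follows, in plain Python rather than WEKA code. The sketch assumes a simple ranker based on absolute Pearson correlation and combines the folds by averaging ranks, which is one possible way to aggregate; the function names and the toy data are hypothetical.

```python
# Illustrative sketch only (plain Python, not WEKA code).

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def rank_attributes(data, target, indices):
    """Rank attributes (1 = best) by absolute correlation on the given rows."""
    ys = [target[i] for i in indices]
    scores = {
        name: abs(pearson([column[i] for i in indices], ys))
        for name, column in data.items()
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def average_ranks_over_folds(data, target, k):
    """Run the ranker on each of the k training sets and average the ranks.
    No test data is evaluated; the held-out rows are simply left out."""
    n = len(target)
    totals = {name: 0 for name in data}
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        for name, rank in rank_attributes(data, target, train).items():
            totals[name] += rank
    return {name: total / k for name, total in totals.items()}


target = [float(v) for v in range(1, 11)]
data = {
    "x1": list(target),        # perfectly correlated with the target
    "x2": [1.0, -1.0] * 5,     # alternating values, weakly related
}

full_ranking = rank_attributes(data, target, list(range(len(target))))
averaged = average_ranks_over_folds(data, target, k=5)
print(full_ranking, averaged)
```

When an attribute keeps the same position in every fold, its averaged rank equals its full-data rank; attributes whose positions swap between folds, like 4 and 5 in the question, end up with close average ranks.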

> 3) In this study, would it be right to use the SMOreg classifier in ClassifierAttributeEval?

Sure, you can use any classifier you like. Obviously, it’s best to use a classifier that gives you high accuracy on your data.

> 4) Since I want to find the indicators that are most relevant to my class, would it make sense to use the correlation coefficient as the evaluation measure, even with RandomForest, or should I leave the default measure?

Correlation coefficient is only appropriate as the evaluation metric in ClassifierAttributeEval if the class is numeric. For classification problems, you may want to use the area under the ROC curve, for example.
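
When choosing between the correlation coefficient and an error-based measure for a numeric class, it may help to see what each one rewards. The sketch below is plain Python, not WEKA code, and the values are made up: predictions that are a scaled and shifted copy of the actual values correlate perfectly with them while still having a large absolute error.

```python
# Illustrative sketch only (plain Python, not WEKA code); toy values.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [10.0, 20.0, 30.0, 40.0]
predicted = [2 * a + 5 for a in actual]  # right ordering, wrong scale and offset

r = pearson(actual, predicted)
err = mean_absolute_error(actual, predicted)
print(round(r, 6), err)
```

For ranking indicators by how strongly they relate to the class, the insensitivity of the correlation coefficient to scale and offset is usually harmless, but it does not tell you how far off the predictions are in absolute terms.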

Cheers,
Eibe