Cut-off value in Cobweb and Rank in InformationGain WEKA

Nadia S

Is it true that the cutoff value of 0.05 in Cobweb (WEKA) is the same as a p-value in statistics (i.e., that it corresponds to some sort of significance level)?
Is the rank in InformationGain also a p-value, so that we should not choose any attributes with a rank higher than 0.05?

Is this correct?

How is the threshold calculated, and what does it do to the ranking in InformationGain? For example, the attributes in the credit rating data are ranked as follows: 1. checking_status (0.094), 3. credit_history (0.043), 2. duration (0.032), 6. saving_status (0.028), 4. purpose (0.024), 5. credit_amount (0.018), ...
How do we decide which attributes should be disregarded? Those with a value of 0, or something else?


(Attached: three screenshots, dated 2018-12-07.)
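To make the question concrete, here is a small Python sketch (not WEKA code) of two ways a ranked list like the one above can be cut: by a threshold on the merit score, or by keeping the top N attributes. This roughly mirrors the threshold and numToSelect options of WEKA's Ranker search method, but the exact rule used here (keep merit >= threshold) is an assumption for illustration. The merit values are the ones posted above; the "..." attributes are omitted.

```python
# Merits (information gain) as posted in the question; the trailing "..."
# attributes are omitted. The first field is the attribute number.
RANKING = [
    (1, "checking_status", 0.094),
    (3, "credit_history", 0.043),
    (2, "duration", 0.032),
    (6, "saving_status", 0.028),
    (4, "purpose", 0.024),
    (5, "credit_amount", 0.018),
]


def select_by_threshold(ranking, threshold):
    """Keep attributes whose merit is at least `threshold` (assumed rule)."""
    return [name for _, name, merit in ranking if merit >= threshold]


def select_top_n(ranking, n):
    """Keep the n highest-ranked attributes, whatever their merit values."""
    ordered = sorted(ranking, key=lambda r: r[2], reverse=True)
    return [name for _, name, _ in ordered[:n]]


if __name__ == "__main__":
    print(select_by_threshold(RANKING, 0.05))  # ['checking_status']
    print(select_top_n(RANKING, 3))  # ['checking_status', 'credit_history', 'duration']
```

Note that the p-value reading in the question points the wrong way: dropping attributes that score above 0.05 would discard checking_status, the highest-ranked attribute.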


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Cut-off value in Cobweb and Rank in InformationGain WEKA

Eibe Frank-3
No, those values cannot be viewed as p-values.
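As background: the scores produced by an information gain evaluator are gains measured in bits, IG = H(class) - H(class | attribute), not probabilities. A minimal Python sketch for a single nominal attribute follows; the toy data is hypothetical, and WEKA's evaluator discretizes numeric attributes first, which this sketch does not do.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """H(class) - H(class | attribute) for a single nominal attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted average of the class entropy within each attribute value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Hypothetical toy data: one nominal attribute and a two-valued class.
    attr = ["a", "a", "b", "b", "b", "c", "c", "c"]
    cls = ["good", "good", "good", "bad", "bad", "bad", "bad", "good"]
    print(round(info_gain(attr, cls), 3))  # 0.311
```

For a two-class problem the gain lies between 0 and 1 bit, and a larger value means a more useful attribute, which is the opposite direction from a p-value.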

With the info gain evaluator and similar ranking measures, it is normally best to use the top N attributes according to the ranking, where N is a parameter you optimise: choose the value of N that optimises the estimated performance of your classifier.
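One way to follow this advice, sketched in plain Python rather than with WEKA's API: rank attributes by information gain inside each training fold, train a classifier on the top N, and pick the N with the best cross-validated accuracy. The naive Bayes classifier, the binary synthetic data from `make_data`, and all function names are hypothetical choices for illustration. Re-ranking inside each fold keeps the test folds out of the selection step; in WEKA, wrapping the classifier in AttributeSelectedClassifier serves a similar purpose.

```python
import random
from collections import Counter, defaultdict
from math import log, log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, j):
    """H(class) - H(class | attribute j) for nominal attribute j."""
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[row[j]].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def rank_attributes(rows, labels):
    """Attribute indices sorted by decreasing information gain."""
    return sorted(range(len(rows[0])), key=lambda j: info_gain(rows, labels, j), reverse=True)


def train_nb(rows, labels, attrs):
    """Collect naive Bayes counts for the selected nominal attributes."""
    prior = Counter(labels)
    counts = defaultdict(Counter)  # (class, attribute index) -> value counts
    for row, y in zip(rows, labels):
        for j in attrs:
            counts[(y, j)][row[j]] += 1
    return prior, counts, attrs


def predict_nb(model, row, n_values=2):
    """Most probable class with Laplace smoothing; assumes n_values values per attribute."""
    prior, counts, attrs = model
    total = sum(prior.values())

    def log_score(c):
        s = log(prior[c] / total)
        for j in attrs:
            s += log((counts[(c, j)][row[j]] + 1) / (prior[c] + n_values))
        return s

    return max(prior, key=log_score)


def cv_accuracy(rows, labels, n_top, k=5):
    """k-fold CV accuracy using the top n_top attributes, ranked on each training fold only."""
    correct = 0
    for f in range(k):
        test = set(range(f, len(rows), k))
        train = [i for i in range(len(rows)) if i not in test]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        attrs = rank_attributes(tr_rows, tr_labels)[:n_top]
        model = train_nb(tr_rows, tr_labels, attrs)
        correct += sum(predict_nb(model, rows[i]) == labels[i] for i in test)
    return correct / len(rows)


def make_data(n=300, n_attrs=8, seed=1):
    """Hypothetical data: attributes 0-2 track the class, the rest are pure noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        row = [y if (j < 3 and rng.random() < 0.8) else rng.randint(0, 1) for j in range(n_attrs)]
        rows.append(row)
        labels.append(y)
    return rows, labels


if __name__ == "__main__":
    rows, labels = make_data()
    scores = {n: cv_accuracy(rows, labels, n) for n in range(1, 9)}
    best_n = max(scores, key=scores.get)
    print(scores, "best N =", best_n)
```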

Tuning the parameters of Cobweb is pretty much a black art. I would say Cobweb is primarily a tool for interactive knowledge discovery. Users normally manually tune the parameters until they get a "reasonable" clustering.
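For context on what Cobweb's cutoff is compared against: Cobweb scores partitions with category utility, and WEKA describes its cutoff as a category utility threshold for pruning nodes, so the cutoff sits on the category-utility scale rather than being a significance level. Below is a minimal Python sketch of category utility for nominal attributes, CU = (1/K) Σ_k P(C_k) Σ_i Σ_v [P(A_i = v | C_k)^2 - P(A_i = v)^2]. WEKA's implementation also handles numeric attributes (via the acuity parameter), which this sketch omits, and the example clusters are hypothetical.

```python
from collections import Counter


def sum_sq_probs(rows, i):
    """Sum over values v of P(A_i = v)^2, estimated within `rows`."""
    n = len(rows)
    return sum((c / n) ** 2 for c in Counter(r[i] for r in rows).values())


def category_utility(clusters):
    """Category utility of a partition of nominal instances (a list of clusters)."""
    all_rows = [r for c in clusters for r in c]
    n, n_attrs = len(all_rows), len(all_rows[0])
    # Expected number of correctly guessed attribute values without clustering.
    base = sum(sum_sq_probs(all_rows, i) for i in range(n_attrs))
    total = 0.0
    for c in clusters:
        within = sum(sum_sq_probs(c, i) for i in range(n_attrs))
        total += len(c) / n * (within - base)
    return total / len(clusters)


if __name__ == "__main__":
    separated = [[("red", "round"), ("red", "round")], [("blue", "square"), ("blue", "square")]]
    mixed = [[("red", "round"), ("blue", "square")], [("red", "round"), ("blue", "square")]]
    print(category_utility(separated))  # 0.5
    print(category_utility(mixed))  # 0.0
```

A partition that makes attribute values more predictable scores higher; nothing in this quantity is a probability of the data under a null hypothesis.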

Cheers,
Eibe


(In reply to Nadia S's message of Fri, Dec 14, 2018 at 8:07 AM, shown at the top of this thread.)