Problem of the split for J48 without any pruning

windows556-2
I use J48 with Weka_control(U = T, M = 1, O = T, R = F) because I want to get
the tree without any pruning.
Then I checked the information gain ratio at each node, using
GainRatioAttributeEval in RWeka, and found that some nodes were not split on
the attribute with the highest information gain ratio.
All my attributes are non-numeric, so I guess that MDL is not involved.
Why could this happen?
Could I get a tree that always splits on the highest information gain
ratio, with no pruning or other error reduction tricks?
Thank you very much.



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Problem of the split for J48 without any pruning

Eibe Frank-2
Administrator
Only splits with greater than average information gain are admissible for selection in C4.5. Once the “best” split has been found for each attribute, the average info gain across these best splits is calculated. Splits that do not achieve this minimum threshold are disallowed. Could that be the reason in your test case?
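
For reference, the two quantities involved can be computed from the class counts in each branch of a candidate split. The following is a minimal Python sketch using the usual textbook definitions of info gain and gain ratio; the function names and the toy counts are made up for illustration and are not taken from WEKA.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


def gain_and_ratio(branches):
    """Return (info gain, gain ratio) for one candidate split.

    branches holds one list of class counts per branch,
    e.g. [[2, 0], [1, 1]] means two branches and two classes.
    """
    sizes = [sum(b) for b in branches]
    total = sum(sizes)
    class_totals = [sum(col) for col in zip(*branches)]
    before = entropy(class_totals)
    # Weighted entropy of the branches; empty branches contribute nothing.
    after = sum(n / total * entropy(b) for n, b in zip(sizes, branches) if n > 0)
    gain = before - after
    split_info = entropy(sizes)  # entropy of the branch sizes
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical counts: 8 instances, 4 per class.
four_way = [[2, 0], [1, 1], [1, 1], [0, 2]]  # four equally sized branches
lopsided = [[1, 0], [3, 4]]                  # one tiny pure branch

for name, table in [("four_way", four_way), ("lopsided", lopsided)]:
    gain, ratio = gain_and_ratio(table)
    print(f"{name}: gain={gain:.3f} ratio={ratio:.3f}")
```

With these counts the lopsided split has a slightly higher gain ratio than the four-way split but a much lower info gain, which is the kind of case where ranking by gain ratio alone and ranking by info gain disagree.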

REPTree uses info gain only. 

Cheers,
Eibe

On Sun, 22 Sep 2019 at 1:17 PM, windows556 <[hidden email]> wrote:

Re: Problem of the split for J48 without any pruning

windows556-2
Thank you for your reply, Eibe.

But I am unclear on the meaning of average info gain. Does "the average info gain across these best splits" mean the average info gain of all splits in the tree, computed after the whole "best"-split tree has been built?

If that is the case, why would a particular node be split on an attribute that does not have the highest info gain? Would even the split with the highest info gain at that node be disallowed because it is below the minimum threshold?

I checked the info gain (not the info gain ratio) again and found that even the root node was not split on the attribute with the highest info gain.

Thank you very much.

Re: Problem of the split for J48 without any pruning

Eibe Frank-2
Administrator
It is the average info gain observed at the node for which the split is to be determined. Each attribute yields a possible split for the node. From these available splits, one needs to be chosen. This decision is made by maximising the gain ratio, but only those splits that yield greater than average information gain are eligible. If S_1, S_2, …, S_k are the k possible splits at the node, one for each attribute, the average info gain of those k splits is what is used as the cut-off.

One caveat: some attributes may not yield any useful split at all (e.g., all instances go down one branch of a nominal attribute). Those are not considered in the average. For the exact algorithm, you will need to check the source code. It is included in Ross Quinlan’s book on C4.5. For WEKA’s J48, the code is, for example, here:

https://svn.cms.waikato.ac.nz/svn/weka/trunk/weka/src/main/java/weka/classifiers/trees/j48/C45ModelSelection.java

(Note the poor variable naming in this code though: “minResult” should really be called “maxResult”.)
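
The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not WEKA's code: the function name, the input format, and the example numbers are hypothetical, and the "at least average, minus a small tolerance" comparison is an assumption of the sketch (a strictly-greater test would reject a node's only candidate). Check the linked source for the exact comparison.

```python
def select_split(candidates, tol=1e-3):
    """Pick a split by gain ratio among splits with at-least-average info gain.

    candidates maps an attribute name to (info_gain, gain_ratio), or to
    None when the attribute yields no useful split at this node.
    Returns the chosen attribute name, or None if nothing qualifies.
    """
    # Attributes without a useful split do not count towards the average.
    useful = {name: v for name, v in candidates.items() if v is not None}
    if not useful:
        return None
    avg_gain = sum(gain for gain, _ in useful.values()) / len(useful)

    best_name, best_ratio = None, 0.0
    for name, (gain, ratio) in useful.items():
        # tol is an assumption of this sketch, see the note above.
        if gain >= avg_gain - tol and ratio > best_ratio:
            best_name, best_ratio = name, ratio
    return best_name


# Hypothetical values for the candidate splits at one node.
node = {
    "a": (0.138, 0.254),  # highest gain ratio, but below-average info gain
    "b": (0.500, 0.250),
    "c": None,            # e.g. all instances go down one branch
}
print(select_split(node))
```

Here the average info gain over the two useful splits is 0.319, so "a" is not eligible even though its gain ratio is the highest, and the sketch selects "b". Ranking attributes by gain ratio alone, without this cut-off, would pick "a" instead.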

Cheers,
Eibe

> On 25/09/2019, at 12:46 AM, Tony Yip <[hidden email]> wrote: