What does "N of nodes using that attribute" in RF in WEKA mean?

What does "N of nodes using that attribute" in RF in WEKA mean?

HaniMufti
Good day;

Can someone explain what "N of nodes using that attribute" means in RF in WEKA?

Does it mean the number of times the variable appeared in ANY LOCATION across all tree models created in the RF?
Or
Does it mean the number of times the variable appeared in the SAME LOCATION across all tree models created in the RF?

Also, how does it count the number of times the variable appeared in a tree model?
If a variable appeared 2 times in tree model J, is it counted as 2, or is it counted as 1 regardless of how many times it appeared (indicating only that it appeared in the model)?

Thank you.

Best Regards,

Hani Mufti; MD, MHI, CIP, FRCSC


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: What does "N of nodes using that attribute" in RF in WEKA mean?

Eibe Frank-3
You can check by printing the trees for a small dataset. The print-out will show, for each internal node of each tree, which attribute is used to define the split at that node. Let's say N_{ij} is the number of times attribute i occurs in the output of tree j. Then N_{i}, the "number of nodes using that attribute", is \sum_{j} N_{ij}, i.e., the total number of times that attribute is used for defining a split anywhere in the random forest.
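
The counting rule described above can be sketched in a few lines of Python. This is only an illustration on a made-up forest, not WEKA code or WEKA output: the tuple-based tree encoding, the attribute names `age` and `bmi`, and the helper `split_attributes` are all hypothetical, and binary splits are assumed for brevity.

```python
from collections import Counter

def split_attributes(tree):
    """Yield the attribute used at every internal node of one tree.

    A tree is a nested tuple (attribute, left, right); a leaf is None.
    This encoding is an assumption made for the example only.
    """
    if tree is None:
        return
    attribute, left, right = tree
    yield attribute
    yield from split_attributes(left)
    yield from split_attributes(right)

# Hypothetical forest of two small trees.
forest = [
    ("age", ("bmi", None, None), ("age", None, None)),  # age at 2 nodes, bmi at 1
    ("bmi", ("age", None, None), None),                 # bmi at 1 node, age at 1
]

# N_ij: how often attribute i is used for a split in tree j.
per_tree = [Counter(split_attributes(tree)) for tree in forest]

# N_i = sum over j of N_ij: every use counts, at any position in any tree.
total = sum(per_tree, Counter())

# For contrast only: the number of trees that contain the attribute at least once.
trees_containing_age = sum(1 for counts in per_tree if "age" in counts)

print(dict(total))
print(trees_containing_age)
```

Here `age` is used at two nodes of the first tree and at one node of the second, so it contributes 3 to the total, wherever those nodes sit in the trees. Counting only whether it appears in each tree would give 2, which is not the quantity described above.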

The average "impurity decrease" is computed over all of those nodes. In the classification case, you will get the average information gain of the corresponding splits. In the regression case, you will get the average reduction in variance. Note that this does *not* take the size of the nodes into account; hence, a large "impurity decrease" at the root node of a tree will be worth as much as one at a node just above some small leaf nodes.
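
To make the point about node size concrete, here is a small Python sketch with invented numbers. The records, the attribute names, and both helper functions are hypothetical. The size-weighted variant is shown purely for contrast and is not something the explanation above attributes to WEKA.

```python
from collections import defaultdict

# Hypothetical records, one per internal node:
# (attribute, impurity decrease of the split, training instances at the node)
splits = [
    ("age", 0.40, 1000),  # split at a root node
    ("age", 0.10, 4),     # split just above some small leaves
    ("bmi", 0.20, 500),
]

by_attribute = defaultdict(list)
for attribute, decrease, n_instances in splits:
    by_attribute[attribute].append((decrease, n_instances))

def unweighted_mean(pairs):
    """Plain average over nodes: each node counts equally, whatever its size."""
    return sum(decrease for decrease, _ in pairs) / len(pairs)

def size_weighted_mean(pairs):
    """Contrast only: average weighted by the number of instances at each node."""
    weighted = sum(decrease * n for decrease, n in pairs)
    return weighted / sum(n for _, n in pairs)

for attribute, pairs in by_attribute.items():
    print(
        attribute,
        len(pairs),  # number of nodes using that attribute
        round(unweighted_mean(pairs), 3),
        round(size_weighted_mean(pairs), 3),
    )
```

For `age`, the plain average over its two nodes is 0.25, while the size-weighted figure is about 0.399, because the root split covers 1000 instances and the other covers only 4. The plain average treats both nodes alike, which is the behaviour the note above warns about.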

Cheers,
Eibe

In reply to Hani Mufti's message of Sun, Sep 1, 2019 at 10:34 AM (shown above).
