Dear WEKA-Experts,

according to Leo Breiman, the random forest proximity (or similarity) of two different instances is the number of trees in which they end up in the same node, divided by the total number of trees. Is there any built-in functionality like this to evaluate the similarity of two instances in WEKA?

Thank you very much,
Marcus

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Administrator

By “in the same node” you probably mean “in the same leaf node”? No, this is not possible in WEKA without writing some code.
You can use the PartitionMembership filter with RandomForest as the partition generator to get a membership indicator vector for each input instance. This vector contains one attribute value for each node in the RandomForest (leaves and internal nodes!). The attribute value is 1 if the corresponding node contains the input instance and 0 otherwise. (This assumes standard single-instance input data, not multi-instance data.) The vector is represented as a SparseInstance to save space.

You could then use this for clustering, etc., by applying a distance function such as Manhattan distance to compare the membership vectors.

Command-line example usage of the filter:

java -cp ~/weka391/weka.jar weka.Run .PartitionMembership -W .RandomForest -i ~/datasets/UCI/iris.arff -c last

If you are willing to write some code, you can subclass RandomTree and change the relevant methods so that it only considers leaf nodes when generating the membership vectors.

Cheers,
Eibe
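To illustrate the Manhattan distance suggestion, here is a minimal sketch in plain Java with no WEKA dependency. It assumes each 0/1 membership vector is given as a sorted array holding the indices of the nodes that contain the instance, which is roughly what a sparse representation stores. The class name MembershipDistance and the method manhattan are made up for this example and are not part of WEKA.

```java
/** Hypothetical helper for illustration only; not part of WEKA. */
final class MembershipDistance {

    /**
     * Manhattan distance between two 0/1 membership vectors.
     * Each argument is assumed to be a sorted array of the indices whose value is 1.
     * For binary vectors the distance is the number of indices set in exactly one vector.
     */
    static int manhattan(int[] onesA, int[] onesB) {
        int i = 0;
        int j = 0;
        int distance = 0;
        // Merge-style walk over both sorted index lists.
        while (i < onesA.length && j < onesB.length) {
            if (onesA[i] == onesB[j]) {
                i++;            // node contains both instances: contributes 0
                j++;
            } else if (onesA[i] < onesB[j]) {
                distance++;     // node contains only the first instance
                i++;
            } else {
                distance++;     // node contains only the second instance
                j++;
            }
        }
        // Any leftover indices belong to exactly one of the two vectors.
        return distance + (onesA.length - i) + (onesB.length - j);
    }

    public static void main(String[] args) {
        int[] first = {0, 1, 3, 7, 8};   // made-up node indices
        int[] second = {0, 1, 4, 7, 9};  // made-up node indices
        System.out.println(manhattan(first, second));
    }
}
```

Because the vectors produced by the filter cover internal nodes as well as leaves, a distance computed this way also reflects how much of the path from the root two instances share, not only whether they end in the same leaf.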
Hello Eibe,

thank you very much for your answer. If I understand you correctly, the PartitionMembership filter produces a sparse vector with information about the presence of the input instance in ALL nodes of the random forest. Did I get you right that, without modifying the RandomTree class, there is no way of distinguishing whether a node in the PartitionMembership output is an internal node or a leaf node?

Thank you,
Marcus
Administrator

Yes, that’s correct.
Cheers,
Eibe
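For reference, the proximity from the original question can be sketched as follows once leaf assignments are available. This is only an illustration in plain Java: it assumes you have already obtained, through your own code, the identifier of the leaf each instance reaches in every tree (as discussed above, WEKA does not expose this without modification). The class name ForestProximity and its input arrays are hypothetical.

```java
/** Hypothetical helper for illustration only; not part of WEKA. */
final class ForestProximity {

    /**
     * Breiman-style proximity of two instances.
     * leavesA[t] and leavesB[t] are assumed to hold the id of the leaf that
     * each instance reaches in tree t. The result is the fraction of trees
     * in which both instances reach the same leaf.
     */
    static double proximity(int[] leavesA, int[] leavesB) {
        if (leavesA.length == 0 || leavesA.length != leavesB.length) {
            throw new IllegalArgumentException("Need one leaf id per tree for both instances");
        }
        int shared = 0;
        for (int t = 0; t < leavesA.length; t++) {
            if (leavesA[t] == leavesB[t]) {
                shared++;   // same leaf in tree t
            }
        }
        return (double) shared / leavesA.length;
    }

    public static void main(String[] args) {
        int[] leavesA = {4, 2, 7, 1};   // made-up leaf ids for a 4-tree forest
        int[] leavesB = {4, 3, 7, 5};
        System.out.println(proximity(leavesA, leavesB));
    }
}
```

If the membership vectors were restricted to leaf nodes, and assuming every instance lands in exactly one leaf per tree, the same value could be recovered from the Manhattan distance d over T trees as 1 - d / (2T), since each tree in which the leaves differ contributes 2 to the distance.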