Random Forest model (number of trees)

neha.bologna
Good day everyone

If I use the Random Forest model, how many trees will it have, and how many variables are available at each tree node? I think the default number of trees in Weka is 100, but in that case, how many variables are considered at each node?

Warm regards

_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Random Forest model (number of trees)

Peter Reutemann
> If I am using the Random Forest model, how many trees will it have and how many variables would be available for each tree node? I think the default number of trees in Weka is 100 but in that case, how many variables for each tree node?

Copy/paste from the "More" dialog:

numFeatures -- Sets the number of randomly chosen attributes. If 0,
int(log_2(#predictors) + 1) is used.
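A minimal Java sketch of the default rule quoted above, assuming #predictors means the number of attributes excluding the class. The class and method names are hypothetical illustrations, not part of the WEKA API.

```java
// Hypothetical helper illustrating the default numFeatures rule from the
// "More" dialog: if numFeatures is 0, int(log_2(#predictors) + 1) is used.
public class DefaultNumFeatures {

    // log base 2 computed via natural logarithms
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // numFeatures: the user setting; numPredictors: attributes excluding the class
    static int effectiveNumFeatures(int numFeatures, int numPredictors) {
        if (numFeatures > 0) {
            return numFeatures; // an explicit positive value is used as given
        }
        // 0 (or any non-positive value in this sketch) selects the default heuristic
        return (int) (log2(numPredictors) + 1);
    }

    public static void main(String[] args) {
        int numAttributes = 20;                // including the class attribute
        int numPredictors = numAttributes - 1; // class excluded
        System.out.println(effectiveNumFeatures(0, numPredictors));
    }
}
```

With 20 attributes including the class, main prints 5.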

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 577-5304
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Random Forest model (number of trees)

neha.bologna
Hi Peter

I am sorry, but I did not understand your point. In the "More" dialog, I can see:

numFeatures -- Sets the number of randomly chosen attributes. If 0,
int(log_2(#predictors) + 1) is used.

But how do I work out the number of variables for each node? I have 20 features in my dataset.

Thank you

On Tue, Apr 27, 2021 at 12:44 AM Peter Reutemann <[hidden email]> wrote:

Re: Random Forest model (number of trees)

neha.bologna
If we use the default, how many variables are considered at each node? The default is zero, and I read somewhere that zero means an unlimited number of variables. Is that right?

Regards

On Tue, Apr 27, 2021 at 7:37 PM Neha gupta <[hidden email]> wrote:

Re: Random Forest model (number of trees)

Peter Reutemann
> I am sorry but I did not understand your point. In the more option, I can see
>
> numFeatures -- Sets the number of randomly chosen attributes. If 0,
> int(log_2(#predictors) + 1)
>
> but how can I get the number of variables for each node? I have 20 features in my dataset.

#predictors is the number of attributes excluding the class. If you have
20 attributes including the class, then you get:
int(log_2(19)+1) = 5

Broken down:
log_2(19) ~ 4.25
log_2(19) + 1 ~ 5.25
int(log_2(19)+1) = 5

That's the number of attributes that are randomly chosen as split
candidates at each node of each tree in RandomForest.
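In WEKA's RandomTree, the base learner that RandomForest uses, the candidate attributes are drawn afresh at every node rather than once per tree. The sketch below shows one way to draw K distinct non-class attributes for a single node using a partial Fisher-Yates shuffle; it is a hypothetical illustration (class and method names invented here), not WEKA's source code, and it omits the split evaluation that follows the draw.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: pick k distinct attribute indices for a single node,
// never choosing the class attribute. Not WEKA's implementation.
public class NodeAttributeSampler {

    static int[] sampleAttributes(int numAttributes, int classIndex, int k, Random rnd) {
        // Collect every attribute index except the class index
        int[] candidates = new int[numAttributes - 1];
        int n = 0;
        for (int i = 0; i < numAttributes; i++) {
            if (i != classIndex) {
                candidates[n++] = i;
            }
        }
        // Cannot choose more attributes than there are predictors
        int m = Math.min(k, n);
        // Partial Fisher-Yates: after step i, positions 0..i hold a uniform random sample
        for (int i = 0; i < m; i++) {
            int j = i + rnd.nextInt(n - i);
            int tmp = candidates[i];
            candidates[i] = candidates[j];
            candidates[j] = tmp;
        }
        return Arrays.copyOf(candidates, m);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        // 20 attributes with the class last; k = 5 as in the worked example.
        // Each simulated node gets its own independent draw.
        for (int node = 0; node < 3; node++) {
            System.out.println(Arrays.toString(sampleAttributes(20, 19, 5, rnd)));
        }
    }
}
```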

Cheers, Peter

Re: Random Forest model (number of trees)

neha.bologna
Thank you, Peter. Very nice explanation.

In some of the literature, I read that the 'mtry' parameter of RF is sqrt(number of features) for classification problems and (number of features) / 3 for regression problems.
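A small sketch comparing the three rules for the same number of predictors p (class excluded): WEKA's default int(log_2(p) + 1), floor(sqrt(p)) for classification, and floor(p / 3), at least 1, for regression. The rounding used for the last two follows the defaults of R's randomForest package; that choice and the class name are assumptions for illustration.

```java
// Hypothetical comparison of subset-size heuristics for p predictors (class excluded).
public class SubsetSizeHeuristics {

    // WEKA default when numFeatures = 0: int(log_2(p) + 1)
    static int wekaDefault(int p) {
        return (int) (Math.log(p) / Math.log(2) + 1);
    }

    // Common classification rule: floor(sqrt(p))
    static int sqrtRule(int p) {
        return (int) Math.floor(Math.sqrt(p));
    }

    // Common regression rule: floor(p / 3), but at least 1
    static int thirdRule(int p) {
        return Math.max(p / 3, 1);
    }

    public static void main(String[] args) {
        int p = 19; // 20 attributes including the class
        System.out.printf("log2 rule: %d, sqrt rule: %d, p/3 rule: %d%n",
                wekaDefault(p), sqrtRule(p), thirdRule(p));
    }
}
```

For p = 19 this prints "log2 rule: 5, sqrt rule: 4, p/3 rule: 6", so the heuristics can disagree noticeably even on a small dataset.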

Kind regards

On Wed, Apr 28, 2021 at 12:32 AM Peter Reutemann <[hidden email]> wrote:

Re: Random Forest model (number of trees)

Eibe Frank-3
Yes, other random forest implementations may use other heuristics for choosing the size of the random subset of attributes considered at each node of the decision tree as it is being built. WEKA's heuristic is close to (but not exactly the same as) the original heuristic that Leo Breiman proposed when he introduced random forests.

In practice, to squeeze the absolutely best performance out of a random forest, you generally have to tune this parameter anyway (for example, using internal k-fold cross-validation). These heuristics will almost never give you the best possible random forest for your data.

In WEKA, to automatically tune the parameter specifying the subset size using internal k-fold cross-validation, you could use CVParameterSelection or MultiSearch (the latter is available in a separate package).
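The idea behind tuning the subset size with internal k-fold cross-validation can be sketched without WEKA: for each candidate size, average a score over k folds and keep the size with the best mean. Here FoldScorer is a hypothetical stand-in for "train a random forest with this subset size on the training folds and score it on the held-out fold"; it does not reflect the actual API of CVParameterSelection or MultiSearch. The modulo fold assignment is a simplification; a real implementation would usually shuffle, and for classification stratify, the data first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of choosing a subset size by k-fold cross-validation.
public class SubsetSizeTuner {

    // Stand-in for: build a model with the given subset size on trainIdx,
    // then return its score (higher is better) on testIdx.
    interface FoldScorer {
        double score(int subsetSize, List<Integer> trainIdx, List<Integer> testIdx);
    }

    static int selectSubsetSize(int numInstances, int numFolds, int maxSize, FoldScorer scorer) {
        int bestSize = 1;
        double bestMean = Double.NEGATIVE_INFINITY;
        for (int size = 1; size <= maxSize; size++) {
            double total = 0;
            for (int fold = 0; fold < numFolds; fold++) {
                List<Integer> train = new ArrayList<>();
                List<Integer> test = new ArrayList<>();
                // Instance i belongs to fold (i mod numFolds)
                for (int i = 0; i < numInstances; i++) {
                    if (i % numFolds == fold) {
                        test.add(i);
                    } else {
                        train.add(i);
                    }
                }
                total += scorer.score(size, train, test);
            }
            double mean = total / numFolds;
            if (mean > bestMean) { // strict comparison: ties keep the smaller size
                bestMean = mean;
                bestSize = size;
            }
        }
        return bestSize;
    }

    public static void main(String[] args) {
        // Made-up toy scorer for illustration only: best at subset size 7
        FoldScorer toy = (size, train, test) -> -Math.abs(size - 7);
        System.out.println(selectSubsetSize(100, 10, 19, toy));
    }
}
```

With the toy scorer, which peaks at size 7, main prints 7; in practice the scorer would wrap an actual model build and evaluation.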

Cheers,
Eibe

On Fri, Apr 30, 2021 at 11:22 AM Neha gupta <[hidden email]> wrote:

Re: Random Forest model (number of trees)

neha.bologna
Thank you, Eibe, for the information.

Kind regards

On Saturday, May 1, 2021, Eibe Frank <[hidden email]> wrote: