Nested cross validation in Weka

Nested cross validation in Weka

Ketil Oppedal
Good afternoon,

I have trouble explaining exactly how nested cross-validation is performed when using the AttributeSelectedClassifier in Weka. I have used the AttributeSelectedClassifier with a wrapper and the same classifier (Random Forest) in both stages. From the beginning: what exactly happens during each step of the procedure? When are the features selected and when are the RFs built?

I need to understand this thoroughly and hope for help :-)


Have a nice day,

Ketil

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: Nested cross validation in Weka

Martin
In AttributeSelectedClassifier, the search method searches for candidate feature subsets, the evaluator method evaluates the merit of each selected subset, and the classifier (e.g., Random Forest) is the base classifier used for the final predictions.
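
For concreteness, here is a minimal sketch of that setup via the Weka Java API. This is only an illustration, not anything from Ketil's actual experiment: the file name "data.arff" and the parameter values are placeholders, and the statements are meant to sit inside a main method that throws Exception.

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data = DataSource.read("data.arff");        // placeholder file name
data.setClassIndex(data.numAttributes() - 1);

RandomForest rf = new RandomForest();                  // base classifier (Random Forest)

WrapperSubsetEval wrapper = new WrapperSubsetEval();   // evaluator: scores a subset by cross-validating rf on it
wrapper.setClassifier(rf);
wrapper.setFolds(10);                                  // folds of the wrapper's internal cross-validation (default is 5)

BestFirst search = new BestFirst();                    // search method; forward search is the default direction
search.setSearchTermination(5);                        // stop after 5 consecutive non-improving expansions

AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
asc.setEvaluator(wrapper);                             // evaluator
asc.setSearch(search);                                 // search method
asc.setClassifier(rf);                                 // classifier built on the selected subset for prediction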

Regards, 
Martin

Re: Nested cross validation in Weka

Ketil Oppedal

Thank you very much.

Could you also explain when the search, evaluation, and classification happen with respect to the fold structure? What happens in the first fold of the inner CV, what happens in the second, ..., and what happens in the first fold of the outer CV, and so on?

Thanks again,

Ketil


Re: Nested cross validation in Weka

Martin

WEKA applies the standard way of estimating the error rate of a learning technique: the data is divided randomly into ten portions in which the class is represented in approximately the same proportions as in the full dataset. Each portion is held out in turn and the learning process (search, evaluate, and classify) is run on the remaining nine-tenths; the error rate is then computed on the holdout set. The learning procedure (search, evaluate, and classify) is therefore executed a total of ten times on different training sets, and finally the ten error estimates are averaged to produce an overall error estimate.
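
In the Weka API, that procedure is what Evaluation.crossValidateModel() performs. A hedged sketch (the file name "data.arff" is a placeholder, and the AttributeSelectedClassifier is assumed to be configured as in the earlier snippet):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data = DataSource.read("data.arff");          // placeholder file name
data.setClassIndex(data.numAttributes() - 1);

AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
// ... set evaluator, search and classifier as in the earlier snippet ...

// Stratified 10-fold CV: split into ten class-balanced parts, hold each out in turn,
// run the whole search/evaluate/classify process on the other nine tenths,
// score on the holdout, and average the ten error estimates.
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(asc, data, 10, new Random(1));
System.out.println(eval.toSummaryString());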

Regards,
Martin

Re: Nested cross validation in Weka

Ketil Oppedal
Excellent, thank you. 

So what about the inner and outer CV? Have I understood you correctly that, when using nested CV, the search, evaluation, and classification are performed on the first training fold (9/10) of the inner CV and tested on the remaining 1/10 of the inner CV, and then validated again on the first fold of the outer CV?

Or, if ten models are built in the inner CV, which one of them is tested in the outer CV procedure?

Regards,
Ketil

Re: Nested cross validation in Weka

Martin


Generally, consider the inner cross-validation as part of the model-fitting procedure, whereas the outer cross-validation estimates the performance of this model-fitting approach. Thus, both learning and testing occur in each fold. For attribute selection, the learning process (searching, evaluating, and classifying) followed by testing of what has been learned is executed a total of 10 times in the standard stratified tenfold cross-validation.

However, when cross-validation is used for parameter selection, the inner cross-validation is used for parameter tuning, while the outer cross-validation is used for error estimation.

Regards,
Martin




Re: Nested cross validation in Weka

Ketil Oppedal
Good afternoon,

Say we have a data set consisting of 100 instances and use 10-fold CV in both the outer and inner CV schemes. Say we also use an RF classifier with the same parameters in both places.

Re: Nested cross validation in Weka

Ketil Oppedal
Good afternoon,

I need to know more precisely what happens step by step when using the AttributeSelectedClassifier with WrapperSubsetEval; I hope you can help me sort this out. Say we have a data set consisting of 100 instances and use 10-fold CV in both the outer and inner CV schemes. Say we also use an RF classifier with the same parameters in both places, with the BestFirst search method, forward direction and n=5. Then the initial training data will consist of 81 instances, correct? First, one feature will be selected and an RF will be trained. Then another feature will be added and another RF will be trained. This will continue until five consecutive feature additions do not increase classification performance. Then this system will be tested on the remaining nine instances in the inner CV loop. This will be performed 10 times.

Which of these ten models will be tested on the 10 instances remaining in the first fold of the outer CV loop?

I am afraid I may have misunderstood this completely, though; I hope you are able to help me figure it out.


Regards,

Ketil

Re: Nested cross validation in Weka

Martin
In "AttributeSelectedClassifier", dimensionality get reduced by attribute selection before being passed on to a classifier (RF here). Briefly, in *first* fold cross-validation, 90% of the data is the training set and 10% the test set, the following process occurred:

- BestFirst selects the subset of attriubtes.
- WrapperSubsetEval evaluates the selected subset using base learner of the WrapperSubsetEval.  
- RandomForest (base classifier of AttributeSelectedClassifier) is the utilized algorithm for classificatoin purpose (i.e., building classifier model in this fold on the training set).
-Testing the model on the remain 10% data (set set) 

This process will be repeated 10 times. In the end, ultimate classifier mode will be applied and the ten error estimates are averaged to produce an overall error estimate.


For more information about WEKA's cross-validation, have a look at this link:

Regards, 
Martin



Re: Nested cross validation in Weka

Eibe Frank-2
You haven’t got it quite right. The inner cross-validation is run for each feature subset being considered during the best-first search, i.e., the current training set of the outer cross-validation (90 instances) is reduced to the feature subset being evaluated and a cross-validation is run on the reduced data, using RandomForest. This gives a score for each feature subset. The best feature subset encountered during the search is then used to build a model on the current training set of the outer cross-validation (90 instances) and this model is evaluated on the corresponding test set of the outer cross-validation (10 instances). Note that a different feature subset may be selected for each of the 10 training sets in the outer 10-fold cross-validation.
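
To make the nesting concrete, here is a rough hand-written sketch of what the outer cross-validation does for each fold, using the Weka API. The file name "data.arff" and the parameter values are placeholders; the point is that the inner cross-validations all happen inside buildClassifier(), where WrapperSubsetEval scores each candidate subset by cross-validating RandomForest on the reduced training data.

import java.util.Random;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data = DataSource.read("data.arff");   // placeholder file name
data.setClassIndex(data.numAttributes() - 1);
data.randomize(new Random(1));
data.stratify(10);

for (int fold = 0; fold < 10; fold++) {
  Instances train = data.trainCV(10, fold);      // outer training set (~90 instances)
  Instances test  = data.testCV(10, fold);       // outer test set (~10 instances)

  WrapperSubsetEval wrapper = new WrapperSubsetEval();
  wrapper.setClassifier(new RandomForest());     // RF used inside the wrapper's inner CV
  BestFirst search = new BestFirst();            // forward best-first search by default
  search.setSearchTermination(5);                // n = 5 non-improving expansions

  AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
  asc.setEvaluator(wrapper);
  asc.setSearch(search);
  asc.setClassifier(new RandomForest());         // RF built on the best subset found

  // The best-first search and all inner cross-validations happen here, on 'train' only;
  // the final RF is then built on 'train' reduced to the best subset encountered.
  asc.buildClassifier(train);

  Evaluation eval = new Evaluation(train);
  eval.evaluateModel(asc, test);                 // error estimate for this outer fold
  System.out.println("Fold " + fold + ": " + eval.pctCorrect() + "% correct");
}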

Cheers,
Eibe

Re: Nested cross validation in Weka

Ketil Oppedal
Thanks again,

So once again, let us say we have a data set of 100 instances, use AttributeSelectedClassifier with WrapperSubsetEval, RF as the classifier in both places, 10-fold CV in both the outer and inner CV schemes, the BestFirst search method (forward, n=5), and the default "accuracy" as the evaluation measure. My data has 48 features plus a class variable. In the classifier output window I can see that 362 feature subsets have been evaluated, with a merit of the best subset of 0.991.

So, for each of the 362 feature subsets, a 10-fold CV is run on the 90 instances (train on 81 and test on 9) in the inner CV and the accuracy is averaged over the 10 runs. One of the 362 subsets thus obtained an accuracy of 0.991. Then this feature subset will be used to build a classifier in the outer CV, training on all 90 instances from the inner CV and testing on the remaining 10. This will be performed 10 times. Is that correct?

Regards,
Ketil

PS!
Which parameter in RF is the number of trees?

Re: Nested cross validation in Weka

Eibe Frank-2
The classifier that is shown in the output panel is the one built from the full dataset, so no outer cross-validation is involved in computing this output. The 362 feature subsets were considered in a search performed on the full dataset, where each of the subsets was evaluated using 10-fold cross-validation. The best CV score in this process was 0.991.

Nested cross-validation is only used to compute the performance statistics shown under "=== Stratified cross-validation ===" (if cross-validation is the evaluation option you have selected).
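
In API terms this corresponds roughly to the following sketch (again a placeholder file name, with the AttributeSelectedClassifier configured as in the earlier snippets):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

Instances data = DataSource.read("data.arff");            // placeholder file name
data.setClassIndex(data.numAttributes() - 1);

AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
// ... set evaluator, search and classifier as in the earlier snippets ...

// 1) The model printed in the output panel: search + selection + RF on the FULL dataset.
asc.buildClassifier(data);
System.out.println(asc);

// 2) The "Stratified cross-validation" statistics: a separate nested 10-fold CV that
//    repeats the whole selection-plus-training procedure on each outer training fold.
Evaluation eval = new Evaluation(data);
eval.crossValidateModel(asc, data, 10, new Random(1));
System.out.println(eval.toSummaryString());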

Cheers,
Eibe

> On 5/10/2016, at 10:55 PM, Ketil Oppedal <[hidden email]> wrote:
>
> Thanks again,
>
> so once again let us say we have a dat set of 100 instances, use AttributeSelectedClassifier with WrapperSubsetEval, RF as classifier in both instances, use 10-fold CV in both the outer- and inner CV scheme, with search method best first, forward and n=5 and the default "accuracy" as evaluation measure. My data has 48 features plus a class variable. In the classifier output window I can see that 362 feature subsets have ben evaluated with a merit of best subset of 0.991.  
>
> So - for each of the 362 feature subsets a 10-fold CV is run on the 90 instances (train on 81 and test on 9) in the inner CV and the accuracy averaged over the 10 runs. So one out of the 362 runs have got an accuracy of 0.991. Then, the mentioned feature subset will be used to build a classifier using the outer CV, training on all the 90 instances in the inner CV and testing on the remaining 10. This will be performed 10 times. Is that correct?
>
> Regards,
> Ketil
>
> PS!
> Which parameter in RF is the number of trees?
>
> 2016-10-04 23:11 GMT+02:00 Eibe Frank <[hidden email]>:
> You haven’t got it quite right. The inner cross-validation is run for each feature subset being considered during the best first search, i.e., the current training set of the of the outer cross-validation (90 instances) is reduced to the feature subset being evaluated and a cross-validation is run on the reduced data, using RandomForest. This gives a score for each feature subset. The best feature subset encountered during the search is then used to build a model on the current training set of the outer cross-validation (90 instances) and this model is evaluated on the corresponding test set of the outer cross-validation (10 instances). Note that a different feature subset may be selected for each of the 10 training sets in the outer 10-fold cross-validation.
>
> Cheers,
> Eibe
>
> > On 5/10/2016, at 2:48 AM, Ketil Oppedal <[hidden email]> wrote:
> >
> > Good afternoon,
> >
> > I need to know more precisely what happens step by step when using the AttributeSelectedClassifier with WrapperSubsetEval. I hope you can help me sort this out. Say we have a data set consisting of 100 instances and use 10-fold CV in both the outer- and inner CV scheme. Say we also use a RF classifier with the same parameters both places, with search method best first, forward and n=5. Then the initial training data will consist of 81 instances, correct? First of all, one feature will be selected and a RF will be trained. Then another feature will be added and another RF will be trained. This will continue until five consective feature additions do not increase classification performance. Then this system will be tested on the remaining nine instances in the inner CV loop. This will be performed 10 times.
> >
> > Which of these ten models will be tested in the 10 instances remaining in the first fold of the outer CV loop?
> >
> > I am afraid I have misunderstood this completely though, hope you are able to help me figure this out.
> >
> >
> > Regards,
> >
> > Ketil
> >

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Nested cross validation in Weka

Ketil Oppedal
Good afternoon and thanks,

OK, so the numbers printed to screen are for display purposes only and are calculated on the complete data set (all 100 instances)? 

That means (given the example above): for the first run through the outer CV, 90 instances are set aside for the inner CV scheme. There, 81 instances are used to train a classifier on a subset of features, which is then tested on the remaining 9. This is performed ten times and the performance of the classifier built using this feature subset is averaged over the ten runs. This is repeated again and again on different feature subsets chosen by the BestFirst algorithm (probably somewhere close to 362, depending on the variance in the dataset). Then, when the feature subset with the best performance is found, a classifier is built on the 90 instances using this feature subset and tested on the remaining 10 in the outer CV. This is then repeated ten times and the overall performance averaged. 

Have I understood the procedure correctly?


Regards,

Ketil

2016-10-05 22:58 GMT+02:00 Eibe Frank <[hidden email]>:
The classifier that is shown in the output panel is the one built from the full dataset, so no outer cross-validation is involved in computing this output. The 362 feature subsets were considered in a search performed on the full dataset, where each of the subsets was evaluated using 10-fold cross-validation. The best CV score in this process was 0.991.

Nested cross-validation is only used to compute the performance statistics shown under "=== Stratified cross-validation ===" (if cross-validation is the evaluation option you have selected).

Cheers,
Eibe
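
In API terms the distinction is, roughly, the difference between the two calls in the sketch below (again only a sketch, assuming Weka 3.8; the path is a placeholder and the default evaluator/search are kept only for brevity):

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ExplorerOutputSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("/path/to/data.arff");   // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    // Configure evaluator, search and base classifier as in the earlier sketch;
    // the defaults are kept here only to stay short.
    AttributeSelectedClassifier asc = new AttributeSelectedClassifier();

    // What the classifier output panel prints: one model built from ALL instances.
    // The best-first search and its inner cross-validations run once, on the full data.
    asc.buildClassifier(data);
    System.out.println(asc);

    // What "=== Stratified cross-validation ===" reports: the nested procedure,
    // with selection repeated inside each of the 10 outer training folds.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(asc, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}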

> On 5/10/2016, at 10:55 PM, Ketil Oppedal <[hidden email]> wrote:
>
> Thanks again,
>
> So once again let us say we have a data set of 100 instances, use the AttributeSelectedClassifier with WrapperSubsetEval, RF as the classifier in both stages, 10-fold CV in both the outer and inner CV schemes, search method best first, forward, n=5, and the default "accuracy" as the evaluation measure. My data has 48 features plus a class variable. In the classifier output window I can see that 362 feature subsets have been evaluated, with a best-subset merit of 0.991.
>
> So, for each of the 362 feature subsets, a 10-fold CV is run on the 90 instances (train on 81 and test on 9) in the inner CV and the accuracy is averaged over the 10 runs. So one of the 362 runs has an accuracy of 0.991. Then the mentioned feature subset will be used to build a classifier in the outer CV, training on all the 90 instances from the inner CV and testing on the remaining 10. This will be performed 10 times. Is that correct?
>
> Regards,
> Ketil
>
> PS!
> Which parameter in RF is the number of trees?
>

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html



Re: Nested cross validation in Weka

Eibe Frank-2
Administrator

> On 7/10/2016, at 12:26 AM, Ketil Oppedal <[hidden email]> wrote:
>
> OK, so the numbers printed to screen are for display purposes only and are calculated on the complete data set (all 100 instances)?

I suppose so, but there is some potentially useful information there: the feature subset that is selected on the full data (and that would be used if the model built from the full training data were deployed on fresh test data), its accuracy as estimated by CV (optimistically biased, though, because it is the maximum of many estimates), and the number of feature subsets/estimates considered.

> That means (given the example above): for the first run through the outer CV, 90 instances are set aside for the inner CV scheme. There, 81 instances are used to train a classifier on a subset of features, which is then tested on the remaining 9. This is performed ten times and the performance of the classifier built using this feature subset is averaged over the ten runs. This is repeated again and again on different feature subsets chosen by the BestFirst algorithm (probably somewhere close to 362, depending on the variance in the dataset). Then, when the feature subset with the best performance is found, a classifier is built on the 90 instances using this feature subset and tested on the remaining 10 in the outer CV. This is then repeated ten times and the overall performance averaged.
>
> Have I understood the procedure correctly?

Yes.

Cheers,
Eibe
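
Spelled out as an explicit outer loop, the confirmed procedure looks roughly like the following sketch (Weka 3.8 Java API assumed; AttributeSelectedClassifier performs the equivalent steps internally when it is cross-validated, so this is only for illustration):

import java.util.Random;

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.trees.RandomForest;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ManualNestedCV {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("/path/to/data.arff");   // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    int outerFolds = 10;
    data.randomize(new Random(1));
    data.stratify(outerFolds);

    int correct = 0, total = 0;
    for (int fold = 0; fold < outerFolds; fold++) {
      Instances train = data.trainCV(outerFolds, fold, new Random(1));  // e.g. 90 of 100 instances
      Instances test = data.testCV(outerFolds, fold);                   // e.g. the remaining 10

      // Wrapper selection on the training fold only; each candidate subset is scored
      // by an internal 10-fold CV (81/9 splits) using RandomForest.
      RandomForest innerRF = new RandomForest();
      WrapperSubsetEval wrapper = new WrapperSubsetEval();
      wrapper.setClassifier(innerRF);
      wrapper.setFolds(10);
      BestFirst search = new BestFirst();
      search.setSearchTermination(5);

      AttributeSelection selector = new AttributeSelection();
      selector.setEvaluator(wrapper);
      selector.setSearch(search);
      selector.SelectAttributes(train);          // all inner cross-validations happen here

      Instances reducedTrain = selector.reduceDimensionality(train);
      Instances reducedTest = selector.reduceDimensionality(test);

      // Build the final model for this fold on the reduced training data...
      RandomForest rf = new RandomForest();
      rf.buildClassifier(reducedTrain);

      // ...and test it on the held-out instances of the outer fold.
      for (int i = 0; i < reducedTest.numInstances(); i++) {
        Instance inst = reducedTest.instance(i);
        if (rf.classifyInstance(inst) == inst.classValue()) {
          correct++;
        }
        total++;
      }
    }
    System.out.println("Nested CV accuracy: " + (100.0 * correct / total) + " %");
  }
}

The accuracy printed at the end is averaged over the outer test folds; as noted earlier in the thread, the subset actually chosen can differ from fold to fold.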

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Nested cross validation in Weka

xafarranxera
In reply to this post by Ketil Oppedal
Hello,

So, what is the right command order to get the Feature Selection process
inside the Cross Validation loop?

Thanks,



_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Nested cross validation in Weka

Eibe Frank
Use the AttributeSelectedClassifier (or, alternatively, the FilteredClassifier together with the AttributeSelection filter).

Cheers,
Eibe
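
For reference, a minimal sketch of the FilteredClassifier route mentioned above (package paths assume Weka 3.8; the wrapper and search settings mirror the earlier examples and are illustrative, not required):

import java.util.Random;

import weka.attributeSelection.BestFirst;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.Evaluation;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.supervised.attribute.AttributeSelection;

public class FilteredSelectionSketch {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("/path/to/data.arff");   // placeholder path
    data.setClassIndex(data.numAttributes() - 1);

    RandomForest rf = new RandomForest();

    // Wrapper-based attribute selection packaged as a supervised filter
    WrapperSubsetEval wrapper = new WrapperSubsetEval();
    wrapper.setClassifier(rf);
    BestFirst search = new BestFirst();

    AttributeSelection selectionFilter = new AttributeSelection();
    selectionFilter.setEvaluator(wrapper);
    selectionFilter.setSearch(search);

    // The filter is rebuilt from each training fold only, so the selection
    // stays inside the cross-validation loop.
    FilteredClassifier fc = new FilteredClassifier();
    fc.setFilter(selectionFilter);
    fc.setClassifier(rf);

    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(fc, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}

Evaluating either setup (AttributeSelectedClassifier or FilteredClassifier) with cross-validation re-runs the attribute selection on every training fold, which is what keeps the selection inside the loop.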

On Wed, Feb 26, 2020 at 12:38 AM xafarranxera <[hidden email]> wrote:
Hello,

So, what is the right command order to get the Feature Selection process
inside the Cross Validation loop?

Thanks,




_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html