Once you have identified the best classifier after 100x10-fold CV, you rebuild this classifier on the 90% of the data comprising the training and validation sets and get the final performance estimate by using this rebuilt classifier to predict the remaining 10% of the data, correct?

Your process seems quite reasonable to me. One drawback is that classifier A might be worse than classifier B when trained on (0.9*30)% = 27% of the data (as in the 100x10-fold CV) but better when trained on 90% of the data; in that case, you would end up picking a suboptimal classifier for your final model. Another drawback is that the best parameter setting found when training on less than 60% of the data may not be the best one for substantially different training set sizes.

If I had to do this, I would probably take the 90% of the data comprising the training and validation sets and use a 10x10-fold CV in the Experimenter on this 90% dataset to compare

1) MultiSearch with classifier A as the base learner

2) MultiSearch with classifier B as the base learner

…

In MultiSearch, I would use something like 5-fold CV to estimate performance for different parameter settings so that time complexity remains reasonable. That way, the training set size of the base learners is (0.8*0.9*90)% = 64.8% of the full dataset during parameter tuning and (0.9*90)% = 81% for the estimates obtained with the 10x10-fold CV.
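To double-check those fractions, here is a tiny arithmetic sketch (plain Python; the variable names are illustrative, not Weka settings):

```python
# Fraction of the full dataset the base learner actually sees at each
# stage of the scheme above (all numbers are fractions of the complete
# dataset; the names are illustrative only).

holdout = 0.10            # 10% kept back for the final evaluation
dev = 1.0 - holdout       # 90% used for model selection

# Outer 10x10-fold CV on the 90% set: each run trains on 9/10 of it.
outer_train = 0.9 * dev   # 81% of all data

# Inner 5-fold CV inside MultiSearch: trains on 4/5 of an outer split.
inner_train = 0.8 * outer_train   # 64.8% of all data

print(f"outer CV training fraction: {outer_train:.3f}")
print(f"tuning (inner CV) training fraction: {inner_train:.3f}")
```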

Then, having established the best of 1), 2), etc., I would apply that configuration of MultiSearch to build the final model from the 90% dataset and evaluate that on the remaining 10% of the data.

(Obviously, MultiSearch is not required if the base learner has no parameters to optimise.)
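As a rough illustration of the overall protocol (hold out 10%, select the best configuration by cross-validation on the remaining 90%, rebuild on the full 90%, evaluate once on the holdout), here is a self-contained Python sketch. The toy threshold "classifiers" and synthetic data are stand-ins for the real base learners and MultiSearch, so treat it as the shape of the procedure rather than Weka code:

```python
import random

random.seed(1)

# Synthetic binary problem: label is 1 when x > 0.5, with ~10% label noise.
def make_example():
    x = random.random()
    y = int(x > 0.5)
    if random.random() < 0.1:   # flip roughly 10% of the labels
        y = 1 - y
    return (x, y)

data = [make_example() for _ in range(1000)]
random.shuffle(data)
holdout, dev = data[:100], data[100:]   # 10% / 90% split

def accuracy(threshold, examples):
    """Accuracy of the toy rule 'predict 1 iff x > threshold'."""
    return sum((x > threshold) == y for x, y in examples) / len(examples)

def kfold_score(threshold, examples, k=10):
    """Mean fold accuracy; the toy model needs no actual training step."""
    folds = [examples[i::k] for i in range(k)]
    return sum(accuracy(threshold, f) for f in folds) / k

# Candidate configurations, standing in for MultiSearch over A, B, ...
candidates = [0.3, 0.4, 0.5, 0.6]
best = max(candidates, key=lambda t: kfold_score(t, dev))

# "Rebuild" on the full 90% (a no-op for this toy model) and score once
# on the held-out 10%.
final_score = accuracy(best, holdout)
print(f"selected threshold: {best}, holdout accuracy: {final_score:.3f}")
```

The key point the sketch preserves is that the 10% holdout is touched exactly once, after all selection decisions have been made on the 90%.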

Cheers,

Eibe

> On 22/01/2019, at 11:32 PM, Sallam ABUALHAIJA <[hidden email]> wrote:

>

> Hello,

>

> I have a question regarding “model selection”.

>

> I have a big dataset of 18,300 instances for a binary classification problem. I split it as 60% for training, 30% for validation and 10% for final evaluation (using random stratification). Then, I used the 60% training set to optimize the hyper-parameters of several different classifiers.

> After that, I compare the performance of the resulting models on the 30% validation set (>5,000 examples) using 100 runs of 10-fold cross-validation in the Experimenter, so that I get 1,000 observations and can perform a significance test.

>

> Is using 10-fold CV in this case plausible? Does anyone have a useful reference for this method? Would the alternative of training on the training set and testing on the validation set be more correct?

>

> Thanks a lot,

> Sallam

>

>

> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> To subscribe, unsubscribe, etc., visit
> https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
