About the bias of performance estimates

About the bias of performance estimates

asadbtk
Hello Eibe and Peter

I recently read a paper which states: "Recent research has raised concerns about the bias (i.e., how much do the performance estimates differ from the model performance on unseen data?) and variance (i.e., how much do performance estimates vary when the experiment is repeated on the same data?) of model validation techniques when they are applied to defect prediction models."

I understand the variance part, but what do the authors mean by "how much do the performance estimates differ from the model performance on unseen data"? Do they mean that if we train a model in Weka and get an RMSE of 0.50 on the training data, and the same model gives an RMSE of 0.30 on the test data, we can say there is a bias of 0.20?

What I take from their statement is that the "performance estimates" are obtained on the training data and the "model performance on unseen data" is measured on the test data. Am I interpreting this the wrong way?

To clarify their point, I have pasted the diagram from their paper below.

[attached: image.png, diagram from the paper]

Regards

Re: About the bias of performance estimates

Eibe Frank-2
Administrator
Even if you use 10-fold cross-validation to estimate expected performance on unseen data for the model built from the full dataset (the one that WEKA shows you), there will be bias (which people typically ignore). Why? Because each training set in a 10-fold cross-validation only contains 90% of the full dataset. Hence, strictly speaking, you get an estimate of expected performance of a model learned from 90% of the full dataset.

The bias increases as the training sets become smaller. If you use 2-fold cross-validation, you actually estimate the expected performance of a model built from 50% of the data.

Obviously, a percentage split evaluation has the same issue, and bootstrap estimation has it too.

On the plus side, for well-behaved learning algorithms (and only in expectation, of course), these estimates will not *over*estimate performance of the model built on the full dataset; instead, they will tend to underestimate expected performance (i.e., they will be pessimistic). This is in contrast to the huge optimistic bias that you may encounter when you build a model from the full training set and then evaluate this one model on the same training set (i.e., if you perform evaluation on the training set in WEKA).
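
If you want to see this effect in practice, you can compare cross-validated RMSE estimates for different numbers of folds with the (optimistically biased) RMSE you get by evaluating on the training data itself. Here is a rough, untested sketch using the WEKA API; the file name, the choice of RandomForest, and the assumption of a numeric class attribute in the last column are just placeholders for your own setup:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class BiasDemo {
  public static void main(String[] args) throws Exception {
    // Load the data; assumes the class attribute is the last one and is numeric.
    Instances data = DataSource.read("my_data.arff");
    data.setClassIndex(data.numAttributes() - 1);

    // Cross-validated estimates: each fold trains on (k-1)/k of the data,
    // so the estimate really refers to models built from smaller training sets.
    for (int folds : new int[]{10, 2}) {
      Evaluation eval = new Evaluation(data);
      eval.crossValidateModel(new RandomForest(), data, folds, new Random(1));
      System.out.println(folds + "-fold CV RMSE: " + eval.rootMeanSquaredError());
    }

    // Evaluating the model on its own training data is typically optimistic.
    RandomForest rf = new RandomForest();
    rf.buildClassifier(data);
    Evaluation trainEval = new Evaluation(data);
    trainEval.evaluateModel(rf, data);
    System.out.println("Training-set RMSE: " + trainEval.rootMeanSquaredError());
  }
}

With a flexible learner such as RandomForest you will usually find that the training-set RMSE comes out much lower than the cross-validated estimates, and that the 2-fold estimate tends to be more pessimistic than the 10-fold one.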

Cheers,
Eibe

Re: About the bias of performance estimates

asadbtk
Hi Eibe, thank you for the information. I understand that, but the authors of the paper I mentioned say that bias is "how much do the performance estimates differ from the model performance on unseen data".

If I have to calculate the bias for each ML model, e.g. SVM, RF, etc., do I need to measure the difference between its performance estimate on the training data and its performance on the test data? For example, if the RF estimate on the training data is RMSE = 0.50 and its RMSE on the test data is 0.30, are the authors saying that the difference (0.50 - 0.30 = 0.20) is the 'bias' of the model?
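
To make that concrete, the difference I have in mind could be computed along these lines (just an untested sketch; the file names and the use of RandomForest with a numeric class are only placeholders):

import weka.classifiers.Evaluation;
import weka.classifiers.trees.RandomForest;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class TrainTestGap {
  public static void main(String[] args) throws Exception {
    // Separate training and test files, both with a numeric class as the last attribute.
    Instances train = DataSource.read("train.arff");
    Instances test = DataSource.read("test.arff");
    train.setClassIndex(train.numAttributes() - 1);
    test.setClassIndex(test.numAttributes() - 1);

    RandomForest rf = new RandomForest();
    rf.buildClassifier(train);

    // RMSE on the training data (what I read as the "performance estimate").
    Evaluation trainEval = new Evaluation(train);
    trainEval.evaluateModel(rf, train);

    // RMSE on the held-out test data ("performance on unseen data").
    Evaluation testEval = new Evaluation(train);
    testEval.evaluateModel(rf, test);

    System.out.println("Train RMSE: " + trainEval.rootMeanSquaredError());
    System.out.println("Test RMSE:  " + testEval.rootMeanSquaredError());
    System.out.println("Difference: "
        + (trainEval.rootMeanSquaredError() - testEval.rootMeanSquaredError()));
  }
}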

I am sorry if I am completely misunderstanding their viewpoint.

Best regards
