Generalization problem with weka randomforest regressor

Generalization problem with weka randomforest regressor

Alexandre R. Carvalho

Dear all,

 

We are getting a strange result with the Weka RandomForest regressor.

 

On a simple regression dataset, evaluating the Weka RandomForest regressor with cross-validation (CV) gives a good result of R2 = 92%.

With an 80% percentage split, the result is similar, R2 = 90%, as expected.

 

The problem appears when we apply the previously trained model to an external dataset of new data: the result is R2 = -0.01%.
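The thread quotes some results as "R2" and later ones as "R". Weka's evaluation output for a numeric class reports a "Correlation coefficient" (Pearson's R), which is a different quantity from the coefficient of determination R² = 1 - SS_res/SS_tot: R ignores constant offsets and scaling in the predictions, whereas R² can be negative when the predictions are worse than always predicting the mean of the test targets. The plain-Python sketch below (no Weka involved; the function names are illustrative, not part of any API) computes both on a toy example.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot; can be negative."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [a + 2.0 for a in actual]  # tracks the target perfectly, offset by +2

print(round(pearson_r(actual, shifted), 4))      # prints 1.0
print(round(r_squared(actual, shifted), 4))      # prints -1.0 (1 - 20/10)
print(round(r_squared(actual, [3.0] * 5), 4))    # prints 0.0 (always predicting the mean)
```

An R² near 0 means the model does about as well as predicting the mean; an R near 0 means the predictions do not move with the target at all. Which of the two the figures above refer to should be checked against the actual Weka output.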

 

I cannot understand this generalization failure. The new data is similar to the training dataset.
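One quick way to test the statement that the new data is similar to the training data is to compare each numeric attribute's mean on the external dataset with its mean and standard deviation on the training set. The sketch below is a minimal Python example assuming the data have been exported as rows of numbers; the attribute names and values are made up for illustration, and the cut-off of about one standard deviation is only a rule of thumb. It detects shifts in individual attributes only, not changes in how attributes combine or in their relationship to the target.

```python
import statistics


def standardized_mean_differences(train_rows, new_rows, names):
    """Per attribute: (mean on new data - mean on training data) / training std. dev.

    Absolute values around 1 or more (a rule of thumb, not a formal test)
    suggest the new data covers a different region than the training data.
    """
    result = {}
    for j, name in enumerate(names):
        train_col = [row[j] for row in train_rows]
        new_col = [row[j] for row in new_rows]
        diff = statistics.mean(new_col) - statistics.mean(train_col)
        sd = statistics.stdev(train_col)  # needs at least two training rows
        if sd > 0:
            result[name] = diff / sd
        else:
            # Constant attribute in training: any change at all is notable.
            result[name] = 0.0 if diff == 0 else float("inf")
    return result


# Hypothetical attributes "temperature" and "pressure".
train = [(1.0, 10.0), (2.0, 12.0), (3.0, 14.0)]
external = [(2.0, 30.0), (2.5, 32.0)]
print(standardized_mean_differences(train, external, ["temperature", "pressure"]))
# prints {'temperature': 0.25, 'pressure': 9.5}
```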

 

May I ask whether you have any idea why this is happening?

 

Thank you for your time. Really appreciated.

Alexandre R. Carvalho




The contents of this email message and its attachments may be confidential and for restricted use. If you are not the intended recipient of this message, please delete it immediately, do not forward it to third parties, and do not make any use of the information contained therein. Please also notify the sender that you received this message by mistake.
E-mail messages may contain computer viruses or other defects, may not be accurately replicated on other systems, or may be intercepted, deleted or interfered with without the knowledge of the sender or intended recipient. ISQ assumes no responsibility for any of these occurrences.


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Generalization problem with weka randomforest regressor

Eibe Frank-2
Administrator

What about other (tree-based) regressors? Do they exhibit similar behaviour?

 

Cheers,

Eibe

Re: Generalization problem with weka randomforest regressor

Alexandre R. Carvalho

Dear Eibe Frank,

 

Thank you for your reply. Really appreciated.

 

With RandomTree I get the same result.

With M5P, the CV result is as bad as the result on the external data.

 

Any ideas why?

 

With my best regards,

Alexandre R. Carvalho

Re: Generalization problem with weka randomforest regressor

usacoder
During my testing with various WEKA classifiers, I have found that when I got
excellent results on partitioned test data but terrible results on
external data, the cause was that some of the partitioned test data was
/bleeding/ into the training data.

You should verify that, in your partition tests, the test data is
completely separate from the training data.
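A simple check for this kind of bleeding is to look for test instances whose attribute values also occur in the training set, and, for cross-validation, for duplicated instances within the dataset, since each duplicate pair can be split across a training fold and a test fold. The Python sketch below assumes each instance is available as a tuple of numeric attribute values (for example after exporting the data to CSV); the function names are hypothetical, the rounding tolerance is an arbitrary choice, and near-duplicates such as consecutive readings from the same source will not be caught.

```python
def row_key(row, decimals=6):
    """Round every attribute value so tiny floating-point differences are ignored."""
    return tuple(round(v, decimals) for v in row)


def leaked_test_rows(train_rows, test_rows, decimals=6):
    """Indices of test rows whose (rounded) attribute values also occur in training."""
    seen = {row_key(r, decimals) for r in train_rows}
    return [i for i, r in enumerate(test_rows) if row_key(r, decimals) in seen]


def count_duplicates(rows, decimals=6):
    """Number of rows repeating an earlier row; any such pair can be split by CV."""
    seen = set()
    dups = 0
    for r in rows:
        key = row_key(r, decimals)
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups


train = [(1.0, 2.0), (3.0, 4.0)]
test = [(3.0, 4.0), (5.0, 6.0), (1.0000001, 2.0)]
print(leaked_test_rows(train, test))   # prints [0, 2]
print(count_duplicates(train + test))  # prints 2
```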

Re: Generalization problem with weka randomforest regressor

Eibe Frank-2
Administrator

What about other (tree-based) regressors, such as REPTree? Do they exhibit similar behaviour? What about LinearRegression?

 

Cheers,

Eibe

Re: Generalization problem with weka randomforest regressor

Alexandre R. Carvalho

Dear Professor Eibe Frank,

 

Thank you for your reply. Really appreciated.

 

The results are:

Classifier          R (CV)    R (external dataset)
RandomForest        0.9039    -0.0085
RandomTree          0.9023    -0.0207
M5P                 0.0476    -0.0256
REPTree             0.5814     0.0334
LinearRegression    0.0978     0.13
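One mechanism that would produce this pattern, offered only as a hypothesis: if the dataset contains duplicated or near-duplicated instances (for example repeated measurements of the same item), random CV and percentage splits place copies of the same instance on both sides of the split. Learners that can reproduce individual training instances, such as RandomTree and RandomForest, which grow unpruned trees by default, then score well in CV, while smoother models like M5P and LinearRegression do not, and none of them generalizes to genuinely new data. The self-contained Python simulation below illustrates this with a 1-nearest-neighbour regressor standing in for a memorizing learner (it is not Weka's RandomForest), on synthetic data whose target has no relationship to the attributes at all.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def one_nn_predict(train, x):
    """1-nearest-neighbour regression: return the target of the closest training
    instance. Stands in for any learner able to reproduce single training instances."""
    nearest = min(train, key=lambda inst: sum((a - b) ** 2 for a, b in zip(inst[0], x)))
    return nearest[1]


def make_instances(rng, n, n_attrs=3):
    """Random attributes and an independent random target: nothing to learn."""
    return [([rng.random() for _ in range(n_attrs)], rng.random()) for _ in range(n)]


rng = random.Random(1)
base = make_instances(rng, 150)
data = base + base  # every instance appears twice, e.g. duplicated records
rng.shuffle(data)

# Plain (ungrouped) 10-fold cross-validation on the duplicated data.
k = 10
actual, predicted = [], []
for fold in range(k):
    test = data[fold::k]
    train = [inst for i, inst in enumerate(data) if i % k != fold]
    for x, y in test:
        actual.append(y)
        predicted.append(one_nn_predict(train, x))
cv_r = pearson_r(actual, predicted)

# The same learner trained on all the data and applied to fresh instances.
fresh = make_instances(rng, 150)
fresh_r = pearson_r([y for _, y in fresh], [one_nn_predict(data, x) for x, _ in fresh])

print(f"CV correlation: {cv_r:.3f}  fresh-data correlation: {fresh_r:.3f}")
```

In this simulation most test instances have an identical copy in the training folds, so CV reports a high correlation, while the correlation on fresh data stays near zero. The simulation shows only that the mechanism is possible; it says nothing definite about the original dataset.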

 

Any ideas why?

 

With my best regards,

Alexandre R. Carvalho

Alexandre R. Carvalho, PhD
Researcher - Intelligent & Digital Systems

Research, Development and Innovation

M. +351 910 525 390
SkypeName: arcarvalho
www.isqgroup.com

Re: Generalization problem with weka randomforest regressor

Eibe Frank-3
Have you had a look at the REPTree classifiers that are generated during the cross-validation? In the latest versions (i.e., WEKA 3.8.3 and WEKA 3.9.3), you can do that easily in the Explorer by ticking the appropriate box under "More options..." in the Classify panel.
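If inspecting the per-fold models (or the data) suggests that related instances, such as repeated measurements of the same sample, are spread across folds, one way to get a CV estimate closer to performance on truly new data is to build the folds by group, so that all instances from the same source fall in the same fold. A minimal Python sketch, assuming each instance carries some group identifier (the function and identifiers below are hypothetical):

```python
import random


def group_folds(group_ids, k, seed=0):
    """Assign instances to k CV folds so that all instances sharing a group id
    end up in the same fold. Returns one fold number per instance."""
    groups = sorted(set(group_ids))
    random.Random(seed).shuffle(groups)  # random but reproducible group order
    fold_of_group = {g: i % k for i, g in enumerate(groups)}  # round-robin over groups
    return [fold_of_group[g] for g in group_ids]


# Hypothetical sample identifiers: "s1" and "s3" were each measured twice.
ids = ["s1", "s1", "s2", "s3", "s3", "s4"]
folds = group_folds(ids, k=2)
print(folds)
```

Each fold number could then be used to write a separate training file (all other folds) and test file (that fold), for example for evaluation in the Explorer with "Supplied test set". Folds can differ in size when groups do.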

Cheers,
Eibe