Apply Log10 to dependent variable

Apply Log10 to dependent variable

asadbtk
Hi
I have a dataset whose dependent variable is continuous. Is it a good idea to apply log10 to it before training? With the transformation I get an RMSE of 0.2; without it, I get an RMSE of 16.5. That is a large difference. In which situations should the transformation be applied?

Best regards 

_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Apply Log10 to dependent variable

Peter Reutemann
> I have a dataset which has a dependent variable having continuous data. Is it a good idea to apply log10 before we train the data. When I apply it, I get the rmse value as 0.2 while without it, I get the rmse value of 16.5. It means a lot of difference. My question is in which situations we need to apply it?

You do realize that scaling the class values will result in scaling
the RMSE (https://en.wikipedia.org/wiki/Root-mean-square_deviation),
since it is calculated from the difference between the actual and
predicted class values?
For example, if I divide my numeric class by 1000, my RMSE will be
1000 times smaller. But the model is still the same, it just predicts
smaller numbers...
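
To make the scaling effect concrete, here is a minimal sketch in plain Python (not Weka). The class values and predictions are made up for illustration, and the `rmse` helper is hypothetical. It shows that dividing by 1000 shrinks the RMSE by exactly 1000, and that an RMSE computed on log10 values is in different units and cannot be compared directly with one computed on the raw values.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical class values and predictions, made up for illustration.
actual = [120.0, 340.0, 560.0, 80.0]
predicted = [100.0, 360.0, 500.0, 95.0]

# RMSE in the original units of the class attribute.
rmse_original = rmse(actual, predicted)

# Same predictions, with everything divided by 1000: the RMSE is 1000x smaller.
rmse_scaled = rmse([a / 1000 for a in actual], [p / 1000 for p in predicted])

# Same predictions on a log10 scale: the RMSE is now measured in log10 units,
# so a small number here says nothing about the error in original units.
rmse_log = rmse([math.log10(a) for a in actual],
                [math.log10(p) for p in predicted])

print(rmse_original, rmse_scaled, rmse_log)
```

The predictions are identical in all three cases; only the scale on which the error is measured changes.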

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Apply Log10 to dependent variable

Eibe Frank-2
Administrator


> On 3/12/2019, at 1:31 PM, Peter Reutemann <[hidden email]> wrote:
>
>> I have a dataset which has a dependent variable having continuous data. Is it a good idea to apply log10 before we train the data. When I apply it, I get the rmse value as 0.2 while without it, I get the rmse value of 16.5. It means a lot of difference. My question is in which situations we need to apply it?
>
> You do realize that scaling the class values will result in scaling
> the RMSE (https://en.wikipedia.org/wiki/Root-mean-square_deviation),
> since it is calculated from the difference between actual and predicted
> class value?
> For example, if I divide my numeric class by 1000, my RMSE will be
> 1000 times smaller. But the model is still the same, it just predicts
> smaller numbers...

Yes, what is needed is this:

  https://github.com/waikato-datamining/adams-base/blob/master/adams-weka/src/main/java/weka/classifiers/meta/LogTargetRegressor.java

It will also transform the independent variables though. Anyway, it should be easy enough to adapt the code and make a GroovyClassifier:

  https://waikato.github.io/weka-wiki/using_weka_from_groovy/#implementing-a-groovy-classifier
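
As a rough illustration of the idea behind a log-target wrapper, here is a minimal sketch in plain Python. It is not the ADAMS `LogTargetRegressor` and not Weka or Groovy code; the helpers `fit_line` and `fit_log_target` and the data are hypothetical. The wrapper trains on log10 of the target only, back-transforms each prediction with `10 **`, and the RMSE is then computed in the original units, so the two models can be compared fairly.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def fit_log_target(xs, ys):
    """Fit the base model on log10(y), then undo the log when predicting.

    Requires strictly positive target values.
    """
    model = fit_line(xs, [math.log10(y) for y in ys])
    return lambda x: 10 ** model(x)

# Hypothetical data with a roughly exponential target, made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 10.5, 29.0, 102.0, 310.0]

plain = fit_line(xs, ys)
logged = fit_log_target(xs, ys)

# Both errors are in the original units of y (training-set error, for brevity).
rmse_plain = rmse(ys, [plain(x) for x in xs])
rmse_logged = rmse(ys, [logged(x) for x in xs])

print(rmse_plain, rmse_logged)
```

On this made-up data the log-target model has the lower RMSE in original units, because the target grows roughly exponentially. On other data the transformation may make no difference or may hurt, which is why the comparison has to be made after back-transforming.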

Cheers,
Eibe

Re: Apply Log10 to dependent variable

asadbtk
Thanks Peter and Eibe. 

So does that mean that, whether or not I use the log transformation, it will have no effect on the ultimate accuracy of the prediction model?
