metric for regression

Desai Ankit
Hi all,

Apart from looking into Root Mean Square Error (RMSE), what other metrics could be optimised when the class distribution is skewed?

Please help.

--
Ankit Desai

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: metric for regression

George Dombi

Hi Ankit,

You could go old school and compute a trimmed mean, trimming until the new mean equals the median.

To trim the data, rank-order it and drop the highest and lowest values in pairs.

This rapidly shaves off the skewed end. A common trimmed mean removes 5% off the top and a matching 5% off the bottom.

Compare the median to the trimmed mean; when the two are equal or within 10% of each other, the assumption of a normal distribution is close enough to run ANOVA or t-tests.
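The trimming procedure above can be sketched in Python as follows. This is a minimal illustration, not a Weka feature: the function name `trim_until_mean_near_median` and its `tolerance` parameter are hypothetical, and "within 10%" is assumed to mean that the gap between mean and median is at most 10% of the median.

```python
from statistics import mean, median

def trim_until_mean_near_median(values, tolerance=0.10):
    """Rank-order the data and drop the lowest and highest value in pairs
    until |mean - median| <= tolerance * |median| (an assumed reading of
    "within 10%"). Returns the trimmed, sorted list.

    The loop always terminates: once one or two values remain, the mean
    and the median coincide.
    """
    data = sorted(values)  # rank order the data
    while len(data) >= 3:
        m, md = mean(data), median(data)
        if abs(m - md) <= tolerance * abs(md):
            break  # mean is close enough to the median
        data = data[1:-1]  # drop the lowest and the highest value as a pair
    return data

print(trim_until_mean_near_median([1, 2, 3, 4, 56]))  # [2, 3, 4]
```

Because values are dropped in pairs from both ends, the median stays put while the mean moves toward it, which is why this converges quickly on data with one long tail.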

Bye for now,

George


In reply to Desai Ankit (01/31/2017 03:27 AM)

Re: metric for regression

Desai Ankit
I got the idea, but it would be really helpful if you could help out with a small toy example.

Thanks. 

--
Ankit Desai

In reply to George Dombi (31 January 2017 at 21:08)

Re: metric for regression

George Dombi

Hi Ankit,

My suggestion was to keep using the RMSE measure, but to try to normalize the data first.

So if your data were (1,2,3,4,5,6): mean = 3.5, median = 3.5. This data is not skewed, so use it as is.

If your other data were (1,2,3,4,56): mean = 13.2, median = 3. This data is skewed, so cut off the top and the bottom value:

(2,3,4): mean = 3, median = 3.
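The toy example above can be checked with a fixed-proportion trimmed mean. This is a sketch under an assumed convention: the hypothetical `trimmed_mean` helper cuts `int(proportion * n)` values from each end, rounding down. With only five values, a 5% trim removes nothing, so reproducing (2,3,4) needs a 20% trim (one value from each end).

```python
from statistics import mean, median

def trimmed_mean(values, proportion=0.05):
    """Mean after cutting `proportion` of the sorted values from each end.

    The count cut from each end is int(proportion * n), i.e. rounded down
    (an assumed convention; other tools may round differently).
    `proportion` must be below 0.5 so that at least one value is kept.
    """
    data = sorted(values)
    cut = int(proportion * len(data))
    kept = data[cut:len(data) - cut]
    return mean(kept)

skewed = [1, 2, 3, 4, 56]
print(mean(skewed), median(skewed))  # 13.2 3
print(trimmed_mean(skewed, 0.2))     # 3
print(trimmed_mean(skewed))          # 13.2 (5% of 5 values rounds down to 0)
```

On larger data sets the default 5% cut becomes meaningful; for example, with 20 values it removes one value from each end.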

If you have some other situation, let me know.

Bye for now,

George


In reply to Desai Ankit (02/01/2017 01:10 AM)

Re: metric for regression

Desai Ankit
George, thanks for the example. I think there is a slight misunderstanding here. From your answer, I understand that you suggested how to improve on RMSE, and I really appreciate that. On the other hand, I need to know the alternatives to RMSE. Are there any other performance measures for regression problems?

Thanks.

--
Ankit Desai

In reply to George Dombi (2 February 2017 at 03:01)

Re: metric for regression

George Dombi

Hi Ankit,

Sorry for the confusion.

I really don't know of another measure, unless you want to go to something like the Akaike information criterion (AIC) https://en.wikipedia.org/wiki/Akaike_information_criterion. This is a way of measuring the effectiveness of adding another independent variable to a multivariable regression. For example, if you fit 3 independent variables to your data set, you get a certain R² value. If you do the regression with 4 independent variables, you will get an equal or better fit, with an equal or higher R² value. The more independent variables, the higher the R² value, but there is a sort of breakpoint where the equation with many variables begins to overfit the data. AIC corrects the goodness of fit (the model's likelihood) with a penalty for each independent variable added. The AIC philosophy is something like "less is more": one tries to minimize the AIC, which rewards a good fit while penalizing the number of independent variables.
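To make the AIC idea concrete, here is a minimal sketch for ordinary least-squares regression. It assumes Gaussian errors, in which case AIC equals n·ln(RSS/n) + 2k up to an additive constant, with k the number of fitted coefficients; lower values are preferred. The helpers (`solve`, `ols_rss`, `aic_gaussian`) and the toy data are hypothetical and only for illustration.

```python
import math

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols_rss(columns, y):
    """Fit y ~ intercept + columns by least squares; return (coefficients, RSS)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    return beta, rss

def aic_gaussian(rss, n, k):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k

# Hypothetical toy data: y depends on x1; x2 is an irrelevant extra variable.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [3, 1, 4, 1, 5, 9, 2, 6]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.0, -0.05, 0.1]
y = [2 + 3 * a + e for a, e in zip(x1, noise)]

for name, cols in [("x1", [x1]), ("x1 + x2", [x1, x2])]:
    _, rss = ols_rss(cols, y)
    k = len(cols) + 1  # coefficients including the intercept
    print(f"{name}: RSS = {rss:.4f}, AIC = {aic_gaussian(rss, len(y), k):.2f}")
```

Adding x2 can only lower (or keep) the RSS, so R² alone always favours the bigger model; the 2k term is what can make AIC prefer the smaller one.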

Without knowing more about your data set and your fitting equations, I'm tapped out.

Bye for now,

George

In reply to Desai Ankit (02/02/2017 05:13 AM)

Re: metric for regression

Eibe Frank-2
Administrator
In reply to this post by Desai Ankit
A few additional measures for regression are in these packages:

  http://weka.sourceforge.net/packageMetaData/percentageErrorMetrics/index.html (implements RMSPE and MAPE)
  http://weka.sourceforge.net/packageMetaData/logarithmicErrorMetrics/index.html (implements RMSLE and MALE)

Some information on how MAPE and RMSLE are defined is here:

  https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
  https://www.kaggle.com/wiki/RootMeanSquaredLogarithmicError

RMSPE and MALE are defined analogously.
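A minimal sketch of these four measures next to RMSE, for comparing them on the same predictions. MAPE and RMSLE follow the definitions linked above (RMSLE with log(1 + x)); the RMSPE and MALE forms are assumed readings of "defined analogously", and the Weka packages may differ in details such as percentage scaling or how zero and negative values are handled. The function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (undefined if an actual value is 0)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def rmspe(actual, predicted):
    """Root mean squared percentage error, in percent (undefined if an actual value is 0)."""
    return 100.0 * math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def rmsle(actual, predicted):
    """Root mean squared logarithmic error using log(1 + x); values must exceed -1."""
    return math.sqrt(sum((math.log1p(p) - math.log1p(a)) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

def male(actual, predicted):
    """Mean absolute logarithmic error, assumed to be mean |log(1 + p) - log(1 + a)|."""
    return sum(abs(math.log1p(p) - math.log1p(a)) for a, p in zip(actual, predicted)) / len(actual)

actual = [1, 2, 3, 100]     # hypothetical skewed targets
predicted = [1, 2, 3, 50]
for name, f in [("RMSE", rmse), ("MAPE", mape), ("RMSPE", rmspe), ("RMSLE", rmsle), ("MALE", male)]:
    print(name, round(f(actual, predicted), 4))
```

Here the single error on the large target dominates RMSE, while the percentage and log-based measures put it on a relative scale.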

Note that I obviously don’t know if these measures are suitable for your application.

Cheers,
Eibe

Re: metric for regression

Desai Ankit
Thanks, guys. I figured out how, using RMSE, I can be sure that the metric does not give a misleading picture of the outcome. It is a data-specific approach.

Thanks again, guys. It all helped.

--
Ankit Desai

In reply to Eibe Frank (3 February 2017 at 08:32)