error measures of nominal attributes

error measures of nominal attributes

P. Klaas-Welter

Hello!

Could someone please help me understand the mean absolute error, root mean squared error,
relative absolute error and root relative squared error for nominal attributes?

I know that this question has come up several times on this mailing list, but none of the
answers really helped me. Or does someone know where to find a comprehensive explanation?

As far as I understand (with help from what I read from Eibe Frank):
root relative squared error: Let Y be the root mean squared error computed for the
class prior probabilities (frequencies). These probabilities are estimated from the training data
with a simple Laplace estimator. Let X be the root mean squared
error of the model's predictions. Then the root relative squared error is 100 * X / Y.
So what is done with the mean value for numeric classes is done with estimated probabilities
for nominal classes. The same holds for the relative absolute error.
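As a quick sketch of these ratios (the numbers are made up; `rmse_model` and `rmse_zeror` stand for the X and Y above):

```python
# Relative error measures as described above: the model's error is divided
# by the error of the ZeroR baseline (class prior probabilities) and
# expressed as a percentage. All numbers here are hypothetical.

def root_relative_squared_error(rmse_model, rmse_zeror):
    # 100 * X / Y, where X and Y are root mean squared errors
    return 100.0 * rmse_model / rmse_zeror

def relative_absolute_error(mae_model, mae_zeror):
    # the same ratio, but computed from mean absolute errors
    return 100.0 * mae_model / mae_zeror

print(root_relative_squared_error(0.25, 0.20))  # 125.0 -> worse than ZeroR
print(relative_absolute_error(0.18, 0.20))      # 90.0  -> better than ZeroR
```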

With best wishes, Petra


_______________________________________________
Wekalist mailing list
[hidden email]
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: error measures of nominal attributes

Eibe Frank

On Jun 30, 2005, at 12:50 AM, P. Klaas-Welter wrote:

> Could someone please help me to understand the mean absolute error,
> root mean squared error,
> relative absolute error and root relative squared error of nominal
> attributes?
>
> I know that one can find this question several times in this
> mailing-list. But none of these
> could really help me. Or does someone know where to find a
> comprehensive explanation?
>
> As far as I understood (with help from what I read from Eibe Frank):
> root relative squared error: Let Y be the root mean squared error that
> is computed for the
> single class' prior probabilities (frequencies). These probabilities
> are estimated from the training data
> with a simple Laplace estimator. Let X be the root mean squared
> error that came from the prediction of the model. Then the ~ is 100 *
> X / Y.
> So what is done with the mean value for numerical classes is done with
> estimated probabilities
> for nominal classes. The same for the relative absolute error.

Yes, that's correct. Y is the error obtained from the probability
estimates generated by ZeroR (which just estimates the prior
probabilities).

The squared error for a particular instance is given by the "quadratic
loss" function mentioned in our book (where we talk about evaluating
probability estimates). It's the sum of the squared differences between
the predicted class probabilities for a particular instance and the
observed class probabilities for that instance (which are either 0 or
1). The absolute error is computed in the same way by taking the
absolute value of each difference instead of the square.
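A minimal sketch of these two per-instance errors (the class count and the predicted distribution are made up for illustration):

```python
# Quadratic loss and absolute error for one instance of a 3-class problem.
# "observed" is the 0/1 indicator vector of the instance's actual class.
predicted = [0.7, 0.2, 0.1]  # model's predicted class probabilities
observed = [1.0, 0.0, 0.0]   # the actual class is the first one

squared_error = sum((p - o) ** 2 for p, o in zip(predicted, observed))
absolute_error = sum(abs(p - o) for p, o in zip(predicted, observed))

print(squared_error)   # 0.09 + 0.04 + 0.01, about 0.14
print(absolute_error)  # 0.3 + 0.2 + 0.1, about 0.6
```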

Cheers,
Eibe



Re: error measures of nominal attributes

P. Klaas-Welter
In reply to this post by P. Klaas-Welter

Dear Eibe,

thank you very much! This was very helpful (and now I have also found the right place in the book ;-)

Just to be sure, and because the error measures are so important, I would like to describe the other error values and ask you to check whether I am right:

Let the nominal attribute have m different values. Let the vector P contain all probabilities p_i that the nominal attribute takes the value i. Those probabilities come from the frequencies of each value i. Let the vector A_j be the model's result for instance j. If k is the value that the model predicted for instance j, then all entries of A_j are zero except a_jk, which is 1.

To compute the mean absolute error you take the component-wise absolute difference between vector P and vector A_j and sum it up: d_j = Σ_{i=1}^{m} | p_i − a_ji |. These per-instance differences d_j are then summed over all instances and divided by the number of instances.

And for the root mean squared error you take the square in d_j instead of the absolute value, and after dividing by the number of instances you take the square root.

Thank you very much! And with best regards, Petra

 






Re: error measures of nominal attributes

P. Klaas-Welter
In reply to this post by P. Klaas-Welter

I just noticed that the formula for the sum is not readable, therefore:

d_j = Σ_{i=1}^{m} | p_i − a_ji |  means: the sum from i=1 to m over | p_i − a_ji |




Re: error measures of nominal attributes

Eibe Frank
The formula for the sum is correct but it seems like you are  
misunderstanding how the two vectors are computed. One of the vectors  
contains the predicted class probabilities that are output by the model  
for a particular instance, the other vector contains the observed class  
probabilities for that particular instance. The latter(!) vector has  
one element that is 1 (the one for the actual class of the instance)  
and all other elements are 0.
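To restate this as a sketch (a made-up two-instance example; note that this divides only by the number of instances, so the normalisation may differ from the figure Weka reports):

```python
# Per-instance absolute error, averaged over a tiny hypothetical test set.
# Each row of "predictions" holds the model's predicted class probabilities;
# the observed vector is 1 for the instance's actual class and 0 elsewhere.
predictions = [[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3]]
actual = [0, 2]  # index of the true class for each instance

total = 0.0
for pred, true_cls in zip(predictions, actual):
    observed = [1.0 if i == true_cls else 0.0 for i in range(len(pred))]
    total += sum(abs(p - o) for p, o in zip(pred, observed))

mean_abs_error = total / len(predictions)
print(mean_abs_error)  # (0.8 + 1.4) / 2 = 1.1
```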

Cheers,
Eibe




Re: error measures of nominal attributes

P. Klaas-Welter
In reply to this post by P. Klaas-Welter
Dear Eibe, dear Weka users,

thank you a lot! You were right: I was mistaken about the vectors. But finally I've got it :-)

So how should the values of the error measures be interpreted? Here are some assumptions based on what I understood (from your book, this mailing list, etc.):

*) If the root mean squared error differs a lot from the mean absolute error, it could indicate that the data has large and/or many outliers.
*) The relative absolute error compares the actual result with the result of a "simple" predictor (ZeroR for nominal classes). If this value is >100%, the simple predictor does better.
*) If the root relative squared error is >100% while the relative absolute error is <100%, it suggests that the actual algorithm has more problems with outliers than the "simple" predictor (ZeroR for nominal classes).

Am I right with these assumptions?

But some points are not clear to me. Please have a look at my example:

Decision Table: Options: -R -I
Number of training instances: 17177
Number of Rules: 5049
Non matches covered by IB1.
Best first search for feature set,
terminated after 5 non improving subsets.
Evaluation (for feature selection): CV (leave one out)
Feature set: 1,2,3,4,5

Correctly Classified Instances        5531               32.2019 %
Incorrectly Classified Instances     11645               67.7981 %
Kappa statistic                          0.1378
Mean absolute error                      0.0073
Root mean squared error                  0.0663
Relative absolute error                 88.7076 %
Root relative squared error            103.7543 %
Total Number of Instances            17176

About every third instance was correctly classified, yet the Kappa statistic is quite bad (best possible value: 1, worst: 0).
On the other hand, the mean absolute error is quite good (best possible value: 0, worst: 1). How should this be interpreted?

With best wishes, Petra


