Kappa metric for multi-class classification?

Kappa metric for multi-class classification?

mcbenly
Hi,
I am having difficulty choosing the best performance measure for my multi-class
classification problem. There are four classes in my dataset, and the data is imbalanced.

Personally, I preferred using the weighted F-measure and AUROC for binary
classification. But I guess I can't use AUROC for multi-class classification,
and I'm not sure the weighted F-measure alone would be a good multi-class measure.

I read in a few research papers that, for multi-class problems, one should use
micro- or macro-averaged F-measure, with micro-averaging when the data is imbalanced.

But as far as I understand, the micro-averaged F-measure is the same as
classification accuracy...

I was wondering if I could use "classification accuracy + kappa statistic"
as my *main performance measure*? Would this be the right combination?

Or is there any other suggestion you might have?

Thanks, Ben






Re: Kappa metric for multi-class classification?

Eibe Frank-2
Administrator
Kappa is a version of classification accuracy (the rate of correct classifications) that is rescaled by comparing it to the accuracy of a random classifier.

Let C be the classification accuracy of the classifier you are evaluating. Let R be the classification accuracy of a classifier that assigns instances to the classes at random, but is constrained to assign the same number of instances to each class as the classifier being evaluated. Then kappa is (C - R) / (1 - R).

In other words, kappa will be zero when your classifier does not improve on the random classifier, and it will be 1 if your classifier is 100% accurate.
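
To make this concrete, here is a small illustrative computation of kappa following the formula above (a sketch in plain Java, not WEKA code; the 4-class confusion matrix is made up):

// Illustrative only: kappa from a hypothetical 4-class confusion matrix.
// Rows are the actual class, columns the predicted class.
public class KappaByHand {
  public static void main(String[] args) {
    int[][] cm = { { 850, 20, 15, 15 },
                   {  30, 20,  5,  5 },
                   {  25,  5, 10,  0 },
                   {  25,  5,  0, 20 } };
    int k = cm.length, n = 0, correct = 0;
    int[] rowSum = new int[k], colSum = new int[k];
    for (int i = 0; i < k; i++)
      for (int j = 0; j < k; j++) {
        n += cm[i][j];
        rowSum[i] += cm[i][j];   // actual-class counts
        colSum[j] += cm[i][j];   // predicted-class counts
        if (i == j) correct += cm[i][j];
      }
    double C = (double) correct / n;  // observed accuracy
    double R = 0;                     // accuracy of the matched random classifier
    for (int i = 0; i < k; i++)
      R += ((double) rowSum[i] / n) * ((double) colSum[i] / n);
    System.out.println("kappa = " + (C - R) / (1 - R));
  }
}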

Ideally, you know the misclassification costs in your problem. Then you can specify a cost matrix in WEKA to perform a cost-sensitive evaluation; for example, in the Classify tab of the Explorer, you can specify a cost matrix under “More options…”. If you don’t know the exact costs, then the weighted AUROC, which WEKA also outputs, is an option. Classification accuracy is appropriate when all types of classification error have the same cost (which is often not the case in practical applications).
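
If you script your experiments, the same statistics are available programmatically through WEKA's Evaluation class. A minimal sketch (the ARFF file name and the choice of J48 are placeholders for your own data and classifier):

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class EvalDemo {
  public static void main(String[] args) throws Exception {
    Instances data = DataSource.read("mydata.arff");  // placeholder dataset
    data.setClassIndex(data.numAttributes() - 1);
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(new J48(), data, 10, new Random(1));  // 10-fold CV
    System.out.println("Accuracy (%):   " + eval.pctCorrect());
    System.out.println("Kappa:          " + eval.kappa());
    System.out.println("Weighted AUROC: " + eval.weightedAreaUnderROC());
  }
}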

Cheers,
Eibe

Re: Kappa metric for multi-class classification?

Prof. David MW Powers

In those circumstances F1 is not a good choice; chance-corrected kappa measures are more appropriate, and they can be applied directly to multiclass data. You can also macroaverage, weighting by the bias towards a particular prediction (the proportion of the time that class label is predicted) - it is not appropriate to weight by the prevalence (the proportion of the time the real class occurs). Accuracy is also easily biased, and is misleading to the extent that bias doesn’t match prevalence. To the extent that you have a per-class or per-instance cost you can use that, but otherwise a chance-corrected measure is best.

The Cohen kappa included in Weka is a reasonable but not a good choice (it is a chance-corrected version of accuracy): like F1, it is not good if prediction bias fails to match prevalence for each class. I include a link to a paper on this below.

What is appropriate is the multiclass form of kappa called Informedness, which is chance-corrected in the sense that it gives the probability of an informed decision (viz. not chance). Again, I include links.

The binary form of this is Peirce's (1884) I, Youden's (1950) J, Flach's (2003) deskewed WRAcc, and what is known in psychology as DeltaP'. It corresponds to the distance above the chance line in the ROC curve, viz. tpr - fpr, which is what is maximized when choosing the standard operating point in ROC. It macroaverages over predictions, as described above, to estimate the multiclass form of Informedness (and the short ECAI and long JMLT papers below show how the Bookmaker estimate recovers the underlying probability with which a Monte Carlo simulation makes an informed decision or guesses).
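
A minimal sketch of that bias-weighted macroaverage, assuming a one-vs-rest dichotomisation of the confusion matrix per class (see the papers below for the exact Bookmaker derivation; the matrix here is hypothetical):

// Sketch only: multiclass Informedness as a bias-weighted macroaverage of
// per-class one-vs-rest tpr - fpr. Rows are actual class, columns predicted.
public class BookmakerSketch {
  public static double bookmakerInformedness(int[][] cm) {
    int k = cm.length, n = 0;
    int[] rowSum = new int[k], colSum = new int[k];
    for (int i = 0; i < k; i++)
      for (int j = 0; j < k; j++) {
        n += cm[i][j];
        rowSum[i] += cm[i][j];  // prevalence counts (actual class i)
        colSum[j] += cm[i][j];  // bias counts (predicted class j)
      }
    double b = 0;
    for (int c = 0; c < k; c++) {
      double tp = cm[c][c];
      double fp = colSum[c] - tp;
      double tpr = rowSum[c] == 0 ? 0 : tp / rowSum[c];
      double fpr = n == rowSum[c] ? 0 : fp / (n - rowSum[c]);
      double bias = (double) colSum[c] / n;  // weight by prediction proportion
      b += bias * (tpr - fpr);
    }
    return b;
  }

  public static void main(String[] args) {
    int[][] cm = { { 850, 20, 15, 15 },
                   {  30, 20,  5,  5 },
                   {  25,  5, 10,  0 },
                   {  25,  5,  0, 20 } };
    System.out.println("Bookmaker informedness = " + bookmakerInformedness(cm));
  }
}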

This is a hobbyhorse of mine… I originally modelled Informedness in terms of gambling on your predictions (hence the multiclass measure is also known as Bookmaker, Bookmaker Informedness or Bookmaker Probability), and that makes it clear why you should weight classes by their bias - the appropriate weight across horses is how much you bet on each horse. I have written extensively on this, including providing Matlab scripts, an Excel calculator and a version of Weka that provides it as an alternate evaluation measure (in the Explorer and Experimenter, as well as in Adaboost, which it turns into Adabook). I include a selection below (but exclude, e.g., papers about visualizations, including the relation to ROC and AUC - there is also a paper about why you should never use F-score, and one that focuses on multiclass visualizations - both available on arXiv).



My Weka fork with Informedness: https://www.dropbox.com/s/artzz1l3vozb6c4/weka.jar?dl=0

Informedness papers

2013 ICINCO Paper+Poster - Adabook & Multibook
https://dspace.flinders.edu.au/jspui/bitstream/2328/27163/2/Powers%20Evaluation%20poster.pdf

2012 EACL Paper+Poster - The Problem with Kappa
https://dspace.flinders.edu.au/jspui/bitstream/2328/27160/1/Powers%20Problem.pdf
https://dspace.flinders.edu.au/jspui/bitstream/2328/27160/2/Powers%20Problem%20poster.pdf

2011 JMLT - Evaluation: from Precision, Recall and F-measure to ROC, Informedness, Markedness and Correlation
http://dspace.flinders.edu.au/jspui/bitstream/2328/27165/1/Powers%20Evaluation.pdf

2008 ECAI Paper+Poster+Talk - Evaluation Evaluation
http://dspace2.flinders.edu.au/xmlui/bitstream/handle/2328/27163/Powers%20Evaluation.pdf
https://dspace.flinders.edu.au/jspui/bitstream/2328/27163/2/Powers%20Evaluation%20poster.pdf
http://dspace2.flinders.edu.au/xmlui/bitstream/handle/2328/27163/Powers%20Evaluation%20poster%20TALKY.ppt

2003 ICCS Paper+Poster - Recall and Precision vs the Bookmaker
https://dspace.flinders.edu.au/jspui/bitstream/2328/27159/1/Powers%20Recall.pdf

1998 CoNLL Paper - The Present Use of Statistics in the Evaluation of NLP Parsers
http://www.aclweb.org/anthology/W98-1226



You also mentioned liking AUROC. It is important to understand what this actually measures! 

ROC AUC gives the probability that a randomly chosen positive instance is ranked above a randomly chosen negative one. It represents a balance between finding a specific operating point (Certainty = (Informedness + 1)/2 is then the area under a three-point curve) and how much room there is for distributional variance (Consistency = AUC - Certainty, the area between the multipoint curve, or convex hull, and the three-point curve) - as discussed in my ROC ConCert paper; I've added a link below.

2012 ROC ConCert
http://www.academia.edu/download/31939951/201203-SCET30795-ROC-ConCert-PID1124774.pdf

dp
Prof. David M W Powers, Ph.D.   http://flinders.edu.au/people/David.Powers
                                                                   mail: [hidden email]

Professor of Computer Science & Cognitive Science, TON2.10
South Australia Research Director,  ARC ITRH Digital Enhanced Living Hub

College of Science and Engineering                              (Phone: 08-8201 3663)
Flinders University, Tonsley, South Australia 5042       (Fax: +61-8-8201 3626)
GPO Box 2100 Adelaide SA 5001                       (Mobile/Viber: 0414-824-307)
AUSTRALIA               



Re: Kappa metric for multi-class classification?

mcbenly
Thanks, Eibe, for the clarification on kappa, and Prof. David for providing such
wonderful material.

But honestly, I am still confused and haven't reached a conclusion about which
performance measure to go with.

My class distribution is: A = 85%, B = 6%, C = 4%, D = 5%. I have used SMOTE
to handle the class imbalance.

It's text data, and the application is "detection" from text.

Which performance measure would you have chosen for this multi-class
classification problem?

Thanks,



Re: Kappa metric for multi-class classification?

mcbenly
I guess my previous question was kind of silly to ask -- apologies for that. As
I understand it, the choice depends on the application and the goal.

@David, I am trying out your Weka fork with different experiment settings.
Thanks for that.

@Eibe, a quick question for you: is the kappa statistic that Weka generates in
its output suitable for multi-class problems? I have generally seen people use
it for binary classification. Is it based on two raters? Is a rater, in other
words, a class? So if I have a four-by-four matrix (4 classes), does that mean
I have four raters? Once again, apologies for this silly question; I tried
googling but couldn't find a useful resource on the multi-class case.

Also, on this forum I have seen some kappa-related results whose output
includes a quadratic kappa score, but I don't see it in my output. What's the
story with that, and how do I enable it?

Your response is highly appreciated.

Thanks,



Re: Kappa metric for multi-class classification?

Eibe Frank-2
Administrator
Quadratic weighted kappa is used for ordinal classification problems.

As I said in my earlier email, the basic kappa statistic is essentially just a normalised version of the percentage of correct classifications, where normalisation is performed with respect to the performance of a random classifier. It shows, at a glance, how much your classifier improves on a random one. Please take another look at the formula for kappa that I posted.
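
For reference, quadratic weighted kappa penalises each misclassification by the squared distance between the ordinal class indices, rather than treating all errors equally. A small illustrative sketch, computed directly from a confusion matrix (plain Java, independent of WEKA's output; the matrix is made up):

// Sketch only: quadratic weighted kappa for an ordinal k-class problem.
// Rows are the actual class, columns the predicted class.
public class QwkSketch {
  public static double quadraticWeightedKappa(int[][] cm) {
    int k = cm.length, n = 0;
    int[] rowSum = new int[k], colSum = new int[k];
    for (int i = 0; i < k; i++)
      for (int j = 0; j < k; j++) {
        n += cm[i][j];
        rowSum[i] += cm[i][j];
        colSum[j] += cm[i][j];
      }
    double observed = 0, expected = 0;
    for (int i = 0; i < k; i++)
      for (int j = 0; j < k; j++) {
        // disagreement weight grows quadratically with ordinal distance
        double w = (double) ((i - j) * (i - j)) / ((k - 1) * (k - 1));
        observed += w * cm[i][j];
        expected += w * ((double) rowSum[i] * colSum[j] / n);
      }
    return 1.0 - observed / expected;  // 1 = perfect, 0 = chance-level
  }

  public static void main(String[] args) {
    int[][] cm = { { 40,  8,  2,  0 },
                   {  6, 30, 10,  4 },
                   {  2,  8, 25,  5 },
                   {  0,  2,  6, 32 } };
    System.out.println("Quadratic weighted kappa = " + quadraticWeightedKappa(cm));
  }
}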

Cheers,
Eibe
