Kappa & MCC & ROC Area: how are they calculated in Weka?


Kappa & MCC & ROC Area: how are they calculated in Weka?

Marina Santini
Hi,

I would like to have more details on how kappa and MCC are calculated
in Weka. I rely on them in the Weka Explorer when I have unbalanced
datasets (either binary or multi-class), and I would like to be sure
that I correctly understand how Weka has implemented these two
statistics (MCC is not documented in the latest Weka book, 4th ed.).

I usually interpret small kappa and MCC values (such as 0.1) as an
indication that a classifier is guessing randomly, even if I have an
accuracy of, say, 80%, a precision of 0.7 and a recall of 0.6.

I understand that kappa is related to accuracy: kappa is a global
measure of how reliable the accuracy of a classifier is as a whole.

I understand that MCC is linked to Precision and Recall.
Weka breaks down Precision and Recall per class, so I can see whether
one class is performing better than the other. In the case of
unbalanced datasets, this information is very important.
I observe that the values of MCC are often the same for all the
classes. Is this a coincidence, or is Weka using only one (unified)
calculation for all the classes? Would it be possible to see the MCC
per individual class?

The same is true for the value listed under ROC Area: one value,
identical for all the classes. To my understanding, they should not
always be the same.

Last but not least, what does it mean when I have a kappa of 0.13, an
MCC of 0.13 and a ROC Area of 0.7? Given these kappa and MCC values, I
would expect a ROC Area of 0.5 or at most 0.55.

Thanks in advance for any clarification

Cheers, Marina

Re: Kappa & MCC & ROC Area: how are they calculated in Weka?

Eibe Frank-2
Administrator

On 19/05/2017, at 3:29 AM, Marina Santini <[hidden email]> wrote:

I understand that MCC is linked to Precision and Recall.
Weka breaks down Precision and Recall per class, so I can see whether
one class is performing better than the other. In the case of
unbalanced datasets, this information is very important.
I observe that the values of MCC are often the same for all the
classes. Is this a coincidence, or is Weka using only one (unified)
calculation for all the classes? Would it be possible to see the MCC
per individual class?

Not sure what you mean. WEKA does output an MCC value for each class. Here is the output for J48 on the iris data (default settings for everything):

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.980    0.000    1.000      0.980    0.990      0.985    0.990     0.987     Iris-setosa
                 0.940    0.030    0.940      0.940    0.940      0.910    0.952     0.880     Iris-versicolor
                 0.960    0.030    0.941      0.960    0.950      0.925    0.961     0.905     Iris-virginica
Weighted Avg.    0.960    0.020    0.960      0.960    0.960      0.940    0.968     0.924     
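
As for how these numbers are computed, both come straight from the confusion matrix. Below is a minimal sketch of the standard definitions in plain Java; it is not the actual WEKA source, but it should agree with what the Evaluation class prints: kappa compares the observed agreement with the agreement expected by chance, and the per-class MCC is the binary MCC applied to the one-vs-rest counts for that class.

// Sketch of the standard definitions, not WEKA's implementation.
public class AgreementStats {

    // Cohen's kappa over the full confusion matrix:
    // kappa = (observedAgreement - chanceAgreement) / (1 - chanceAgreement)
    static double kappa(double[][] cm) {
        int k = cm.length;
        double total = 0, diagonal = 0;
        double[] rowSum = new double[k], colSum = new double[k];
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                total += cm[i][j];
                rowSum[i] += cm[i][j];
                colSum[j] += cm[i][j];
            }
            diagonal += cm[i][i];
        }
        double observed = diagonal / total;
        double chance = 0;
        for (int i = 0; i < k; i++) {
            chance += (rowSum[i] / total) * (colSum[i] / total);
        }
        return (observed - chance) / (1 - chance);
    }

    // Per-class MCC: collapse the matrix to one-vs-rest counts for class c
    // and apply the binary MCC formula.
    static double mcc(double[][] cm, int c) {
        int k = cm.length;
        double tp = cm[c][c], fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                if (i == c && j != c) fn += cm[i][j];        // class c predicted as something else
                else if (i != c && j == c) fp += cm[i][j];   // something else predicted as class c
                else if (i != c && j != c) tn += cm[i][j];
            }
        }
        double denom = Math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        return denom == 0 ? 0 : (tp * tn - fp * fn) / denom;
    }

    public static void main(String[] args) {
        // Confusion matrix consistent with the iris output above (rows = actual, columns = predicted).
        double[][] iris = {
            {49,  1,  0},
            { 0, 47,  3},
            { 0,  2, 48}
        };
        System.out.println("kappa = " + kappa(iris));
        for (int c = 0; c < iris.length; c++) {
            System.out.println("MCC class " + c + " = " + mcc(iris, c));  // ~0.985, 0.910, 0.925
        }
    }
}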



The same is true for the value listed under ROC Area: one value,
identical for all the classes. To my understanding, they should not
always be the same.

In the two-class case, they should always be the same in theory, but they are sometimes not in practice. The reason for this is lack of floating-point precision for representing the probability estimates.

As you can see above, they are usually not the same when you have more than two classes.

Last but not least, what does it mean when I have a kappa of 0.13, an
MCC of 0.13 and a ROC Area of 0.7? Given these kappa and MCC values, I
would expect a ROC Area of 0.5 or at most 0.55.

MCC and kappa are based directly on the classifications obtained from the classifier. ROC area is obtained by ranking test instances based on the classifiers’ class probability estimates for them. The ranking can be good even if classification accuracy is low. For example, in the two-class case, if, for all the test instances, the estimated probability for the positive class is smaller than 0.5, all truly positive test instances will be misclassified as negative. However, ROC area can still be a perfect 1 if all positive instances are ranked above all negative instances according to the estimated probability of belonging to the positive class.

For example, when applying naive Bayes, the accuracy of the classifications may be poor even though the ranking of instances according to its probability estimates is quite good.
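
To make this concrete, here is a tiny self-contained illustration of the two-class scenario described above (plain Java, no WEKA classes): every estimated probability for the positive class is below 0.5, so accuracy on the positives is zero, yet the ranking is perfect and the ROC area is 1.

public class RocVsAccuracy {
    public static void main(String[] args) {
        double[] pos = {0.40, 0.45, 0.48};  // estimated P(positive) for the truly positive instances
        double[] neg = {0.10, 0.20, 0.30};  // estimated P(positive) for the truly negative instances

        // ROC area = fraction of positive-negative pairs ranked correctly (ties count half).
        int correct = 0, ties = 0;
        for (double p : pos) {
            for (double n : neg) {
                if (p > n) correct++;
                else if (p == n) ties++;
            }
        }
        double auc = (correct + 0.5 * ties) / (double) (pos.length * neg.length);
        System.out.println("ROC area = " + auc);  // 1.0, although every positive is misclassified at threshold 0.5
    }
}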

In your scenario, you could try to increase kappa and MCC by applying the CostSensitiveClassifier with the minimum expected cost approach and specifying an appropriate cost matrix. Inspecting the confusion matrix and the ROC curves for the different classes might give you a hint as to which misclassifications should incur higher costs to achieve greater kappa or MCC.
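
A sketch of what that could look like via the Java API (the same configuration can be entered in the Explorer's classifier editor). Method names are as I remember them from WEKA 3.8; the data file, base learner and cost values are placeholders, and you should check the row/column convention of the cost matrix in your version before relying on it:

import weka.classifiers.CostMatrix;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CostSensitiveSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // Two-class cost matrix; off-diagonal cells are misclassification costs.
        CostMatrix costs = new CostMatrix(data.numClasses());
        costs.setCell(0, 1, 1.0);   // cost of one kind of misclassification
        costs.setCell(1, 0, 5.0);   // make the other kind five times as expensive

        CostSensitiveClassifier csc = new CostSensitiveClassifier();
        csc.setClassifier(new J48());
        csc.setCostMatrix(costs);
        csc.setMinimizeExpectedCost(true);  // predict the class with minimum expected cost instead of reweighting
        csc.buildClassifier(data);
    }
}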

Another option is to use the ThresholdSelector to optimise for accuracy based on the classifier's probability estimates. It can only optimise one class against the rest, though.
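
A minimal sketch of the ThresholdSelector as a wrapper (in WEKA 3.8 it is provided by the thresholdSelector package). Only the calls I am confident about are shown; the designated class, the measure to optimise and the number of folds are further options of the scheme that are easiest to set in the Explorer's object editor:

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.ThresholdSelector;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ThresholdSelectorSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // Wraps a base learner and picks the probability threshold for one designated class.
        ThresholdSelector ts = new ThresholdSelector();
        ts.setClassifier(new NaiveBayes());
        ts.buildClassifier(data);
    }
}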

Finally, you could try calibrating the classifier's class probability estimates, by using Stacking with ClassificationViaRegression+IsotonicRegression as the meta learner.
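
And a sketch of that calibration setup (it assumes the isotonicRegression package is installed so that weka.classifiers.functions.IsotonicRegression is available; naive Bayes stands in for whatever classifier you want to calibrate):

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.IsotonicRegression;
import weka.classifiers.meta.ClassificationViaRegression;
import weka.classifiers.meta.Stacking;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CalibrationSketch {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("mydata.arff");   // placeholder file name
        data.setClassIndex(data.numAttributes() - 1);

        // Meta learner: one isotonic regression per class via ClassificationViaRegression.
        ClassificationViaRegression meta = new ClassificationViaRegression();
        meta.setClassifier(new IsotonicRegression());

        Stacking stack = new Stacking();
        stack.setClassifiers(new Classifier[] { new NaiveBayes() });  // the classifier to calibrate
        stack.setMetaClassifier(meta);
        stack.buildClassifier(data);
    }
}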

Cheers,
Eibe



Re: Kappa & MCC & ROC Area: how are they calculated in Weka?

Marina Santini
Thanks for your exhaustive answer, Eibe. It is informative and contains the details that I needed. 

Have a great day

Marina

Re: Kappa & MCC & ROC Area: how are they calculated in Weka?

Marina Santini
In reply to this post by Eibe Frank-2
Hi, 

two follow-up questions on your answer, Eibe. 

1) Where is the ThresholdSelector in Weka Explorer?

2) Where is IsotonicRegression in Weka Explorer? I have AdditiveRegression, but not IsotonicRegression.

I am using version 3.8.1

Thanks a lot. 

Cheers, Marina

Re: Kappa & MCC & ROC Area: how are they calculated in Weka?

Eibe Frank-2
Administrator
They are in packages:

http://weka.sourceforge.net/packageMetaData/thresholdSelector/index.html
http://weka.sourceforge.net/packageMetaData/isotonicRegression/index.html
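
They can be installed with the built-in package manager; from the command line, something like this should work (from memory of the 3.8 package manager, so treat it as a sketch and adjust the classpath to wherever your weka.jar lives):

java -cp weka.jar weka.core.WekaPackageManager -install-package thresholdSelector
java -cp weka.jar weka.core.WekaPackageManager -install-package isotonicRegression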

Cheers,
Eibe

Re: Kappa & MCC & ROC Area: how are they calculated in Weka?

Marina Santini
Hi Eibe and thank you for your answer. From your reply I understand that these options are not available from the graphical interfaces, but that they must be called via the API.
Is this correct?

Thanx, Marina

Re: Kappa & MCC & ROC Area: how are they calculated in Weka?

Eibe Frank-2
Administrator
No, they are available via the GUIs. All you need to do is install the packages with the WEKA package manager (available from the Tools menu of the WEKA GUIChooser).

Cheers,
Eibe
