Probability calibration and instance weights

Tom Horrocks-2
Hi all,

I'm using the MIWrapper for multi-instance learning, which requires the base classifier to (1) accept instance weights, and (2) produce well-calibrated probability estimates. I would like to try Random Forests as the base classifier, but it gives poor probability estimates and so requires calibration. From previous mailing list posts, it seems the standard way to achieve this is by stacking the base classifier (Random Forest) with Logistic Regression or isotonic regression. The ideal solution would look something like: MIWrapper(Stacking(LogisticRegression, RandomForests)).

Unfortunately, the Stacking meta-classifier cannot handle instance weights, let alone pass them through to its base classifier (Random Forests). Can anyone suggest a way around this? Without a work-around, the MIWrapper can only really take classifiers that natively produce good probability estimates (e.g., Logistic Regression) or otherwise have a calibration method baked into the class (e.g., SMO).

I'd also prefer not to use the WeightedInstanceHandler; I want the instance weights to get to the base classifier.

Cheers,
Tom

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Probability calibration and instance weights

Eibe Frank-2
Administrator
You are right. The following meta classifiers in the core WEKA distribution do not currently implement WeightedInstancesHandler:

ClassificationViaRegression
CostSensitiveClassifier
+ CVParameterSelection
+ IterativeClassifierOptimizer
MultiClassClassifier
+ MultiScheme
RegressionByDiscretization
+ Stacking
Vote

This includes Stacking. However, Stacking, like the other schemes marked with a “+” in the above list, uses k-fold cross-validation internally, which does not take instance weights into account. (The unmarked schemes should be updated to implement WeightedInstancesHandler.)

It is a pity that we don’t have an option in Bagging to perform baked-in calibration on the out-of-bag data. That would be ideal. (RandomForest essentially just applies Bagging with RandomTree as the base learner.)

In the meantime, your best option may be to use WeightedInstancesHandlerWrapper, perhaps forming a RandomCommittee based on it inside MIWrapper.
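As a rough illustration of the resampling idea behind a wrapper of this kind (a base learner that ignores weights can instead be trained on a sample drawn in proportion to the weights), here is a minimal sketch in plain Java. It uses only the standard library; the class and method names are hypothetical and this is not WEKA's implementation.

```java
import java.util.Random;

// Hypothetical helper, not part of WEKA: picks row indices with
// probability proportional to each row's weight, with replacement.
class WeightedResampleSketch {

    static int[] resample(double[] weights, int sampleSize, Random rng) {
        // Build the cumulative weight totals.
        double[] cumulative = new double[weights.length];
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] < 0) {
                throw new IllegalArgumentException("negative weight at row " + i);
            }
            total += weights[i];
            cumulative[i] = total;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must not all be zero");
        }

        int[] picked = new int[sampleSize];
        for (int s = 0; s < sampleSize; s++) {
            double u = rng.nextDouble() * total;
            // Binary search for the first row whose cumulative total exceeds u.
            int lo = 0;
            int hi = weights.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            picked[s] = lo;
        }
        return picked;
    }
}
```

The returned indices would then select the rows given to the unweighted learner. Repeating this with different seeds and averaging the resulting models is one way to reduce the variance that resampling introduces, which is the motivation for building a committee on top of it.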

Note that calibration with logistic regression works best when feeding it log-odds rather than probabilities. This can be done using the FilteredClassifier. However, I would perhaps just use isotonic regression as the meta learner in Stacking. (It can be applied to classification problems using ClassificationViaRegression).
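To make the two calibration ingredients concrete, here is a minimal plain-Java sketch, assuming binary labels coded as 0/1, strictly positive instance weights, and rows already sorted by the base classifier's score. The names are hypothetical and none of this is WEKA API: logOdds is the transform one would apply before logistic regression, and isotonicFit is a weighted pool-adjacent-violators pass, one common way to fit isotonic regression.

```java
import java.util.Arrays;

// Hypothetical helper, not part of WEKA.
class CalibrationSketch {

    // Log-odds of p, clipped to [eps, 1 - eps] so that 0 and 1 stay finite.
    static double logOdds(double p, double eps) {
        double clipped = Math.min(1.0 - eps, Math.max(eps, p));
        return Math.log(clipped / (1.0 - clipped));
    }

    // Weighted pool-adjacent-violators. labels[i] and weights[i] belong to
    // the row with the i-th smallest score; weights are assumed to be > 0.
    // Returns non-decreasing fitted probabilities, one per row.
    static double[] isotonicFit(double[] labels, double[] weights) {
        int n = labels.length;
        double[] blockMean = new double[n];
        double[] blockWeight = new double[n];
        int[] blockSize = new int[n];
        int blocks = 0;

        for (int i = 0; i < n; i++) {
            // Start a new block holding just this row.
            blockMean[blocks] = labels[i];
            blockWeight[blocks] = weights[i];
            blockSize[blocks] = 1;
            blocks++;
            // Merge backwards while the last two blocks violate monotonicity.
            while (blocks > 1 && blockMean[blocks - 2] > blockMean[blocks - 1]) {
                double w = blockWeight[blocks - 2] + blockWeight[blocks - 1];
                blockMean[blocks - 2] =
                        (blockMean[blocks - 2] * blockWeight[blocks - 2]
                                + blockMean[blocks - 1] * blockWeight[blocks - 1]) / w;
                blockWeight[blocks - 2] = w;
                blockSize[blocks - 2] += blockSize[blocks - 1];
                blocks--;
            }
        }

        // Expand the block means back to one fitted value per row.
        double[] fitted = new double[n];
        int pos = 0;
        for (int b = 0; b < blocks; b++) {
            Arrays.fill(fitted, pos, pos + blockSize[b], blockMean[b]);
            pos += blockSize[b];
        }
        return fitted;
    }
}
```

For example, labels {0, 1, 0, 1} with weights {1, 1, 3, 1} yield fitted values {0, 0.25, 0.25, 1}: the middle two rows are pooled, and the heavier 0-labelled row pulls their shared estimate down to 0.25.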

Cheers,
Eibe

> On 22/11/2018, at 3:18 PM, Tom Horrocks <[hidden email]> wrote:
>
> Hi all,
>
> I'm using the MIWrapper for multi-instance learning, which requires the base classifier to (1) accept instance weights, and (2) produce well-calibrated probability estimates. I would like to try Random Forests as the base classifier, but it gives poor probability estimates and so requires calibration. From previous mailing list posts, it seems the standard way to achieve this is by stacking the base classifier (Random Forest) with Logistic Regression or isotonic regression. The ideal solution would look something like: MIWrapper(Stacking(LogisticRegression, RandomForests)).
>
> Unfortunately, the Stacking meta-classifier cannot handle instance weights, let alone pass them through to its base classifier (Random Forests). Can anyone suggest a way around this? Without a work-around, the MIWrapper can only really take classifiers that natively produce good probability estimates (e.g., Logistic Regression) or otherwise have a calibration method baked into the class (e.g., SMO).
>
> I'd also prefer not to use the WeightedInstanceHandler; I want the instance weights to get to the base classifier.
>
> Cheers,
> Tom