Using Resample to Oversample and Undersample

Using Resample to Oversample and Undersample

lp77
I'm using the WEKA API in Java to apply sampling methods to a dataset. Two of
the sampling methods I am applying are oversampling and undersampling. I
know I need to use the Resample filter, but I am unsure what each option
of the filter needs to be set to in order to perform oversampling and
undersampling.
Would anyone be able to advise me on using the Resample filter for
oversampling and undersampling?

Thanks.



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: Using Resample to Oversample and Undersample

Eibe Frank-2
Administrator
From an earlier message to this list:

weka.filters.supervised.instance.Resample uses the following expression to determine the number of instances to sample for a particular class i:

int sampleSize = (int)((m_SampleSizePercent / 100.0) * ((1 - m_BiasToUniformClass) * numInstancesPerClass[i] + m_BiasToUniformClass * data.numInstances() / numActualClasses));

where data.numInstances() gives the total number of instances in the dataset, numInstancesPerClass[i] holds the number of instances in class i and numActualClasses corresponds to the number of classes that actually occur in the dataset (some classes declared in an ARFF file may not have any instances in the data).
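To see how the two parameters interact, here is a minimal sketch that evaluates the expression above with plain arguments instead of the filter's fields. The class name, method name, and the 700/300 dataset are made up for illustration; only the arithmetic is taken from the expression quoted above.

```java
// Illustrative only: evaluates the quoted expression outside of WEKA.
class ResampleSizeSketch {

    // Same arithmetic as the expression above, with the filter's fields
    // passed in as ordinary parameters.
    static int sampleSize(double sampleSizePercent, double biasToUniformClass,
                          int numInstancesInClass, int numInstances,
                          int numActualClasses) {
        return (int) ((sampleSizePercent / 100.0)
                * ((1 - biasToUniformClass) * numInstancesInClass
                   + biasToUniformClass * numInstances / numActualClasses));
    }

    public static void main(String[] args) {
        // Hypothetical two-class dataset: 700 + 300 = 1000 instances.
        int[] counts = {700, 300};
        double[][] settings = {{50.0, 0.0}, {100.0, 0.5}, {100.0, 1.0}};
        for (double[] s : settings) {
            for (int count : counts) {
                System.out.printf("Z=%.1f B=%.1f class of %d -> %d%n",
                        s[0], s[1], count,
                        sampleSize(s[0], s[1], count, 1000, 2));
            }
        }
    }
}
```

With a bias of 0.0 the class proportions are preserved (350 and 150 at 50 percent), with a bias of 1.0 every class gets the same share (500 and 500 at 100 percent), and 0.5 lands halfway (600 and 400).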

Assuming you have only two classes, you can do the following.

To undersample the majority class so that both classes have the same number of instances, use noReplacement=true, biasToUniformClass=1.0, and sampleSizePercent=X, where X/2 is (approximately) the percentage of data that belongs to the minority class.

For example, on the diabetes data that comes with WEKA, you can use the following configuration:

  weka.filters.supervised.instance.Resample -B 1.0 -Z 69.8 -no-replacement

You will probably need to fiddle with the -Z parameter (sampleSizePercent) a bit to keep all the instances of the minority class. Watch out for something like "WARNING: Not enough instances of tested_positive for selected value of bias parameter in supervised Resample filter when sampling without replacement." It means the value specified by -Z is too large.
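The fiddling can be done on paper first. The sketch below plugs biasToUniformClass=1.0 into the sample size expression and searches, in steps of 0.1, for the smallest percentage that still yields the full minority class. It assumes the diabetes data has 768 instances, 268 of them tested_positive (an assumption, though it is consistent with the -Z 69.8 used above); the class and method names are hypothetical.

```java
// Illustrative only: finds a candidate -Z value without running the filter.
class UndersamplePercentSketch {

    // The sample size expression with biasToUniformClass fixed at 1.0,
    // so the per-class count term drops out.
    static int sampleSize(double percent, int numInstances, int numClasses) {
        return (int) ((percent / 100.0) * (1.0 * numInstances / numClasses));
    }

    // Smallest percentage (to one decimal place) for which every class is
    // sampled down, or up, to at least targetPerClass instances.
    static double percentForTarget(int targetPerClass, int numInstances,
                                   int numClasses) {
        // Work in integer tenths of a percent to avoid accumulating
        // floating-point error while stepping.
        int tenths = (int) Math.floor(
                1000.0 * numClasses * targetPerClass / numInstances);
        while (sampleSize(tenths / 10.0, numInstances, numClasses)
                < targetPerClass) {
            tenths++;
        }
        return tenths / 10.0;
    }

    public static void main(String[] args) {
        int total = 768;     // assumed size of the diabetes data
        int minority = 268;  // assumed number of tested_positive instances
        double z = percentForTarget(minority, total, 2);
        System.out.println("-Z " + z + " gives "
                + sampleSize(z, total, 2) + " instances per class");
        // One step lower loses a minority instance; going too high asks for
        // more minority instances than exist, which triggers the warning.
        System.out.println("-Z 69.7 gives " + sampleSize(69.7, total, 2));
        System.out.println("-Z 70.1 gives " + sampleSize(70.1, total, 2));
    }
}
```

Under these assumptions, 69.7 gives 267 instances per class, 69.8 gives 268, and 70.1 asks for 269, which is one more than the minority class can supply without replacement.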

A much easier way to achieve the same effect is to use the SpreadSubsample filter instead, with distributionSpread=1.0:

  weka.filters.supervised.instance.SpreadSubsample -M 1.0
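As a rough picture of why -M 1.0 balances the classes, the sketch below caps every class at the spread times the size of the smallest non-empty class. This is my reading of the distributionSpread option, not code from the filter itself, and the class counts (500 and 268) are the assumed counts for the diabetes data.

```java
// Illustrative only: target class sizes under a maximum spread, not WEKA code.
class SpreadSketch {

    // Caps each class count at spread * (smallest non-empty class).
    // A spread of 0 is treated here as "no cap".
    static int[] cappedCounts(int[] classCounts, double spread) {
        int min = Integer.MAX_VALUE;
        for (int c : classCounts) {
            if (c > 0 && c < min) {
                min = c;
            }
        }
        int[] result = classCounts.clone();
        if (spread <= 0 || min == Integer.MAX_VALUE) {
            return result;
        }
        int cap = (int) (min * spread);
        for (int i = 0; i < result.length; i++) {
            result[i] = Math.min(classCounts[i], cap);
        }
        return result;
    }

    public static void main(String[] args) {
        int[] counts = {500, 268}; // assumed diabetes class counts
        int[] balanced = cappedCounts(counts, 1.0);
        System.out.println(balanced[0] + " / " + balanced[1]);
    }
}
```

With a spread of 1.0 the majority class is cut to 268 and the minority class is left alone, so no percentage needs to be tuned.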

To oversample the minority class so that both classes have the same number of instances, use the supervised Resample filter with noReplacement=false, biasToUniformClass=1.0, and sampleSizePercent=Y, where Y/2 is (approximately) the percentage of data that belongs to the majority class. Example for the diabetes data:

  weka.filters.supervised.instance.Resample -B 1.0 -Z 130.3

Note that this will apply sampling *with* replacement to the majority class as well, so it may not be ideal for your application! To get oversampling of the minority class and keep the majority class untouched, you may need to write your own program or use the KnowledgeFlow.
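The effect on the majority class is easy to see in a small simulation. The sketch below is not the filter's sampling code; it simply draws indices uniformly with replacement, using the assumed diabetes class sizes (500 and 268) and a target of 500 per class, and counts how many distinct original instances survive.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Illustrative only: uniform sampling with replacement, not WEKA's code.
class ReplacementSketch {

    // Draws sampleSize indices from [0, classSize) with replacement.
    static int[] sampleWithReplacement(int classSize, int sampleSize,
                                       Random rng) {
        int[] picked = new int[sampleSize];
        for (int i = 0; i < sampleSize; i++) {
            picked[i] = rng.nextInt(classSize);
        }
        return picked;
    }

    // Number of different original instances that appear in the sample.
    static int countDistinct(int[] indices) {
        Set<Integer> seen = new HashSet<>();
        for (int idx : indices) {
            seen.add(idx);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        Random rng = new Random(1); // fixed seed so runs are repeatable
        int[] majority = sampleWithReplacement(500, 500, rng);
        int[] minority = sampleWithReplacement(268, 500, rng);
        System.out.println("majority: " + countDistinct(majority)
                + " distinct of 500 drawn");
        System.out.println("minority: " + countDistinct(minority)
                + " distinct of 500 drawn");
    }
}
```

Although the majority class still has 500 rows afterwards, a sizeable share of its original instances is replaced by duplicates, which is the reason for the caution above.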

Cheers,
Eibe
