Problem with class balancing (SMOTE)


Problem with class balancing (SMOTE)

asadbtk
Hi
I am using a few software fault prediction datasets, some of which are very imbalanced. The class values are True and False, and in some datasets the ratio is 0:100, i.e. there are no True instances and 100% of the instances are False.

I am using the SMOTE filter in Weka, and for some datasets applying it only increases the majority class, i.e. the False instances, which already make up 100% of the data. I use the default class index in the SMOTE settings, which is 0, and it is supposed to increase the number of True instances, but it doesn't. I am not sure what I am doing wrong.

Thanks 

_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit %(web_page_url)slistinfo%(cgiext)s/%(_internal_name)s
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Problem with class balancing (SMOTE)

Eibe Frank-2
Administrator
It seems to work fine for me. Note that you obviously need to have at least some data in the minority class. Also, the “percentage” parameter needs to be set appropriately. 100% means that SMOTE will double the number of instances in the minority class. For example, if you have only 2 instances in the minority class, the filtered data will have 4 instances.
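
The arithmetic behind the percentage setting can be sketched in a few lines of Python. This is only an illustration of the rule described above, not Weka's code: the function names are made up for this example, and truncating a fractional count to an integer is an assumption.

```python
def smote_output_size(minority_count: int, percentage: float) -> int:
    """Minority-class size after oversampling by `percentage` percent."""
    if minority_count <= 0:
        # With no minority instances there is nothing to oversample.
        raise ValueError("need at least one minority instance")
    # Assumption: a fractional number of synthetic instances is truncated.
    synthetic = int(minority_count * percentage / 100)
    return minority_count + synthetic


def percentage_for_share(minority_count: int, majority_count: int,
                         share: float) -> float:
    """Percentage needed so the minority class makes up `share` of the data."""
    if minority_count <= 0:
        raise ValueError("need at least one minority instance")
    needed = share * majority_count / (1.0 - share)
    return max(0.0, (needed / minority_count - 1.0) * 100.0)


print(smote_output_size(2, 100))  # 4: 2 original + 2 synthetic
```

For instance, with 5 minority and 80 majority instances, `percentage_for_share(5, 80, 0.2)` gives roughly 300, i.e. 15 synthetic instances for a 20:80 split. Both helpers refuse a minority count of zero, which is the situation described in the original question.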

Cheers,
Eibe

> On 15/09/2019, at 8:28 AM, javed khan <[hidden email]> wrote:
>
> Hi
> I am using a few software fault prediction datasets, some of which are very imbalanced. The class values are True and False, and in some datasets the ratio is 0:100, i.e. there are no True instances and 100% of the instances are False.
>
> I am using the SMOTE filter in Weka, and for some datasets applying it only increases the majority class, i.e. the False instances, which already make up 100% of the data. I use the default class index in the SMOTE settings, which is 0, and it is supposed to increase the number of True instances, but it doesn't. I am not sure what I am doing wrong.
>
> Thanks

Re: Problem with class balancing (SMOTE)

asadbtk
Hi Eibe, thanks for your reply. 

Does this mean that if we have 0 percent data in the minority class, SMOTE will not work? If so, what would be the solution?

Thanks again 

On Tuesday, September 17, 2019, Eibe Frank <[hidden email]> wrote:
It seems to work fine for me. Note that you obviously need to have at least some data in the minority class. Also, the “percentage” parameter needs to be set appropriately. 100% means that SMOTE will double the number of instances in the minority class. For example, if you have only 2 instances in the minority class, the filtered data will have 4 instances.

Cheers,
Eibe


Re: Problem with class balancing (SMOTE)

Eibe Frank-2
Administrator
If there is no data, SMOTE won’t be able to generate instances.

If you have data from one class only and you want to learn a discriminative model, one option is to apply a technique for one-class classification.
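
As a rough illustration of the one-class idea, the sketch below fits a model on instances of a single class and flags anything far from them as not belonging to that class. It is a deliberately simple centroid-distance detector written for this example; the class name, the feature values, and the rejection-rate parameter are all hypothetical, and it is not the method used by any Weka package.

```python
import math
import statistics


class CentroidOneClass:
    """Accepts points whose standardized distance to the training
    centroid is no larger than that of most training points."""

    def __init__(self, rejection_rate: float = 0.1):
        # Fraction of training points we are willing to treat as outliers.
        self.rejection_rate = rejection_rate

    def fit(self, rows):
        cols = list(zip(*rows))
        self.means = [statistics.fmean(c) for c in cols]
        # Fall back to 1.0 for constant features to avoid division by zero.
        self.stds = [statistics.pstdev(c) or 1.0 for c in cols]
        scores = sorted(self._score(r) for r in rows)
        k = min(len(scores) - 1, int(len(scores) * (1 - self.rejection_rate)))
        self.threshold = scores[k]
        return self

    def _score(self, row):
        return math.sqrt(sum(((x - m) / s) ** 2
                             for x, m, s in zip(row, self.means, self.stds)))

    def is_target(self, row) -> bool:
        return self._score(row) <= self.threshold


# Hypothetical (size, complexity) values for modules labelled False only.
non_faulty = [(10, 1), (12, 2), (11, 1), (9, 2), (10, 2), (13, 1), (11, 2), (12, 1)]
model = CentroidOneClass().fit(non_faulty)
print(model.is_target((11, 2)), model.is_target((200, 40)))
```

The model never sees a True instance. It only learns what the False class looks like, so a module that looks very different from the training data is reported as not belonging to that class.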

Cheers,
Eibe

> On 17/09/2019, at 11:37 PM, javed khan <[hidden email]> wrote:
>
> Hi Eibe, thanks for your reply.
>
> Does this mean that if we have 0 percent data in the minority class, SMOTE will not work? If so, what would be the solution?
>
> Thanks again

Re: Problem with class balancing (SMOTE)

asadbtk
Hi Eibe, don't you think the results of one-class classification will be biased? In my dataset there is no True-class data, but what if we change one row (instance) from False to True and then apply SMOTE to get a 20:80 ratio? Do you think that would work, or be justifiable?

Thanks 

On Wednesday, September 18, 2019, Eibe Frank <[hidden email]> wrote:
If there is no data, SMOTE won’t be able to generate instances.

If you have data from one class only and you want to learn a discriminative model, one option is to apply a technique for one-class classification.

Cheers,
Eibe


Re: Problem with class balancing (SMOTE)

Eibe Frank-2
Administrator
It sounds like one-class classification is what you need. Flipping a label and then generating data from that doesn't seem to make sense.
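
One way to see the problem is with a minimal SMOTE-style interpolation, sketched below under simplifying assumptions. The function is written for this example and is not Weka's implementation. Each synthetic point is placed on the line between a minority instance and one of its nearest minority neighbours, so with a single relabelled instance there is no neighbour to interpolate towards.

```python
import random


def smote_like(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic points between minority instances and their
    k nearest minority neighbours (simplified, numeric features only)."""
    if not minority:
        raise ValueError("no minority instances to interpolate from")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbours = others[:k]
        if not neighbours:
            # Only one minority instance: all we can do is copy it.
            synthetic.append(tuple(base))
            continue
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, nb)))
    return synthetic


flipped = [(3.0, 7.0)]  # one hypothetical False row relabelled as True
print(set(smote_like(flipped, 10)))  # {(3.0, 7.0)}
```

In this sketch, every "new" True instance is a copy of a row that was really False, so the balanced dataset carries no information about what a faulty module actually looks like.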

Cheers,
Eibe

On Wed, Sep 18, 2019 at 9:17 PM javed khan <[hidden email]> wrote:
Hi Eibe, don't you think the results of one-class classification will be biased? In my dataset there is no True-class data, but what if we change one row (instance) from False to True and then apply SMOTE to get a 20:80 ratio? Do you think that would work, or be justifiable?

Thanks 

_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html