Sampling techniques classification

MOHAMMED KAMAL


Hi,

I want to classify ClassBalancer, Resample and SpreadSubsample: which of them perform oversampling and which perform undersampling?
If one of them plays both roles, which parameter controls this?
Thanks in advance,
Mohammed kamal




_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: Sampling techniques classification

Eibe Frank-2
ClassBalancer doesn’t do any sampling. It simply reweights the instances so that all classes have equal weight. It will only produce useful results when the base classifier implements the WeightedInstancesHandler interface.

The unsupervised Resample filter allows you to perform oversampling or undersampling, depending on how you set the percentage. Also, oversampling is only possible if you use sampling with replacement (the default). The same applies to the supervised Resample filter. However, in this case you can also get over/undersampling of individual classes if you change the sampling bias. (But, again, oversampling of classes is only possible when you use sampling with replacement.)
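
A minimal plain-Java sketch of the percentage/replacement interplay described above. It uses no WEKA classes and is not the filter's source; `ResampleSketch` and `resample` are hypothetical names, and the sketch only picks instance indices. A percentage above 100 means oversampling, below 100 undersampling. In WEKA 3.8 the corresponding Resample options are `sampleSizePercent` and `noReplacement`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not WEKA's Resample code: draw a sample whose size is a
// percentage of the input size. Above 100% this is oversampling, below 100%
// undersampling.
final class ResampleSketch {

    /** Returns the indices of the sampled instances. */
    static List<Integer> resample(int numInstances, double sampleSizePercent,
                                  boolean withReplacement, long seed) {
        int sampleSize = (int) (numInstances * sampleSizePercent / 100.0);
        if (!withReplacement && sampleSize > numInstances) {
            // Without replacement each instance can be drawn at most once,
            // so the sample can never be larger than the input.
            throw new IllegalArgumentException("oversampling needs sampling with replacement");
        }
        Random rand = new Random(seed);
        List<Integer> sample = new ArrayList<>();
        if (withReplacement) {
            for (int i = 0; i < sampleSize; i++) {
                sample.add(rand.nextInt(numInstances)); // the same index may be drawn repeatedly
            }
        } else {
            // Partial Fisher-Yates shuffle: the first sampleSize slots form a random subset.
            int[] idx = new int[numInstances];
            for (int i = 0; i < numInstances; i++) {
                idx[i] = i;
            }
            for (int i = 0; i < sampleSize; i++) {
                int j = i + rand.nextInt(numInstances - i);
                int tmp = idx[i];
                idx[i] = idx[j];
                idx[j] = tmp;
                sample.add(idx[i]);
            }
        }
        return sample;
    }

    public static void main(String[] args) {
        System.out.println(resample(10, 200.0, true, 1).size()); // 20 (oversampling)
        System.out.println(resample(10, 50.0, false, 1).size()); // 5 (undersampling)
    }
}
```

Asking for 150% without replacement throws, which mirrors the point above that oversampling needs sampling with replacement.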

SpreadSubsample performs undersampling (for all classes except the minority class) as specified by the parameters.
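
The SpreadSubsample idea can be sketched as capping every class at a multiple of the minority-class size. This sketch assumes that the relevant parameter is the filter's maximum spread (`distributionSpread`, where 1 means a uniform distribution and 0 means no limit). It is a hypothetical illustration that only computes how many instances per class would be kept, not which ones, and it is not WEKA's code.

```java
import java.util.Arrays;

// Hypothetical sketch of the SpreadSubsample idea, not WEKA's code: the
// largest allowed class size is `spread` times the size of the smallest class.
final class SpreadSubsampleSketch {

    /** Returns how many instances of each class would be kept. */
    static int[] targetCounts(int[] classCounts, double spread) {
        int minCount = Integer.MAX_VALUE;
        for (int c : classCounts) {
            if (c > 0) {
                minCount = Math.min(minCount, c); // empty classes are ignored
            }
        }
        int[] target = classCounts.clone();
        if (spread <= 0 || minCount == Integer.MAX_VALUE) {
            return target; // spread 0 is read as "no maximum spread": nothing is removed
        }
        // Assumption: spreads below 1 are treated like 1, so the minority class is never cut.
        int cap = Math.max(minCount, (int) (spread * minCount));
        for (int i = 0; i < target.length; i++) {
            target[i] = Math.min(target[i], cap); // only classes above the cap are undersampled
        }
        return target;
    }

    public static void main(String[] args) {
        int[] counts = {900, 250, 50};
        System.out.println(Arrays.toString(targetCounts(counts, 1.0))); // [50, 50, 50]
        System.out.println(Arrays.toString(targetCounts(counts, 2.0))); // [100, 100, 50]
    }
}
```

No class ever grows, so this procedure can only undersample.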

Cheers,
Eibe

> On 16/05/2017, at 10:36 PM, ENGMohammed kamal <[hidden email]> wrote:

Re: Sampling techniques classification

Peter holunaro
Hi Eibe,

What are the classifiers that implement the WeightedInstancesHandler interface?

Peter


On Wed, May 17, 2017 at 11:13 AM, Eibe Frank <[hidden email]> wrote:

Re: Sampling techniques classification

Peter Reutemann
> What are the classifiers that implement the WeightedInstancesHandler
> interface?

http://weka.sourceforge.net/doc.dev/weka/core/WeightedInstancesHandler.html

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Sampling techniques classification

Eibe Frank-2
In reply to this post by Peter holunaro
The methods in the core WEKA distribution that implement WeightedInstancesHandler are listed here:

http://weka.sourceforge.net/doc.stable-3-8/weka/core/WeightedInstancesHandler.html

For classifiers from other packages, you will have to check the Javadoc of each corresponding package, e.g.:

http://weka.sourceforge.net/doc.packages/LibSVM/weka/classifiers/functions/LibSVM.html

shows that LibSVM does not implement the interface.

Cheers,
Eibe

> On 17/05/2017, at 3:50 PM, Peter holunaro <[hidden email]> wrote:

Re: Sampling techniques classification

Peter holunaro
Thank you very much, Peter and Eibe, for the very helpful links.

Peter

On Wed, May 17, 2017 at 11:57 AM, Eibe Frank <[hidden email]> wrote:

Re: Sampling techniques classification

Keith Roy
Dear all, 

How does ClassBalancer reweight the instances (what method does it use)?

Best
Keith

On Wed, May 17, 2017 at 12:00 PM, Peter holunaro <[hidden email]> wrote:

Re: Sampling techniques classification

MOHAMMED KAMAL
In reply to this post by MOHAMMED KAMAL


Thanks, Eibe and Peter.
Now I have more questions:

1- What's the difference between the supervised and unsupervised versions of Resample? Is it the bias to uniform class?
2- I know that it's better to use FilteredClassifier if I want to compare the performance of the same classifier (say J48) with different sampling techniques, e.g. J48 + Resample vs. J48 + SMOTE vs. J48 + subsampling. Is this true?
3- If the answer to question 2 is yes: as I asked before, I have a problem, because I can't use FilteredClassifier with sampling techniques in WEKA-SPARK. Maybe I am doing something wrong; as I understood from a previous answer, this is because I use FilteredClassifier after the ARFF header Spark job. So, if I want to run a job using Spark from the command line, what is the correct sequence for using sampling techniques?

regards

Mohammed kamal
 






From: [hidden email] <[hidden email]> on behalf of [hidden email] <[hidden email]>
Sent: Wednesday, May 17, 2017 7:00 AM
To: [hidden email]
Subject: Wekalist Digest, Vol 171, Issue 65
 

Re: Sampling techniques classification

Eibe Frank-2

> On 17 May 2017, at 18:14, ENGMohammed kamal <[hidden email]> wrote:
>
> 1- What's the difference between the supervised and unsupervised versions of Resample? Is it the bias to uniform class?

The supervised version takes the distribution of instances into classes into account when sampling. The unsupervised version completely ignores the class when picking instances.
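
Only the supervised Resample filter has the bias-to-uniform-class setting (`biasToUniformClass`, `-B`: 0 keeps the class distribution of the input, 1 aims for a uniform class distribution). The following is a hypothetical sketch, not WEKA code. It assumes that each draw picks a random class present in the data with probability equal to the bias, and a random instance from the whole data otherwise. Under that assumption, the expected number of instances drawn from each class is a mixture of the observed and the uniform distribution:

```java
// Hypothetical sketch (not WEKA code): expected per-class counts when the
// sampling distribution mixes the observed class distribution (weight 1 - bias)
// with a uniform distribution over the classes that occur (weight bias).
final class BiasedResampleSketch {

    static double[] expectedClassCounts(int[] classCounts, double sampleSizePercent, double bias) {
        int total = 0;
        int presentClasses = 0;
        for (int c : classCounts) {
            total += c;
            if (c > 0) {
                presentClasses++;
            }
        }
        double sampleSize = total * sampleSizePercent / 100.0;
        double[] expected = new double[classCounts.length];
        for (int i = 0; i < classCounts.length; i++) {
            double observed = (double) classCounts[i] / total;
            double uniform = classCounts[i] > 0 ? 1.0 / presentClasses : 0.0;
            expected[i] = sampleSize * ((1 - bias) * observed + bias * uniform);
        }
        return expected;
    }

    public static void main(String[] args) {
        int[] counts = {80, 20};
        for (double bias : new double[] {0.0, 0.5, 1.0}) {
            double[] e = expectedClassCounts(counts, 100.0, bias);
            System.out.printf("bias %.1f: %.1f / %.1f%n", bias, e[0], e[1]);
        }
    }
}
```

With an 80/20 class split and a 100% sample, bias 0 gives about 80/20, bias 0.5 about 65/35 and bias 1 about 50/50. The majority class is undersampled and the minority class is oversampled, which (as noted earlier in the thread) is only possible with replacement.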

> 2- I know that it's better to use FilteredClassifier if I want to compare the performance of the same classifier (say J48) with different sampling techniques, e.g. J48 + Resample vs. J48 + SMOTE vs. J48 + subsampling. Is this true?

Yes, it's best to use the FilteredClassifier to compare accuracy.
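
One reason FilteredClassifier gives a fair comparison is that, during cross-validation, the sampling filter is applied only to the training part of each fold, and the test fold is left untouched. The hypothetical sketch below works on plain instance indices; the 200% oversampling with replacement and the 10 folds are arbitrary choices. For both orders of operations, it counts test items that also occur in the training data of their fold:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch: compare oversampling the whole data set before
// cross-validation with oversampling only the training part of each fold.
final class FoldLeakageSketch {
    static final int N = 100;
    static final int FOLDS = 10;

    // 200% oversampling with replacement of the given instance indices.
    static List<Integer> oversample(List<Integer> source, Random rand) {
        List<Integer> out = new ArrayList<>();
        for (int k = 0; k < 2 * source.size(); k++) {
            out.add(source.get(rand.nextInt(source.size())));
        }
        return out;
    }

    /** Counts test items whose original instance also appears in that fold's training data. */
    static int leakedTestItems(boolean resampleInsideFolds, long seed) {
        Random rand = new Random(seed);
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < N; i++) {
            data.add(i);
        }
        if (!resampleInsideFolds) {
            data = oversample(data, rand); // resample first, then split into folds
        }
        int leaked = 0;
        for (int f = 0; f < FOLDS; f++) {
            List<Integer> train = new ArrayList<>();
            List<Integer> test = new ArrayList<>();
            for (int pos = 0; pos < data.size(); pos++) {
                if (pos % FOLDS == f) {
                    test.add(data.get(pos));
                } else {
                    train.add(data.get(pos));
                }
            }
            if (resampleInsideFolds) {
                train = oversample(train, rand); // analogous to FilteredClassifier: training part only
            }
            Set<Integer> trainSet = new HashSet<>(train);
            for (int idx : test) {
                if (trainSet.contains(idx)) {
                    leaked++;
                }
            }
        }
        return leaked;
    }

    public static void main(String[] args) {
        System.out.println("resample inside folds: " + leakedTestItems(true, 1)); // 0
        System.out.println("resample before split: " + leakedTestItems(false, 1));
    }
}
```

When the data is resampled before splitting, copies of the same instance end up on both sides of a fold. The classifier is then partly tested on instances it has effectively seen during training, which tends to inflate the estimate.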

I don't know enough about the distributed WEKA stuff to answer your third question. However, it seems that you should be able to build several FilteredClassifier objects in parallel and combine them using Voting.

Cheers,
Eibe




Re: Sampling techniques classification

Eibe Frank-2
In reply to this post by Keith Roy
This is what happens when the class is a nominal attribute:

Let W_i be the sum of the weights of the training instances in class i and let C be the number of classes. Let W be the sum of all weights in the data. Then, given an instance with weight w in class i, the weight of this instance is set to W/(W_i*C)*w.

After all the instances' weights have been changed this way, the sum of weights of the instances in each class will be W/C. The total sum of weights will be W, i.e., it will remain unchanged.
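
The nominal-class rule above can be written as a short, self-contained sketch. It works on plain arrays (a class index and a weight per instance) rather than WEKA's Instances, and the class and method names are hypothetical. It applies W/(W_i*C)*w literally, so it assumes every class occurs in the data; an empty class would make the total weight drop below W.

```java
// Hypothetical sketch of the reweighting rule described above, on plain arrays.
final class ClassBalancerSketch {

    /** New weight of each instance: W / (W_i * C) * w. */
    static double[] rebalance(int[] classOf, double[] weights, int numClasses) {
        double[] classWeight = new double[numClasses]; // W_i for each class i
        double total = 0.0;                            // W
        for (int k = 0; k < weights.length; k++) {
            classWeight[classOf[k]] += weights[k];
            total += weights[k];
        }
        double[] result = new double[weights.length];
        for (int k = 0; k < weights.length; k++) {
            result[k] = total / (classWeight[classOf[k]] * numClasses) * weights[k];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] classOf = {0, 0, 0, 1};            // three instances of class 0, one of class 1
        double[] weights = {1.0, 1.0, 1.0, 1.0}; // W = 4, C = 2
        double[] w = rebalance(classOf, weights, 2);
        // Each class ends up with total weight W/C = 2; the overall total stays 4.
        System.out.println("class 0: " + (w[0] + w[1] + w[2]) + ", class 1: " + w[3]);
    }
}
```

Each class-0 instance ends up with weight 4/(3*2) = 2/3 and the single class-1 instance with 4/(1*2) = 2, so both classes sum to W/C.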

This is what happens when the target attribute ("class") is numeric:

The target attribute of the data is discretised into the user-specified number of bins using the unsupervised Discretize filter with default settings (i.e., equal-width discretization). The bins in the discretisation are treated like nominal class values and the above process for changing weights in the nominal class case is applied.
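
For a numeric target, the same reweighting only needs an equal-width binning step in front. The binning below is a hypothetical stand-in for the unsupervised Discretize filter. Cut points are placed at min + j*(max-min)/bins, and a value equal to a cut point is assumed to go into the lower bin. It is an illustration, not the filter's code, and like the nominal case it assumes that no bin is empty.

```java
import java.util.Arrays;

// Hypothetical sketch: equal-width binning of a numeric target, then the same
// W / (W_i * C) * w reweighting with the bins playing the role of classes.
final class NumericClassBalancerSketch {

    /** Bin index of each value; a value equal to a cut point goes into the lower bin. */
    static int[] equalWidthBins(double[] target, int numBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : target) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / numBins;
        int[] bin = new int[target.length];
        for (int k = 0; k < target.length; k++) {
            int b = 0;
            // Cut points min + j * width (j = 1 .. numBins - 1) are increasing, so the
            // last one strictly below the value gives the bin index.
            for (int j = 1; j < numBins; j++) {
                if (min + j * width < target[k]) {
                    b = j;
                }
            }
            bin[k] = b;
        }
        return bin;
    }

    static double[] rebalance(double[] target, double[] weights, int numBins) {
        int[] bin = equalWidthBins(target, numBins);
        double[] binWeight = new double[numBins]; // W_i per bin
        double total = 0.0;                       // W
        for (int k = 0; k < weights.length; k++) {
            binWeight[bin[k]] += weights[k];
            total += weights[k];
        }
        double[] result = new double[weights.length];
        for (int k = 0; k < weights.length; k++) {
            result[k] = total / (binWeight[bin[k]] * numBins) * weights[k];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] target = {0, 1, 2, 9, 10};
        System.out.println(Arrays.toString(equalWidthBins(target, 2))); // [0, 0, 0, 1, 1]
    }
}
```

With targets 0, 1, 2, 9, 10 and two bins (cut point 5), each bin receives total weight 5/2 = 2.5 after reweighting.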

Cheers,
Eibe

> On 17 May 2017, at 16:19, Keith Roy <[hidden email]> wrote:
>
> Dear all,
>
> How "ClassBalancer" reweights the instances (what method it uses)?
>
> Best
> Keith
>
> On Wed, May 17, 2017 at 12:00 PM, Peter holunaro <[hidden email]> wrote:
> Thank you very much, Peter and Eibe, for the very helpful links.
>
> Peter
>
> On Wed, May 17, 2017 at 11:57 AM, Eibe Frank <[hidden email]> wrote:
> The methods in the core WEKA distribution that implement WeightedInstancesHandler are listed here:
>
> http://weka.sourceforge.net/doc.stable-3-8/weka/core/WeightedInstancesHandler.html
>
> For the other ones, you will have to check the Javadoc of each corresponding package, e.g.:
>
> http://weka.sourceforge.net/doc.packages/LibSVM/weka/classifiers/functions/LibSVM.html
>
> shows that LibSVM does not implement the interface.
>
> Cheers,
> Eibe
>
> > On 17/05/2017, at 3:50 PM, Peter holunaro <[hidden email]> wrote:
> >
> > Hi Eibe,
> >
> > What are the classifiers that implement the WeightedInstancesHandler interface?
> >
> > Peter
> >
> >
> > On Wed, May 17, 2017 at 11:13 AM, Eibe Frank <[hidden email]> wrote:
> > ClassBalancer doesn’t do any sampling. It simply reweights the instances so that all classes have equal weight. It will only produce useful results when the base classifier implements the WeightedInstancesHandler interface.
> >
> > The unsupervised Resample filter allows you to perform oversampling or undersampling, depending on how you set the percentage. Also, oversampling is only possible if you use sampling with replacement (the default). The same applies to the supervised Resample filter. However, in this case you can also get over/undersampling of individual classes if you change the sampling bias. (But, again, oversampling of classes is only possible when you use sampling with replacement.)
> >
> > SpreadSubsample performs undersampling (for all classes except the minority class) as specified by the parameters.
> >
> > Cheers,
> > Eibe
> >
> > > On 16/05/2017, at 10:36 PM, ENGMohammed kamal <[hidden email]> wrote:
> > >
> > >
> > > Hi,
> > > I want to classify ClassBalancer, Resample and SpreadSubsample:
> > > which of them perform oversampling and which undersampling?
> > > If one of them plays both roles, which parameter controls this?
> > > Thanks in advance,
> > > Mohammed kamal
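Eibe's answer quoted above separates the three filters by what they do to the data. The Python sketch below mimics that behaviour on a plain list of labels. It is a simplified illustration under our own assumptions, not WEKA's implementation.

- All function and parameter names are hypothetical.
- `percent`, `with_replacement`, `bias` and `max_spread` only loosely mirror Resample's sample-size percentage, replacement and bias-to-uniform settings and SpreadSubsample's maximum class-distribution spread.
- The way `bias` interpolates between the input distribution and a uniform one is our assumption.

```python
import random
from collections import defaultdict


def _group_by_class(data, label_of):
    """Collect the instances of each class into a list."""
    groups = defaultdict(list)
    for inst in data:
        groups[label_of(inst)].append(inst)
    return groups


def resample(data, percent, with_replacement=True, seed=1):
    """Class-blind resampling to percent% of the original size."""
    rng = random.Random(seed)
    n = round(len(data) * percent / 100)
    if with_replacement:
        # n may exceed len(data): this is the only way to oversample.
        return [rng.choice(data) for _ in range(n)]
    if n > len(data):
        raise ValueError("oversampling requires sampling with replacement")
    return rng.sample(data, n)  # undersampling without duplicates


def biased_resample(data, label_of, percent, bias, with_replacement=True, seed=1):
    """Class-aware resampling: bias=0 keeps the input class distribution,
    bias=1 aims for a uniform one, so individual classes may be over- or
    undersampled."""
    rng = random.Random(seed)
    groups = _group_by_class(data, label_of)
    total = len(data) * percent / 100
    out = []
    for members in groups.values():
        share = (1 - bias) * len(members) / len(data) + bias / len(groups)
        k = round(total * share)
        if with_replacement:
            out.extend(rng.choice(members) for _ in range(k))
        elif k > len(members):
            raise ValueError("oversampling a class requires sampling with replacement")
        else:
            out.extend(rng.sample(members, k))
    return out


def spread_subsample(data, label_of, max_spread, seed=1):
    """Undersample so that no class is more than max_spread times the
    size of the smallest class; max_spread <= 0 means no limit."""
    if max_spread <= 0:
        return list(data)
    rng = random.Random(seed)
    groups = _group_by_class(data, label_of)
    cap = int(min(len(m) for m in groups.values()) * max_spread)
    out = []
    for members in groups.values():
        # With max_spread >= 1 the smallest class is kept whole;
        # larger classes are cut down to cap.
        out.extend(members if len(members) <= cap else rng.sample(members, cap))
    return out
```

Take eight a's and two b's:

- `biased_resample(data, label, 100, 1.0)` yields five of each. The minority class is oversampled, which needs replacement, while the majority class is undersampled.
- `spread_subsample` with a spread of 1.0 can only shrink the majority class, leaving two a's and two b's.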


Re: Sampling techniques classification

Keith Roy


> This is what happens when the class is a nominal attribute:
>
> Let W_i be the sum of the weights of the training instances in class i and let C be the number of classes. Let W be the sum of all weights in the data. Then, given an instance with weight w in class i, the weight of this instance is set to W/(W_i*C)*w.
>
> After all the instances' weights have been changed this way, the sum of weights of the instances in each class will be W/C. The total sum of weights will be W, i.e., it will remain unchanged.
>
> This is what happens when the target attribute ("class") is numeric:
>
> The target attribute of the data is discretised into the user-specified number of bins using the unsupervised Discretize filter with default settings (i.e., equal-width discretization). The bins in the discretisation are treated like nominal class values and the above process for changing weights in the nominal class case is applied.

Thank you so much, Eibe, for the very clear insight.

However, I am wondering whether the same ClassBalancer process is performed with a numeric class. Any ideas about that?

Thank you!

Keith  


Re: Sampling techniques classification

Eibe Frank-2
Administrator

> On 18/05/2017, at 1:46 PM, Keith Roy <[hidden email]> wrote:
>
>
>
> This is what happens when the class is a nominal attribute:
>
> Let W_i be the sum of the weights of the training instances in class i and let C be the number of classes. Let W be the sum of all weights in the data. Then, given an instance with weight w in class i, the weight of this instance is set to W/(W_i*C)*w.
>
> After all the instances' weights have been changed this way, the sum of weights of the instances in each class will be W/C. The total sum of weights will be W, i.e., it will remain unchanged.
>
> This is what happens when the target attribute ("class") is numeric:
>
> The target attribute of the data is discretised into the user-specified number of bins using the unsupervised Discretize filter with default settings (i.e., equal-width discretization). The bins in the discretisation are treated like nominal class values and the above process for changing weights in the nominal class case is applied.
>
> Thank you so much, Eibe, for the very clear insight.
>
> However, I am wondering whether the same ClassBalancer process is performed with a numeric class. Any ideas about that?

I’ve actually already tried to explain how it handles numeric target attributes by discretisation (see above).

Cheers,
Eibe


Re: Sampling techniques classification

Keith Roy
Sorry Eibe, I meant the way that ClassBalancer handles a relational class.

Keith


Re: Sampling techniques classification

Eibe Frank-2
Administrator
WEKA doesn’t support relation-valued classes at the moment.

Cheers,
Eibe

> On 18/05/2017, at 2:47 PM, Keith Roy <[hidden email]> wrote:
>
> Sorry Eibe, I meant the way that ClassBalancer handles a relational class.
>
> Keith

Re: Sampling techniques classification

Keith Roy
I see. Thanks for your valuable reply.

Cheers, 
Keith

On Thu, May 18, 2017 at 10:57 AM, Eibe Frank <[hidden email]> wrote:
> WEKA doesn’t support relation-valued classes at the moment.
>
> Cheers,
> Eibe
>
> > On 18/05/2017, at 2:47 PM, Keith Roy <[hidden email]> wrote:
> >
> > Sorry Eibe, I meant the way that ClassBalancer handles a relational class.
> >
> > Keith