> On 12 May 2017, at 18:49, valerio jus wrote:
> 1- In which situations should the supervised version of the "Resample" filter be used?
When you use the FilteredClassifier on a classification problem and want to reduce the amount of training data and/or the class skew before the base learner is applied.
> 2- When should the supervised version of Resample be used with the "FilteredClassifier"? Is it useful, or is it better to apply the *unsupervised* version of the Resample filter in this case?
The unsupervised filter does not take the class distribution into account at all. If you have a classification problem, you will probably want to use the supervised version, even if you don't want to reduce class skew: the supervised version will ensure that the subsample is stratified so that the original class distribution is preserved in the subsample. The unsupervised version just samples randomly, so the subsample may not represent the original class distribution exactly.
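To make the difference concrete, here is a minimal plain-Python sketch (not Weka's code; the function names are hypothetical). The "supervised-style" function draws the same fraction from each class separately, so the class proportions of the subsample match the original ones up to rounding; the "unsupervised-style" function draws indices with no regard to the class, so the proportions vary from run to run.

```python
import random
from collections import Counter


def random_subsample(labels, fraction, rng):
    """Unsupervised-style: pick indices without looking at the class."""
    n = round(len(labels) * fraction)
    return rng.sample(range(len(labels)), n)


def stratified_subsample(labels, fraction, rng):
    """Supervised-style: pick the same fraction from every class separately."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    chosen = []
    for y, idx in sorted(by_class.items()):
        chosen.extend(rng.sample(idx, round(len(idx) * fraction)))
    return chosen


if __name__ == "__main__":
    rng = random.Random(1)
    labels = ["a"] * 900 + ["b"] * 100  # 90% / 10% class distribution
    strat = Counter(labels[i] for i in stratified_subsample(labels, 0.1, rng))
    rand = Counter(labels[i] for i in random_subsample(labels, 0.1, rng))
    print("stratified:", dict(strat))  # always 90 'a' and 10 'b'
    print("random:    ", dict(rand))   # depends on the random seed
```

Because each class is rounded separately, the stratified sample size can differ by an instance or two from the requested total when class sizes do not divide evenly.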
You will obviously also want to use the supervised version if you want to reduce class skew in the training data.
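The supervised filter has a setting that biases the sample towards a uniform class distribution (`-B`, from 0 = distribution of the input data to 1 = uniform). The sketch below computes per-class target sizes under the assumption that intermediate values interpolate linearly between the two extremes; that interpolation, and the helper name `target_counts`, are assumptions of this sketch rather than a description of Weka's internals.

```python
def target_counts(class_counts, total, bias):
    """Per-class sample sizes for a sample of `total` instances.

    bias = 0 keeps the original class proportions, bias = 1 gives every
    class the same share; values in between interpolate linearly
    (an assumption of this sketch).
    """
    n = sum(class_counts.values())
    k = len(class_counts)
    return {
        y: round(total * ((1 - bias) * c / n + bias / k))
        for y, c in class_counts.items()
    }


if __name__ == "__main__":
    counts = {"a": 900, "b": 100}
    for b in (0.0, 0.5, 1.0):
        print(b, target_counts(counts, 1000, b))
```

With bias 1, class "b" is asked for 500 instances although only 100 exist, so that target can only be met by sampling with replacement.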
> 3- Is the supervised version of the Resample filter considered an oversampling technique?
That depends on the configuration. Yes, *if* you use sampling with replacement and you sample more instances for a particular class than there are in the original training set.
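A small plain-Python illustration of why replacement matters (the helper `sample_class` is hypothetical, not part of Weka): with replacement you can draw more instances of a class than it contains, which necessarily duplicates some of them, i.e. oversampling; without replacement each instance appears at most once, so you can only undersample.

```python
import random
from collections import Counter


def sample_class(indices, k, rng, replacement=True):
    """Draw k instances from the indices of a single class."""
    if replacement:
        return rng.choices(indices, k=k)  # instances may be repeated
    if k > len(indices):
        raise ValueError("cannot draw more instances than exist without replacement")
    return rng.sample(indices, k)  # each instance at most once


if __name__ == "__main__":
    rng = random.Random(0)
    minority = list(range(100))  # 100 instances of the minority class
    drawn = sample_class(minority, 500, rng)
    copies = Counter(drawn)
    print(len(drawn), "drawn,", len(copies), "distinct,",
          max(copies.values()), "copies of the most repeated instance")
```

Since 500 draws come from only 100 instances, at least one instance must appear five or more times.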
Note that both filters only modify the first batch of data they filter. For example, in the FilteredClassifier, they modify only the training data, and leave the test data untouched.
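The train-only behaviour can be sketched in plain Python with a hypothetical wrapper that loosely mimics FilteredClassifier combined with the supervised Resample filter (`ResamplingWrapper`, `CountingLearner` and `per_class` are illustrative names, not Weka API). Resampling runs only when the model is trained; at prediction time every test instance is passed to the base learner unchanged.

```python
import random
from collections import Counter


class CountingLearner:
    """Toy base learner: records the class counts it was trained on."""

    def fit(self, X, y):
        self.seen = Counter(y)
        return self

    def predict(self, X):
        # Predict the most frequent training class for every test instance.
        label = self.seen.most_common(1)[0][0]
        return [label for _ in X]


class ResamplingWrapper:
    """Hypothetical stand-in for FilteredClassifier + supervised Resample.

    Resampling happens only in fit(), i.e. on the first (training) batch;
    predict() hands test instances to the base learner unchanged.
    """

    def __init__(self, base, per_class, seed=1):
        self.base = base
        self.per_class = per_class  # instances to draw per class
        self.rng = random.Random(seed)

    def fit(self, X, y):
        by_class = {}
        for xi, yi in zip(X, y):
            by_class.setdefault(yi, []).append(xi)
        Xs, ys = [], []
        for label, rows in sorted(by_class.items()):
            # Draw with replacement so small classes can be oversampled.
            for xi in self.rng.choices(rows, k=self.per_class):
                Xs.append(xi)
                ys.append(label)
        self.base.fit(Xs, ys)
        return self

    def predict(self, X):
        # No resampling at prediction time: every test instance is kept.
        return self.base.predict(X)


if __name__ == "__main__":
    X = list(range(1000))
    y = ["a"] * 900 + ["b"] * 100
    model = ResamplingWrapper(CountingLearner(), per_class=200).fit(X, y)
    print("training class counts:", dict(model.base.seen))
    print("predictions for 50 test instances:", len(model.predict(range(50))))
```

Keeping the test data untouched matters because the evaluation should reflect the class distribution the model will actually face, not the resampled one used for training.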