Generating training and test sets with supervised resample filter

Paulo Ferreira
Hello, all!!

I'm trying to split a dataset into training and test sets using the supervised resample filter in a Java application.

1) Is this possible?
2) My code looks like this, but the test set always ends up containing the entire dataset:

public static Instances[] geraTreinoTeste(Instances dataset, double percentTraining, int seed){
        Instances dados[],training,test;
       
        dados=new Instances[2];
       
        Resample filtro=new Resample();

        // 0.0 => keep the class distribution as it is
        // 1.0 => make the class distribution uniform
        filtro.setBiasToUniformClass(0.0);

        // Sample the selected instances without replacement
        filtro.setNoReplacement(true);

        // Percentage of the data to sample
        filtro.setSampleSizePercent(percentTraining);

        // Seed
        filtro.setSeed(seed);
           
        try {    
            filtro.setInputFormat(dataset);       // prepare the filter for the data format
           
            filtro.setInvertSelection(false);  // do not invert the selection
           
            // apply filter
            training = Filter.useFilter(dataset, filtro);
           
            filtro.setInvertSelection(true);  // invert the selection
            test = Filter.useFilter(dataset, filtro);
  
            dados[0]=training;
            dados[1]=test;
           
            return dados;
        } catch (Exception ex) {
            Logger.getLogger(BenchmarkIDS.class.getName()).log(Level.SEVERE, null, ex);
            System.out.println("Error: " + ex);
            System.exit(1);
            return null;
        }                        
    }

What am I doing wrong?

Thank you for your help,

Paulo Ferreira

_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Generating training and test sets with supervised resample filter

Eibe Frank-3
1) Yes.

2) The problem is that you call useFilter() a second time on the same Resample object without calling setInputFormat() again to reinitialise the filter, so the second call simply passes the data through unmodified. By design, the Resample filter modifies only the first batch of data it receives. This ensures the FilteredClassifier works correctly with this filter: we do not want the test data to be modified.
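
The first-batch behaviour described above can be illustrated with a small toy model. This is only a sketch: ToySubsampleFilter, apply() and reset() are hypothetical names invented for the illustration and are not part of Weka. reset() plays the role of setInputFormat(), and apply() plays the role of Filter.useFilter().

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration of "only the first batch is modified"; NOT Weka code. */
public class FirstBatchDemo {

    /** Hypothetical stand-in for a batch filter such as Resample. */
    public static class ToySubsampleFilter {
        private boolean firstBatchDone = false;

        /** Plays the role of setInputFormat(): forgets that a batch was seen. */
        public void reset() {
            firstBatchDone = false;
        }

        /** Plays the role of Filter.useFilter(): subsamples the first batch only. */
        public List<Integer> apply(List<Integer> batch) {
            if (firstBatchDone) {
                // Later batches are passed through unchanged.
                return new ArrayList<>(batch);
            }
            firstBatchDone = true;
            List<Integer> out = new ArrayList<>();
            for (int i = 0; i < batch.size(); i += 2) {
                out.add(batch.get(i)); // keep every other element
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 3, 4, 5, 6);
        ToySubsampleFilter filter = new ToySubsampleFilter();

        System.out.println(filter.apply(data)); // [1, 3, 5]
        System.out.println(filter.apply(data)); // [1, 2, 3, 4, 5, 6]  (passed through)

        filter.reset();
        System.out.println(filter.apply(data)); // [1, 3, 5]  (modified again after reset)
    }
}
```

The second call returns the whole input, which mirrors the symptom in the question: the "test" set is the full dataset.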

Try reinitialising the filter by calling setInputFormat() again before you apply useFilter() the second time. Alternatively, just create a new Resample filter object from scratch for the second application of useFilter.
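
A minimal sketch of the second suggestion above (a fresh Resample object per call) might look like the following. It assumes Weka is on the classpath and that the class index of the dataset has already been set, since the supervised filter needs a class attribute. The class name ResampleSplit and the helper newResample() are made up for this example; the setter calls are the ones used in the original post.

```java
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.supervised.instance.Resample;

public class ResampleSplit {

    /** Hypothetical helper: builds a Resample filter configured as in the original post. */
    private static Resample newResample(double percent, int seed, boolean invert) {
        Resample filter = new Resample();
        filter.setBiasToUniformClass(0.0);   // keep the class distribution as it is
        filter.setNoReplacement(true);       // sample without replacement
        filter.setSampleSizePercent(percent);
        filter.setSeed(seed);                // same call as in the original post
        filter.setInvertSelection(invert);
        return filter;
    }

    /** Returns {training, test}; each filter object sees exactly one batch. */
    public static Instances[] split(Instances dataset, double percentTraining, int seed)
            throws Exception {
        // Options are set before setInputFormat(), which initialises the filter.
        Resample trainFilter = newResample(percentTraining, seed, false);
        trainFilter.setInputFormat(dataset);
        Instances training = Filter.useFilter(dataset, trainFilter);

        // A new filter with the same seed and an inverted selection for the test set.
        Resample testFilter = newResample(percentTraining, seed, true);
        testFilter.setInputFormat(dataset);
        Instances test = Filter.useFilter(dataset, testFilter);

        return new Instances[] {training, test};
    }
}
```

The first suggestion works the same way with a single object: call setInvertSelection(true) and then setInputFormat(dataset) again before the second useFilter(). Using the same seed for both calls is what should make the inverted selection the complement of the training sample.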

Cheers,
Eibe

(Replying to Paulo Ferreira's message of Wed, Feb 5, 2020, 9:48 AM.)

Re: Generating training and test sets with supervised resample filter

Paulo Ferreira
Good afternoon, Eibe.

Just to let you know that, thanks to your help, my code is working correctly now.

Thank you for your support!

Best regards,

Paulo Ferreira