Percentage split option


Fernando Lavin
Hi all, 

How does Weka split the data into training and test sets under the "Percentage split" option? For instance, when we enter 66% there, how does Weka choose which 66% of the instances form the training set, with the remaining instances used as the test set?

Thanks in advance.
Fernando

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Percentage split option

Eibe Frank-3
First, the data is shuffled randomly based on the random number seed found under "More options..." (unless the user has requested the order to be preserved). This is done using the Fisher-Yates shuffle. Then the first 66% of the instances in the shuffled data will be used for training and the rest for testing.

This is the same for the Explorer, the KnowledgeFlow, and the CLI.
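
The shuffle-then-cut procedure described above can be sketched in a few lines. This is only an illustration in Python, not Weka's Java code: the function names, the `preserve_order` flag, and the rounding rule for the training-set size are assumptions made for the example, and Python's random number generator will not reproduce the orderings Weka produces for the same seed.

```python
import random


def fisher_yates_shuffle(items, rng):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    shuffled = list(items)
    # Walk from the last position down to the second, swapping each
    # element with a randomly chosen element at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled


def percentage_split(instances, percent, seed, preserve_order=False):
    """Split instances into (train, test); train is the first percent%."""
    rng = random.Random(seed)
    if preserve_order:
        data = list(instances)
    else:
        data = fisher_yates_shuffle(instances, rng)
    # Assumption: the training-set size is rounded to the nearest integer.
    train_size = round(len(data) * percent / 100)
    return data[:train_size], data[train_size:]


train, test = percentage_split(range(100), percent=66, seed=1)
print(len(train), len(test))  # 66 34
```

With the same seed the split is reproducible; changing the seed changes which instances end up in the first 66%.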

The Experimenter does something very slightly different to reduce variance in the estimates if the class attribute is nominal by implementing stratification: after shuffling, approximately 66% of the data for *each class* is taken and the union of these per-class subsets is used as the training set. The rest of the data is used for testing.
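
A rough sketch of that stratified variant, again in Python and only as an illustration: the function name and the per-class rounding are assumptions, and the Experimenter's actual code may differ in detail.

```python
import random
from collections import Counter, defaultdict


def stratified_percentage_split(labels, percent, seed):
    """Return (train_idx, test_idx), taking ~percent% of each class for training."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)  # shuffle first, as in the plain percentage split

    # Group the shuffled instance indices by class label.
    by_class = defaultdict(list)
    for idx in order:
        by_class[labels[idx]].append(idx)

    train_idx, test_idx = [], []
    for members in by_class.values():
        # Assumption: each per-class cut is rounded to the nearest integer.
        cut = round(len(members) * percent / 100)
        train_idx.extend(members[:cut])
        test_idx.extend(members[cut:])
    return train_idx, test_idx


labels = ["a"] * 60 + ["b"] * 40
train_idx, test_idx = stratified_percentage_split(labels, percent=66, seed=1)
train_counts = Counter(labels[i] for i in train_idx)
print(train_counts["a"], train_counts["b"])  # 40 26
```

The class proportions in the training set (40 to 26) stay close to those of the full data (60 to 40), which is the point of stratifying.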

Cheers,
Eibe

In reply to Fernando Lavin's message of Fri, Jan 6, 2017 at 3:39 AM.

Re: Percentage split option

Fernando Lavin



Eibe, 

- Is the *first* 66% (if that is the value set) of the data *always* taken, whether or not the user has asked to preserve the order? In other words, if the order is not preserved, will WEKA still select the first 66% of the instances as the training set, or will the selection be random (i.e., not necessarily the first instances; the last instances might be chosen as well)?
 


 

- Does WEKA by default perform cross-validation once in the Explorer, but run it 10 times in the Experimenter?


- Can we run cross-validation 10 times in the Explorer? If so, how?


Thank you.
Fernando
 


Re: Percentage split option

Eibe Frank-3


On Sun, Jan 8, 2017 at 5:05 PM, Fernando Lavin <[hidden email]> wrote:



Eibe, 

- Is the *first* 66% (if that is the value set) of the data *always* taken, whether or not the user has asked to preserve the order? In other words, if the order is not preserved, will WEKA still select the first 66% of the instances as the training set, or will the selection be random (i.e., not necessarily the first instances; the last instances might be chosen as well)?
 
All instances will have equal probability of being chosen because the data is shuffled first, e.g., each instance in the original dataset will have the same probability of being the first instance in the shuffled version of the data.
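
A tiny exhaustive check makes this concrete. The sketch below (Python, purely illustrative) assumes the shuffle produces every ordering with equal probability, which is what a Fisher-Yates shuffle gives with an unbiased random source. It enumerates all orderings of four hypothetical instances and counts how often each one comes first and how often each one lands in a 50% training set.

```python
from collections import Counter
from itertools import permutations

instances = ["a", "b", "c", "d"]
train_size = 2  # a 50% split of four instances

first_position = Counter()
in_training = Counter()
for ordering in permutations(instances):  # all 24 equally likely orderings
    first_position[ordering[0]] += 1
    in_training.update(ordering[:train_size])

print(dict(first_position))  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
print(dict(in_training))     # {'a': 12, 'b': 12, 'c': 12, 'd': 12}
```

Every instance is first in 6 of the 24 orderings and falls in the training half in 12 of them, so taking the "first" part of the shuffled data favours no instance.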
 

- Does WEKA by default perform cross-validation once in the Explorer, but run it 10 times in the Experimenter?
 
Yes, but you can easily change this in the Experimenter. If you select 1 run in the Experimenter, cross-validation will be performed only once.
 
- Can we run cross-validation 10 times in the Explorer? If so, how?

You can run 10-fold cross-validation manually ten times, each time changing the seed for the random number generator under "More options...". However, you will also need to average the results manually.
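
The manual bookkeeping amounts to something like the following Python sketch. It is an illustration only, not Weka code: the learner is a stand-in that always predicts the majority class of its training folds, the folds are not stratified, and the function name and toy labels are made up for the example.

```python
import random
from collections import Counter
from statistics import mean


def cross_validated_accuracy(labels, folds, seed):
    """Accuracy of a majority-class predictor under one k-fold cross-validation."""
    rng = random.Random(seed)
    order = list(range(len(labels)))
    rng.shuffle(order)  # a different seed gives a different fold assignment

    correct = 0
    for k in range(folds):
        test_idx = order[k::folds]  # every folds-th instance goes to fold k
        held_out = set(test_idx)
        train_labels = [labels[i] for i in order if i not in held_out]
        # Stand-in learner: predict the most common training label.
        prediction = Counter(train_labels).most_common(1)[0][0]
        correct += sum(labels[i] == prediction for i in test_idx)
    return correct / len(labels)


labels = ["yes"] * 7 + ["no"] * 7  # hypothetical toy data
# Ten runs of 10-fold cross-validation, one per seed, then the manual average.
per_run = [cross_validated_accuracy(labels, folds=10, seed=s) for s in range(1, 11)]
print(per_run)
print(mean(per_run))
```

In the Explorer each entry of `per_run` would be the accuracy read off one run after changing the seed, and the last line is the averaging that has to be done by hand.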

I think there is a code example for an Explorer plug-in somewhere that enables you to run experiments in a separate Explorer tab, but I suppose you might as well just start up the Experimenter.

Cheers,
Eibe

 