Some Bugs and Qt as to how to create a valid Validation Classification Pipeline


JamesS

Hi,

 

For reference, assume the data consists of 100 instances with ten numerical attributes and a binary categorical class (Yes/No).

 

Initiative:

I am trying to create a classification pipeline that incorporates the “Cluster Membership” filter, found in the Explorer's Preprocess tab under the unsupervised attribute filters, as part of a feature-engineering preprocessing strategy. The filter converts each instance into cluster memberships with associated weights. It uses class information for its learning.

 

Problem:

I don’t know how to create a valid classification pipeline with the above filter in Weka.

 

Goals:

Training Data:

(1) Learn the “Cluster Membership” filter using the class information. Save the learned model (ClusterMembershipModel).

(2) Learn a classifier on the new features produced by the Cluster Membership filter. Save the classifier model.

 

Validation Data: (that contains no Class information)

(1) Apply the ClusterMembershipModel, then apply the classifier model. Optimize results.

 

Questions:

(1) In Weka, how can I learn the Cluster Membership filter and save the model?

(2) If (1) is not possible, is there any way in Weka to learn the Cluster Membership filter on the data that has class labels and then apply that learned model to unclassified validation data?

(3) Is there any way in Weka to build a pipeline that lets me confirm the benefits of the Cluster Membership filter? Unless one can save a learned model for the filter, neither cross-validation, a percentage split, nor a supplied test set can be used to validate the benefits of the filter. The reason is that the data is biased: it has been generated by a filter that uses class information to generate the new features.

 

 

Here are some bugs:

Is there an official place to report bugs?

 

(1) In the Explorer, under Rules, if one chooses the MultiObjectiveEvolutionary Classifier and then runs a percentage split test, the classifier builds the model on the full training data rather than on the training split. One can see this because it is displayed at the bottom of the GUI. This can be corrected by clicking “More Options” and deselecting “Output Model”.

(2) If one goes to the “Select attributes” tab, chooses “WrapperSubsetEval”, and selects the MultiObjectiveEvolutionaryFuzzyClassifier under Rules, an error is logged which states: “This classiffier doesn't support databases with only categorical attributes. Please, use MultiObjectiveEvolutionaryCategoricalClassifier”. This does not make sense, because the data comes from an “arff” file and all attributes are numerical.

(3) If one goes to the Preprocess tab, selects filters.supervised.attribute.AttributeSelection, and chooses attribute.selection.MultiObjectiveEvolutionarySearch in the search command line, then after about 1 minute of searching for attributes the following error is displayed:

“Attribute names are not unique! Causes: ‘Atribute1’ ‘Atribute1’ ‘Atribute1’”

All my attribute names are unique, so the above is an error.

 

 

 

Best,

Richard


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Some Bugs and Qt as to how to create a valid Validation Classification Pipeline

Eibe Frank-2
Administrator

You can apply the ClusterMembership filter as part of the FilteredClassifier. This will ensure that the filtering model will be built from the training set(s) only and that it will be applied to the test data correctly.
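To make the principle concrete, here is a minimal plain-Java sketch of what wrapping a filter inside a classifier amounts to: the filter's parameters are learned from the labeled training rows only, and the learned filter is then applied to unlabeled rows before prediction. This is an analogy, not Weka code. The class name LeakFreePipeline, the per-class centroid "filter", the pick-the-largest-weight "classifier", and the toy data are all assumptions made for illustration. In Weka the corresponding pieces are weka.classifiers.meta.FilteredClassifier and weka.filters.unsupervised.attribute.ClusterMembership, and as far as I know the real filter uses a density-based clusterer per class rather than centroids.

```java
import java.util.Arrays;

/** Plain-Java analogy of a filter wrapped inside a classifier (hypothetical, not Weka's API). */
public class LeakFreePipeline {
    // One centroid per class; learned from the training rows only.
    private double[][] centroids;

    /** "Filter" learning step: compute one centroid per class label from labeled rows. */
    public void fit(double[][] x, int[] y, int numClasses) {
        int dim = x[0].length;
        centroids = new double[numClasses][dim];
        int[] counts = new int[numClasses];
        for (int i = 0; i < x.length; i++) {
            counts[y[i]]++;
            for (int j = 0; j < dim; j++) {
                centroids[y[i]][j] += x[i][j];
            }
        }
        for (int c = 0; c < numClasses; c++) {
            for (int j = 0; j < dim; j++) {
                centroids[c][j] /= Math.max(1, counts[c]);
            }
        }
    }

    /** Apply the learned filter: map a row to normalized membership weights. No label is needed. */
    public double[] transform(double[] row) {
        int k = centroids.length;
        double[] d2 = new double[k];
        double min = Double.MAX_VALUE;
        for (int c = 0; c < k; c++) {
            for (int j = 0; j < row.length; j++) {
                double diff = row[j] - centroids[c][j];
                d2[c] += diff * diff;
            }
            min = Math.min(min, d2[c]);
        }
        double[] w = new double[k];
        double sum = 0.0;
        for (int c = 0; c < k; c++) {
            // Subtracting the minimum keeps the largest weight at exp(0) = 1, so sum >= 1.
            w[c] = Math.exp(-(d2[c] - min));
            sum += w[c];
        }
        for (int c = 0; c < k; c++) {
            w[c] /= sum;
        }
        return w;
    }

    /** "Classifier" step: predict the class with the largest membership weight. */
    public int predict(double[] row) {
        double[] w = transform(row);
        int best = 0;
        for (int c = 1; c < w.length; c++) {
            if (w[c] > w[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] trainX = {{0.0, 0.1}, {0.2, 0.0}, {1.0, 0.9}, {0.8, 1.0}};
        int[] trainY = {0, 0, 1, 1};
        double[][] unlabeled = {{0.1, 0.1}, {0.9, 0.9}};

        LeakFreePipeline pipeline = new LeakFreePipeline();
        pipeline.fit(trainX, trainY, 2); // labels are used here and nowhere else
        for (double[] row : unlabeled) {
            System.out.println(Arrays.toString(row) + " -> class " + pipeline.predict(row));
        }
    }
}
```

Because fit is the only method that sees labels, calling it once per training fold and then calling predict on the held-out fold gives an evaluation in which the filter never sees the test labels. That is the property the FilteredClassifier is meant to provide.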

 

Regarding the bugs:

 

(1) That’s a feature. By default, WEKA always outputs the model built from the full dataset (e.g., in the Explorer, the model built from the entire data loaded into the Preprocess panel), regardless of the evaluation method that is chosen. (That means the learning algorithm is run an additional time on the full dataset, not just once on the training split if you use a percentage split evaluation.)

(2) and (3) This looks like an issue with the MultiObjectiveEvolutionaryFuzzyClassifier package. I’m copying this to the package maintainer.

 

Cheers,

Eibe

 

From: [hidden email]
Sent: Tuesday, 30 July 2019 1:38 PM
To: [hidden email]
Subject: [Wekalist] Some Bugs and Qt as to how to create a valid ValidationClassification Pipeline

 
