sorry but another simple student question about meta attribute selected classifier

Hao Li

Sorry, but one more short question. I know this simple information can be found online, and I have looked. It's just that I'm not 100% certain my understanding is correct, so I would very much like confirmation from an expert. It's really just a yes-or-no question.

 

My question is: can Weka automatically perform feature selection prior to classification by using the meta classifier AttributeSelectedClassifier? For example, I use the run options below:

 

weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W weka.classifiers.trees.RandomForest -- -P 100 -attribute-importance -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1

 

In this case, Weka will first pass the input data through the CfsSubsetEval feature selector and then pass the reduced dataset on to the random forest. Is my understanding correct?

And if a separate test set is used for validation, Weka will automatically take the features that CfsSubsetEval selected on the TRAINING set and keep only those features in the TEST set. Is this correct?

Basically, the run options above make sure the training set undergoes feature selection before the prediction model is trained; afterwards, the test set is reduced to the same selected features before the model is tested. Is my understanding correct?
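To make the train/test flow concrete, here is a minimal Python sketch of the pattern described above. It is not Weka's implementation: the `relevance` score is a crude stand-in for CfsSubsetEval, the nearest-centroid rule stands in for RandomForest, and the class name `SelectThenClassify` and the tiny dataset are made up for illustration. The only point is that the selected column indices are computed from the training data in `fit` and then reused unchanged in `predict`.

```python
from statistics import mean, pvariance


def relevance(column, labels):
    """Variance of the per-class means divided by the total variance.
    A crude stand-in for an attribute evaluator; it is NOT CFS."""
    total = pvariance(column)
    if total == 0:
        return 0.0
    class_means = [mean(v for v, y in zip(column, labels) if y == c)
                   for c in sorted(set(labels))]
    return pvariance(class_means) / total


class SelectThenClassify:
    """Hypothetical wrapper: select k columns on the training data, then classify."""

    def __init__(self, k):
        self.k = k

    def _project(self, row):
        # Keep only the columns that were chosen during fit().
        return [row[j] for j in self.selected]

    def fit(self, rows, labels):
        columns = list(zip(*rows))
        ranked = sorted(range(len(columns)),
                        key=lambda j: relevance(columns[j], labels),
                        reverse=True)
        # The selection is decided here, from training data only.
        self.selected = sorted(ranked[:self.k])
        reduced = [self._project(r) for r in rows]
        # Nearest-centroid model on the reduced data (stand-in for RandomForest).
        self.centroids = {
            c: [mean(col) for col in
                zip(*[r for r, y in zip(reduced, labels) if y == c])]
            for c in set(labels)
        }
        return self

    def predict(self, row):
        x = self._project(row)  # test rows get the same reduction
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(x, self.centroids[c])))


# Made-up training data: columns 0 and 1 carry little signal, 2 and 3 separate the classes.
train_rows = [
    [5.0, 3.0, 1.0, 0.2],
    [5.2, 3.1, 1.2, 0.3],
    [5.1, 2.9, 4.0, 1.3],
    [5.0, 3.0, 4.2, 1.4],
]
train_labels = ["a", "a", "b", "b"]

model = SelectThenClassify(k=2).fit(train_rows, train_labels)
# Test rows still have all 4 columns; the unselected ones are simply ignored.
prediction_1 = model.predict([9.9, 0.0, 1.1, 0.25])
prediction_2 = model.predict([0.0, 9.9, 4.1, 1.30])
```

In this sketch the test rows are passed in with all four columns, and the wrapper drops the unselected ones itself, which mirrors the behaviour I am asking about.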

 

I base my deduction on the following observation:

If I run Iris classification using random forest with no feature selection, I get the following:

 

=== Run information ===

 

Scheme:       weka.classifiers.trees.RandomForest -P 100 -attribute-importance -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1

Relation:     iris

Instances:    150

Attributes:   5

              sepallength

              sepalwidth

              petallength

              petalwidth

              class

Test mode:    evaluate on training data

 

=== Classifier model (full training set) ===

 

RandomForest

 

Bagging with 100 iterations and base learner

 

weka.classifiers.trees.RandomTree -K 0 -M 1.0 -V 0.001 -S 1 -do-not-check-capabilities

 

Attribute importance based on average impurity decrease (and number of nodes using that attribute)

 

      0.64 (    75)  sepalwidth

      0.6  (   123)  sepallength

      0.59 (   251)  petallength

      0.52 (   172)  petalwidth

 

 

Time taken to build model: 0.14 seconds

 

=== Evaluation on training set ===

 

Time taken to test model on training data: 0.05 seconds

 

=== Summary ===

 

Correctly Classified Instances         150              100      %

Incorrectly Classified Instances         0                0      %

Kappa statistic                          1    

Mean absolute error                      0.0148

Root mean squared error                  0.0603

Relative absolute error                  3.32   %

Root relative squared error             12.7859 %

Total Number of Instances              150    

****************************************************

 

The model has 4 features and 1 class, as expected.

Now I run the AttributeSelectedClassifier meta classifier with RandomForest using the user-supplied test set option (for simplicity, the test set is the same as the training set, the full iris data) using the command line below.

 

java -cp "D:\TOOLS\Weka\Weka-3-8\weka.jar" weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W weka.classifiers.trees.RandomForest -t "D:\IRIS_training_set.arff" -T "D:\IRIS_test_set.arff" -- -P 100 -attribute-importance -I 500 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 862743 > "D:\IRIS_test_output.csv"

 

 

Weka will automatically use CfsSubsetEval for feature selection on the training set, train a random forest model on the reduced training set, then reduce the test set to the same features and use the trained model to predict the test set.

Is my understanding 100% correct?

 

I base my understanding on the following observation. When the AttributeSelectedClassifier meta classifier is run with iris.arff as both the training set and the test set, only 2 features appear in the model, and there are no warnings or errors about the user-supplied test set even though it has 4 features. So I came to the conclusion that Weka's AttributeSelectedClassifier automatically performs the same feature selection on the test set as it did on the training set. Am I right?
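The "Header of reduced data" in the output further down names the relation `iris-weka.filters.unsupervised.attribute.Remove-V-R3-5`. As far as I understand the Remove filter, `-R 3-5` is a 1-based attribute range and `-V` inverts the removal, so attributes 3 to 5 are kept and the rest are dropped. A small Python sketch of that reading follows; the helper name `keep_range` is my own and the example row is only for illustration.

```python
def keep_range(row, first, last):
    """Keep the 1-based, inclusive attribute range first..last and drop the rest
    (my reading of an inverted removal such as "-V -R 3-5")."""
    return row[first - 1:last]


header = ["sepallength", "sepalwidth", "petallength", "petalwidth", "class"]
example_row = [5.1, 3.5, 1.4, 0.2, "Iris-setosa"]

reduced_header = keep_range(header, 3, 5)
reduced_row = keep_range(example_row, 3, 5)

print(reduced_header)
print(reduced_row)
```

If this reading is right, the same range is applied to every instance, whether it comes from the training set or the test set, which would explain why no complaint is raised about the 4-feature test file.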

 

I know these are very simple questions that can be answered online, but I would really like an expert opinion to be sure.

 

 

=== Run information ===

 

Scheme:       weka.classifiers.meta.AttributeSelectedClassifier -E "weka.attributeSelection.CfsSubsetEval -P 1 -E 1" -S "weka.attributeSelection.BestFirst -D 1 -N 5" -W weka.classifiers.trees.RandomForest -- -P 100 -attribute-importance -I 100 -num-slots 1 -K 0 -M 1.0 -V 0.001 -S 1

Relation:     iris

Instances:    150

Attributes:   5

              sepallength

              sepalwidth

              petallength

              petalwidth

              class

Test mode:    user supplied test set:  size unknown (reading incrementally)

 

=== Classifier model (full training set) ===

 

AttributeSelectedClassifier:

 

 

 

=== Attribute Selection on all input data ===

 

Search Method:

         Best first.

         Start set: no attributes

         Search direction: forward

         Stale search after 5 node expansions

         Total number of subsets evaluated: 12

         Merit of best subset found:    0.887

 

Attribute Subset Evaluator (supervised, Class (nominal): 5 class):

         CFS Subset Evaluator

         Including locally predictive attributes

 

Selected attributes: 3,4 : 2

                     petallength

                     petalwidth

 

 

Header of reduced data:

@relation iris-weka.filters.unsupervised.attribute.Remove-V-R3-5

 

@attribute petallength numeric

@attribute petalwidth numeric

@attribute class {Iris-setosa,Iris-versicolor,Iris-virginica}

 

@data

 

 

Classifier Model

RandomForest

 

Bagging with 100 iterations and base learner

 

weka.classifiers.trees.RandomTree -K 0 -M 1.0 -V 0.001 -S 1 -do-not-check-capabilities

 

Attribute importance based on average impurity decrease (and number of nodes using that attribute)

 

      0.53 (   377)  petallength

      0.52 (   219)  petalwidth

 

 

Time taken to build model: 0.08 seconds

 

=== Evaluation on test set ===

 

Time taken to test model on supplied test set: 0.01 seconds

 

=== Summary ===

 

Correctly Classified Instances         149               99.3333 %

Incorrectly Classified Instances         1                0.6667 %

Kappa statistic                          0.99 

Mean absolute error                      0.0152

Root mean squared error                  0.0709

Relative absolute error                  3.4152 %

Root relative squared error             15.0507 %

Total Number of Instances              150    

 

=== Detailed Accuracy By Class ===

 

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class

                 1.000    0.000    1.000      1.000    1.000      1.000    1.000     1.000     Iris-setosa

                 0.980    0.000    1.000      0.980    0.990      0.985    1.000     0.999     Iris-versicolor

                 1.000    0.010    0.980      1.000    0.990      0.985    1.000     0.999     Iris-virginica

Weighted Avg.    0.993    0.003    0.993      0.993    0.993      0.990    1.000     0.999    

 

=== Confusion Matrix ===

 

  a  b  c   <-- classified as

 50  0  0 |  a = Iris-setosa

  0 49  1 |  b = Iris-versicolor

  0  0 50 |  c = Iris-virginica
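The summary figures above can be rechecked by hand from the confusion matrix. The sketch below assumes that the reported kappa statistic is Cohen's kappa computed from the row and column totals; the variable names are my own.

```python
# Confusion matrix from the output above: rows are actual classes, columns are predictions.
confusion = [
    [50, 0, 0],   # Iris-setosa
    [0, 49, 1],   # Iris-versicolor
    [0, 0, 50],   # Iris-virginica
]

n = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(len(confusion)))
accuracy = correct / n

row_totals = [sum(row) for row in confusion]
col_totals = [sum(col) for col in zip(*confusion)]

# Agreement expected by chance, from the marginal totals.
expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
kappa = (accuracy - expected) / (1 - expected)

# Per-class figures for the two classes involved in the single error.
versicolor_recall = confusion[1][1] / row_totals[1]
virginica_precision = confusion[2][2] / col_totals[2]

print(f"accuracy  = {accuracy * 100:.4f} %")
print(f"kappa     = {kappa:.2f}")
print(f"versicolor recall    = {versicolor_recall:.3f}")
print(f"virginica precision  = {virginica_precision:.3f}")
```

These agree with the reported 99.3333 % accuracy, kappa of 0.99, versicolor recall of 0.980 and virginica precision of 0.980.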



_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: sorry but another simple student question about meta attribute selected classifier

Eibe Frank-2
Administrator
Yes, that's correct.

Cheers,
Eibe

On Sat, Mar 7, 2020 at 3:16 PM Hao Li <[hidden email]> wrote:
> [...]