Re: Wekalist Digest, Vol 197, Issue 49

JamesS
Hi Eibe,

Thanks for taking my question.

I reviewed your suggestion that I use Weka's FilteredClassifier to
create my ClusterMembership filter pipeline.

The filter states that: "the filter is based strictly on the training data
and test instances will be processed by the filter without changing their
structure"

The above description makes me believe that it cannot handle the pipeline
that I want to create, because I want the filter to change the structure of
the test instances prior to classification, in accordance with the way the
ClusterMembership filter was learned on the training data. The results that
I am getting are no better than random chance. This is totally different
from what I get if I preprocess the data under the Preprocess tab and run a
cross-validation test on that data. I don't think the filter learns a
ClusterMembership model and applies that model to the test data.

The filter needs to do the following:

Filter the training data using the ClusterMembership and learn the
clustering model.

When classifying TRAINING DATA, the classifier needs to learn on the above
filtered data.

When classifying TEST DATA, the FilteredClassifier has to go back to the
original data, apply the ClusterMembership clustering model that it learned
from the training data, and then undertake the classification tests.

QUESTION:

(1) Is it possible to confirm that Weka's FilteredClassifier learns a
ClusterMembership clustering model and applies that model to test data prior
to classification?

My goal is to be able to learn cluster algorithms on class attribute
information for the purpose of generating more coherent data and thereby
more easily classifiable data. The process of learning the ClusterMembership
filter has to be similar to learning a classifier.

(2) Is there any way of bootstrapping something in Weka that will enable me
to train the ClusterMembership filter and create a model so it can be applied
to test data?


Thanks in advance,
Richard




Message: 1
Date: Mon, 29 Jul 2019 18:17:59 -0700
From: <[hidden email]>
To: <[hidden email]>
Subject: [Wekalist] Some Bugs and Qt as to how to create a valid
        Validation Classification Pipeline
Message-ID: <008201d54674$a0f180b0$e2d48210$@gmail.com>
Content-Type: text/plain; charset="us-ascii"

Hi,

 

For reference, assume the data consists of 100 instances with ten numeric
attributes and a binary categorical class of Yes/No.

 

Initiative:

I am trying to create a classification pipeline which incorporates the
"Cluster Membership" filter found under the Preprocess tab in the Explorer
under Unsupervised attribute, as a part of a preprocessing feature
engineering strategy. The filter converts data to clusters and associated
weights for each instance. It uses class information for its learning.

 

Problem:

I don't know how to create a valid classification pipeline with the above
filter in Weka.

 

Goals:

Training Data:

(1) Learn "Cluster Membership" filter on class information. Save learned
model (ClusterMembershipModel)

(2) Learn Classifier on new features learned by the Cluster Membership
filter. Save Classifier Model.

 

Validation Data: (that contains no Class information)

(1) Apply ClusterMembershipModel  THEN apply Classifier Model . Optimize
results.

 

Questions:

(1) In Weka how can I learn the Cluster Membership filter and save the
model?

 

(2) If (1) is not possible, is there any way in Weka that will allow me to
learn the Cluster Membership filter on the class data and then apply that
learned model to unclassified validation data?

 

(3) Is there any way in Weka of generating a pipeline which will allow me to
confirm the benefits of the Cluster Membership filter? (Unless one can save
a learned model for the filter, cross-validation, percentage split, or a
supplied validation test set cannot be used to validate the benefits of the
filter. The reason is that the data is biased, as it has been generated by a
filter that uses class information for the generation of new features.)

 

 

Here are some bugs:

Is there an official place to report bugs?

 

(1) In the Explorer, under Rules, if one chooses the
MultiObjectiveEvolutionary Classifier and then does a percentage split test,
the classifier builds the model on the full training data as opposed to
building the model on the training split. One can see this because it is
displayed at the bottom of the GUI. This can be corrected if one
clicks "More Options" and deselects "Output Model".

(2) If one goes to the "Select attributes" tab, chooses
"WrapperSubsetEval", and selects the
MultiObjectiveEvolutionaryFuzzyClassifier under Rules, an error is logged
which states: "This classiffier doesn't support databases with only
categorical attributes. Please, use
MultiObjectiveEvolutionaryCategoricalClassifier". This does not make sense,
because the data comes from an "arff" file in which all attributes are
numerical.

(3) If one goes to the Preprocess tab, selects
filters.supervised.attribute.AttributeSelection, and chooses
attribute.selection.MultiObjectiveEvolutionarySearch as the search method,
then after about 1 minute of searching for attributes the following error is
displayed:

"Attribute names are not unique! Causes: 'Atribute1' 'Atribute1' 'Atribute1'"

All my attribute names are unique, so the above message is wrong.

 

 

 

Best,

Richard

------------------------------

Message: 2
Date: Tue, 30 Jul 2019 15:02:51 +1200
From: Eibe Frank <[hidden email]>
To: Weka machine learning workbench list.
        <[hidden email]>
Cc: "Carlos Martinez Cortes<[hidden email]>"
        <[hidden email]>
Subject: Re: [Wekalist] Some Bugs and Qt as to how to create a valid
        ValidationClassification Pipeline
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"

You can apply the ClusterMembership filter as part of the
FilteredClassifier. This will ensure that the filtering model will be built
from the training set(s) only and that it will be applied to the test data
correctly.

Regarding the bugs:

1) That's a feature. By default, WEKA always outputs the model built from
the full dataset (e.g., in the Explorer, the model built from the entire
data loaded into the Preprocess panel), regardless of the evaluation method
that is chosen. (That means the learning algorithm is run an additional time
on the full dataset, not just once on the training split if you use a
percentage split evaluation.)
2 and 3) This looks like an issue with the
MultiObjectiveEvolutionaryFuzzyClassifier package. I'm copying this to the
package maintainer.

Cheers,
Eibe


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email] To subscribe, unsubscribe, etc.,
visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette:
http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


End of Wekalist Digest, Vol 197, Issue 49
*****************************************

Re: Wekalist Digest, Vol 197, Issue 49

Eibe Frank-2
Administrator
Assuming a class attribute has been set, if you apply the ClusterMembership filter in the Preprocess panel and then run a cross-validation afterwards, you will get highly optimistic, useless performance estimates. The reason is that this filter is not actually unsupervised when a class attribute has been set.

Let’s say you run the filter with EM as the clusterer, setting the number of clusters to three, for a two-class problem. Then, assuming the class attribute has been set in the Preprocess panel (the last attribute is taken as the class by default), clustering will be performed separately for the data of each class. That means the first three clusters will model the data in the first class and the remaining three clusters will model the data in the second class. Once the clusters have been established, the filter will add six attributes to each instance in the data. Each attribute represents the probability of membership for a particular cluster.
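
To make the shape of the filter's output concrete, here is a small Python sketch of the idea described above: cluster each class separately, then describe every instance by its normalised membership in all of the resulting clusters. This is only a toy under stated assumptions, not Weka's implementation: the function names are hypothetical, the data is one-dimensional, a simple "cut the sorted values into k chunks" rule stands in for EM, and a Gaussian-style weight stands in for the clusterer's density estimate.

```python
import math


def fit_per_class_clusters(rows, labels, k):
    """Split the values by class and place k cluster centres per class.

    Toy stand-in for running a clusterer such as EM on each class subset.
    """
    centres = []
    for cls in sorted(set(labels)):
        values = sorted(x for x, y in zip(rows, labels) if y == cls)
        size = len(values) / k
        for i in range(k):
            # Average one contiguous chunk of the sorted class values.
            chunk = values[int(i * size):int((i + 1) * size)]
            centres.append((cls, sum(chunk) / len(chunk)))
    return centres


def membership(x, centres, width=1.0):
    """Return one normalised membership value per cluster (they sum to 1)."""
    weights = [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for _, c in centres]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rows = [0, 1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 15]
    labels = ["no"] * 6 + ["yes"] * 6
    centres = fit_per_class_clusters(rows, labels, k=3)
    # Two classes times three clusters gives six new attributes per instance.
    print(len(centres), membership(1.0, centres))
```

With two classes and k = 3, every instance ends up with six membership values, and an instance that sits among the "no" values puts almost all of its membership mass on the first three clusters.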

Now, because *all* the data is used for clustering in the Preprocess panel, it will be trivial for the learning algorithm in the Classify panel to achieve high accuracy: for example, when the cluster membership probability is high for one of the first three clusters, this almost certainly means the corresponding instance belongs to the first class.

In contrast, when you apply the FilteredClassifier in conjunction with the filter, the clusters will be established based on the training data only. However, the six attributes will be added to all instances, including the test instances.
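
The difference between the two workflows can be reproduced with a toy experiment in plain Python. The sketch below is an analogue only, not Weka code, and all names in it are hypothetical: one centroid per class stands in for the per-class clusterer, squared distances to those centroids stand in for the membership attributes, and a nearest-centroid rule stands in for the classifier. The data is pure noise with arbitrary labels, so an honest estimate should be close to chance.

```python
import random


def centroid(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(rows, labels):
    """One centroid per class. Note that this step uses the class labels."""
    return {cls: centroid([r for r, y in zip(rows, labels) if y == cls])
            for cls in set(labels)}


def transform(filter_model, row):
    """New attributes: squared distance to each class's centroid."""
    return [sq_dist(row, filter_model[cls]) for cls in sorted(filter_model)]


def predict(model, row):
    return min(model, key=lambda cls: sq_dist(row, model[cls]))


def make_noise_data(n=40, dim=300, seed=0):
    rng = random.Random(seed)
    rows = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
    labels = [i % 2 for i in range(n)]  # labels are unrelated to the features
    return rows, labels


def loo_accuracy(rows, labels, fit_filter_on_all):
    """Leave-one-out accuracy of the classifier on the filtered attributes."""
    all_data_filter = fit_centroids(rows, labels) if fit_filter_on_all else None
    hits = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if fit_filter_on_all:
            filt = all_data_filter  # built with the test instance included
        else:
            filt = fit_centroids(train_rows, train_labels)  # training data only
        train_feats = [transform(filt, r) for r in train_rows]
        clf = fit_centroids(train_feats, train_labels)
        hits += predict(clf, transform(filt, rows[i])) == labels[i]
    return hits / len(rows)


if __name__ == "__main__":
    rows, labels = make_noise_data()
    print("filter built on all data:     ", loo_accuracy(rows, labels, True))
    print("filter built on training only:", loo_accuracy(rows, labels, False))
```

Because the labels carry no information, any accuracy well above 0.5 in the first setting comes purely from the filter having seen the test instance together with its class label. The second setting mirrors what the FilteredClassifier does: the filter is rebuilt from each training set and then applied to the held-out instance.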

Unfortunately, your random chance result means that your (current) clustering approach does not find structure in the training data that is actually useful when predicting new data (e.g., the test data in a percentage split).

Note that, when the filter model is built, the class attribute is only used to divide the data into different subsets for clustering when applying ClusterMembership. It will not be used subsequently when the clusters for each class are established by the clustering algorithm.

I checked the source code but couldn’t find the statement you quote. That statement is not quite right because the cluster membership attributes *will* be added to the test data as well, so the structure of the test data will in fact be changed.

Cheers,
Eibe

PS: Rather than responding to the digest, consider using the Nabble mailing list archive to respond to messages that you may have already deleted.

