How to examine subset of features for each training sample?

How to examine subset of features for each training sample?

asadbtk
Hello Eibe and Peter

If I use feature selection (FS) algorithms with 10-fold cross-validation (CV), can I examine the subset of features selected in each fold? I am using the Weka Explorer.

I read a paper that states something like:

 " We compute the consistency as a percentage of the unique metrics that consistently appeared among all of the 100 training samples and all of the unique metrics for all training samples. "

As I understand this statement, the authors examined the features selected in each fold in order to determine the consistency of the feature selection algorithms.
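
One possible reading of the quoted consistency measure is: the number of features selected in every training sample, divided by the number of features selected in at least one training sample, expressed as a percentage. The sketch below assumes that reading; the paper may define it differently. The function name and the feature names are made up for illustration.

```python
from typing import Iterable


def selection_consistency(subsets: Iterable[Iterable[str]]) -> float:
    """Percentage of features chosen in every training sample, relative to
    all features chosen in at least one training sample (assumed reading)."""
    sets = [set(s) for s in subsets]
    if not sets:
        raise ValueError("need at least one selected subset")
    always = set.intersection(*sets)  # features selected in every sample
    ever = set.union(*sets)           # features selected in any sample
    if not ever:
        return 0.0  # nothing was ever selected; returning 0 is an arbitrary convention
    return 100.0 * len(always) / len(ever)


# Hypothetical feature names, one set per training sample (fold).
folds = [
    {"loc", "cbo", "wmc"},
    {"loc", "cbo", "rfc"},
    {"loc", "cbo"},
]
print(selection_consistency(folds))  # 2 of 4 unique features appear everywhere -> 50.0
```

With 10 repetitions of 10-fold CV, the list would simply hold 100 sets instead of three.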

_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: How to examine subset of features for each training sample?

Peter Reutemann
> If I use FS algorithms on 10 fold CV, can I examine a subset of features that are selected in each fold? I am using Weka Explorer.
>
> I read a paper which have mentioned something like:
>
>  " We compute the consistency as a percentage of the unique metrics that consistently appeared among all of the 100 training samples and all of the unique metrics for all training samples. "
>
> The authors examined features selected in each fold in order to determine the consistency of features selection algorithms (this is what I got the meaning from the above statement of the authors).

Unless you manually generate cross-validation folds
(weka.filters.supervised.instance.StratifiedRemoveFolds or
weka.filters.unsupervised.instance.RemoveFolds) and then perform
attribute selection for each of the folds, then the answer is no for
the Explorer. Maybe it is possible with the KnowledgeFlow.

The following ADAMS workflow simulates CV and generates a spreadsheet
showing what attributes get selected in what fold (and you can even
perform multiple repetitions of CV):
adams-weka-attribute_selection_simulated_cv.flow

Just use the adams-ml-app snapshot
(https://adams.cms.waikato.ac.nz/download/snapshot/); the above flow
is part of that download (when downloading the zip file).

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: How to examine subset of features for each training sample?

asadbtk
Hello Peter and thanks for your reply.

I do not understand what ADAMS is.

(a) Is it an extended form of Weka? I can see that it is also provided by the University of Waikato.

(b) There are a lot of zip files; which one should I download and use?

(c) How can I use it? Are there any manuals or videos available? Is it like the Weka Explorer?

Thanks and regards

Re: How to examine subset of features for each training sample?

Peter Reutemann
> I did not understand ADAMS?
>
> (a)  Is it an extended form of Weka?   I can see that it is also provided by Waikato university..

No. ADAMS is a modular system with Weka being just one of its modules.
We provide commercial applications built on top of ADAMS for
processing spectral data (e.g., NIR, MIR, XRF) - all using workflows,
with some of them being generated dynamically.

> (b) There are a lot of zip files, which to download and use?

Like I said before, download the adams-ml-app snapshot.

> (c) How can I use it, is there any manual/videos available? Is it like Weka explorer?

For running the workflow that I mentioned you need to use the Flow
editor (you can find it in the main menu under "Tools").
The workflow engine in ADAMS is basically a graphical programming
language. There are some (old) videos on our YouTube channel that show
how you create flows (and the adams-core manual explains the basics).
Unfortunately, we don't have time, resources or money to create new
content. Over the years, we've developed our own data science tools as
part of ADAMS, with the Weka Investigator being a more powerful
version of the Weka Explorer.

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: How to examine subset of features for each training sample?

asadbtk
Thank you Peter for the information. 

I am going to install it and explore it further.

Thanks again.

Best regards

Re: How to examine subset of features for each training sample?

Eibe Frank-2
Administrator
In reply to this post by Peter Reutemann


> On 23/07/2020, at 9:00 AM, Peter Reutemann <[hidden email]> wrote:
>
>> If I use FS algorithms on 10 fold CV, can I examine a subset of features that are selected in each fold? I am using Weka Explorer.
>>
>> I read a paper which have mentioned something like:
>>
>> " We compute the consistency as a percentage of the unique metrics that consistently appeared among all of the 100 training samples and all of the unique metrics for all training samples. "
>>
>> The authors examined features selected in each fold in order to determine the consistency of features selection algorithms (this is what I got the meaning from the above statement of the authors).
>
> Unless you manually generate cross-validation folds
> (weka.filters.supervised.instance.StratifiedRemoveFolds or
> weka.filters.unsupervised.instance.RemoveFolds) and then perform
> attribute selection for each of the folds, then the answer is no for
> the Explorer. Maybe it is possible with the KnowledgeFlow.

The most recent version of WEKA has an option in the Classify panel of the Explorer that shows the model for each fold (available under “More options”: tick “Output models for training splits”). Using the AttributeSelectedClassifier with the appropriate feature selection algorithm will show all the subsets selected.

> The following ADAMS workflow simulates CV and generates a spreadsheet
> showing what attributes get selected in what fold (and you can even
> perform multiple repetitions of CV):
> adams-weka-attribute_selection_simulated_cv.flow

This sounds like the information you get if you use “Cross-validation” in the “Select attributes” panel of the Explorer. It cannot perform multiple runs of cross-validation though.

Cheers,
Eibe

Re: How to examine subset of features for each training sample?

asadbtk
Hello Eibe and thanks for your information. 

Which version of Weka has this option? 3.9?

Best regards 

Re: How to examine subset of features for each training sample?

asadbtk
Thank you Eibe and Peter for the information

I have installed Weka 3.9.4 and can now see the features selected for each fold. It would be even better to have 10x10-fold CV (10 repetitions of 10-fold CV), so that the feature subsets of all 100 training samples could be examined.

Thank you again
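
For reference, the 10x10 idea can be sketched outside Weka as follows. This is only an illustration in plain Python, not something Weka or ADAMS provides: the function names are hypothetical, the splits are not stratified, and `select` is a placeholder for whatever attribute selection is run on the rows of each training sample.

```python
import random
from collections import Counter
from typing import Callable, Iterable, Iterator, List


def repeated_cv_training_sets(n_instances: int, folds: int = 10,
                              repeats: int = 10, seed: int = 1
                              ) -> Iterator[List[int]]:
    """Yield the training-set row indices for every fold of every repetition."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_instances))
        rng.shuffle(order)  # a fresh random partition for each repetition
        for fold in range(folds):
            held_out = set(order[fold::folds])  # rows forming this fold's test set
            yield [i for i in order if i not in held_out]


def selection_counts(n_instances: int,
                     select: Callable[[List[int]], Iterable[str]],
                     folds: int = 10, repeats: int = 10,
                     seed: int = 1) -> Counter:
    """Count in how many training samples each feature gets selected."""
    counts: Counter = Counter()
    for train in repeated_cv_training_sets(n_instances, folds, repeats, seed):
        counts.update(set(select(train)))
    return counts


# Dummy selector (an assumption, for demonstration only): always picks "f1",
# and picks "f2" only when row 0 is part of the training sample.
def dummy_select(train: List[int]) -> Iterable[str]:
    return {"f1", "f2"} if 0 in train else {"f1"}


counts = selection_counts(50, dummy_select)
print(counts["f1"], counts["f2"])  # f1 in all 100 samples; f2 in the 90 that contain row 0
```

A feature whose count equals folds x repeats (here 100) was selected in every training sample.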


