Attributes Selection

manal alghamdi

I want to select a set of important attributes, so I applied attribute selection through the AttributeSelectedClassifier in WEKA, using RandomForest as the classifier and the InformationGain Ranking Filter as the evaluator. The output below was printed. I am confused because the selected attributes include all of the attributes, and I don't know how to tell whether an attribute is significant or not:

Ranked attributes:
 0.0483842    3 mets_c
 0.0478885    2 target_heart_rt
 0.0325577   28 htn
 0.0261418   16 htnmed
 0.0234156    7 percent_hr_achieved
 0.0191224   13 diuretic
 0.0170437    6 resting_diastolic
 0.0162095    5 resting_systolic
 0.013243    19 all_lipid
 0.01201     35 obesity2_binarized
 0.0110515   17 statin
 0.0108296   30 famhx
 0.0107172    4 resting_heart_rt
 0.010679    11 acei
 0.0098718   10 bb
 0.0058655   20 depressionmed
 0.0046325   29 hyperlipid
 0.0042723   14 ccb
 0.0041837   27 thyroidmeds
 0.0038014   23 nitrates
 0.0031521   33 smoke
 0.0029182   21 smokingmeds
 0.002813    18 other_lipid
 0.0020482    9 aspirin
 0.0013633   31 oldchf
 0.0013114   37 black_binarized
 0.001179    15 other_htnmed
 0.0005669   24 edmeds
 0.0005324   12 arb
 0.0004939   26 lungdiseasemeds
 0.0003638    1 sex
 0.0002786    8 norm_rest_ekg_flg=2_binarized
 0.0002267   22 plavix
 0.0001086   25 dig
 0.000087    32 priorCVA
 0.0000237   34 knowncad
 0.0000156   36 sedentary2_binarized

Selected attributes: 3,2,28,16,7,13,6,5,19,35,17,30,4,11,10,20,29,14,27,23,33,21,18,9,31,37,15,24,12,26,1,8,22,25,32,34,36 : 37
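For reference, the merit column printed by the InformationGain Ranking Filter (InfoGainAttributeEval) is the information gain of each attribute with respect to the class, IG(Class, Attribute) = H(Class) - H(Class | Attribute). A minimal stand-alone sketch of that quantity for a nominal attribute, in plain Java rather than the Weka API; the class and method names are made up for illustration, and Weka additionally discretizes numeric attributes before computing the gain, which this sketch does not do:

```java
import java.util.HashMap;
import java.util.Map;

final class InfoGain {
    // Shannon entropy, in bits, of a distribution given as counts summing to `total`.
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // IG(C; A) = H(C) - sum over values v of P(A = v) * H(C | A = v),
    // for a nominal attribute `attr` and class labels `cls` of equal length.
    static double infoGain(String[] attr, String[] cls) {
        int n = cls.length;
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> countsPerValue = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(cls[i], 1, Integer::sum);
            countsPerValue.computeIfAbsent(attr[i], v -> new HashMap<>())
                          .merge(cls[i], 1, Integer::sum);
        }
        // Weighted average of the class entropy within each attribute value.
        double conditional = 0.0;
        for (Map<String, Integer> counts : countsPerValue.values()) {
            int nv = 0;
            for (int c : counts.values()) nv += c;
            conditional += (double) nv / n * entropy(counts, nv);
        }
        return entropy(classCounts, n) - conditional;
    }

    public static void main(String[] args) {
        // Toy data, invented for illustration.
        String[] cls     = {"yes", "yes", "no", "no"};
        String[] perfect = {"a", "a", "b", "b"}; // splits the classes exactly: 1 bit
        String[] useless = {"a", "b", "a", "b"}; // independent of the class: 0 bits
        System.out.println(infoGain(perfect, cls));
        System.out.println(infoGain(useless, cls));
    }
}
```

Note that information gain is a score, not a statistical significance test, so on its own it does not say whether an attribute is "significant"; values close to 0, like those at the bottom of the list above, only mean the attribute barely reduces uncertainty about the class.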




_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Attributes Selection

Peter Reutemann
> I want to select a set of attributes that are important, so I applied that
> through the AttributeSelectedClassifier in WEKA; however, the following
> output was printed using RandomForest as a classifier and InformationGain
> Ranking Filter as the evaluator; I am confused because the selected
> attributes printed all the attributes and I don't know how to know if the
> attribute is significant or not.  :
>
> Ranked attributes:
>  0.0483842    3 mets_c
>  0.0478885    2 target_heart_rt

[...]


The "Ranker", as the name suggests, only ranks attributes according to
their importance. The first value is the "merit" of an attribute. It's
then up to the user to decide at what merit to cut off.
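Weka's Ranker has a threshold option (-T) and a number-to-select option (-N); with their default settings nothing is discarded, which matches the 37-of-37 selection in the original question. The cut-off itself can be sketched in plain Java as below. This is not Weka's implementation: the class and method names are hypothetical, the merits are the first eight lines of the posted ranking, and the 0.02 threshold is an arbitrary illustration, not a recommendation.

```java
import java.util.ArrayList;
import java.util.List;

final class RankerCutoff {
    // One line of the ranked output: merit, 1-based attribute index, name.
    static final class Ranked {
        final double merit;
        final int index;
        final String name;

        Ranked(double merit, int index, String name) {
            this.merit = merit;
            this.index = index;
            this.name = name;
        }
    }

    // Keeps every attribute whose merit is at least `threshold`.
    static List<Ranked> byThreshold(List<Ranked> ranking, double threshold) {
        List<Ranked> kept = new ArrayList<>();
        for (Ranked r : ranking) {
            if (r.merit >= threshold) kept.add(r);
        }
        return kept;
    }

    // Keeps the first `n` entries of a ranking sorted by merit, best first.
    static List<Ranked> topN(List<Ranked> ranking, int n) {
        return new ArrayList<>(ranking.subList(0, Math.min(n, ranking.size())));
    }

    // First eight entries of the ranking posted in the question.
    static List<Ranked> demoRanking() {
        return List.of(
            new Ranked(0.0483842, 3, "mets_c"),
            new Ranked(0.0478885, 2, "target_heart_rt"),
            new Ranked(0.0325577, 28, "htn"),
            new Ranked(0.0261418, 16, "htnmed"),
            new Ranked(0.0234156, 7, "percent_hr_achieved"),
            new Ranked(0.0191224, 13, "diuretic"),
            new Ranked(0.0170437, 6, "resting_diastolic"),
            new Ranked(0.0162095, 5, "resting_systolic"));
    }

    public static void main(String[] args) {
        for (Ranked r : byThreshold(demoRanking(), 0.02)) {
            System.out.println(r.index + " " + r.name);
        }
        System.out.println(topN(demoRanking(), 3).size());
    }
}
```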

Instead of using an "attribute evaluator", you need to use a "subset
evaluator", like CfsSubsetEval.
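CfsSubsetEval is based on correlation-based feature selection, which scores a whole subset S of k attributes rather than one attribute at a time: Merit(S) = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean attribute-class correlation and r_ff the mean attribute-attribute correlation within S. A search method (for example BestFirst) then looks for the subset with the highest merit. The formula alone can be sketched in plain Java; the correlation values are invented to show the effect, and how Weka actually measures the correlations is not reproduced here.

```java
final class CfsMerit {
    // Invented example: attributes 0 and 1 are exact copies of each other,
    // attribute 2 is unrelated to both; all three are equally related to the class.
    static final double[] CLASS_CORR = {0.5, 0.5, 0.5};
    static final double[][] FEAT_CORR = {
        {1.0, 1.0, 0.0},
        {1.0, 1.0, 0.0},
        {0.0, 0.0, 1.0}
    };

    // Merit(S) = k * mean(r_cf) / sqrt(k + k(k-1) * mean(r_ff)).
    // Correlations are assumed non-negative (e.g. measures in [0, 1]).
    static double merit(int[] subset, double[] classCorr, double[][] featCorr) {
        int k = subset.length;
        if (k == 0) return 0.0;
        double rcf = 0.0;
        for (int i : subset) rcf += classCorr[i];
        rcf /= k;
        // Mean correlation over all unordered pairs of attributes in the subset.
        double rff = 0.0;
        int pairs = 0;
        for (int a = 0; a < k; a++) {
            for (int b = a + 1; b < k; b++) {
                rff += featCorr[subset[a]][subset[b]];
                pairs++;
            }
        }
        if (pairs > 0) rff /= pairs;
        return k * rcf / Math.sqrt(k + k * (k - 1) * rff);
    }

    public static void main(String[] args) {
        System.out.println(merit(new int[]{0}, CLASS_CORR, FEAT_CORR));
        System.out.println(merit(new int[]{0, 1}, CLASS_CORR, FEAT_CORR));
        System.out.println(merit(new int[]{0, 2}, CLASS_CORR, FEAT_CORR));
    }
}
```

Adding a redundant copy of an attribute leaves the merit unchanged, while adding an equally relevant but uncorrelated one raises it, which is why a subset evaluator can stop short of keeping every attribute.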

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Attributes Selection

Peter Reutemann
Please use reply all.

>Thanks for the reply.
>1. But how do I define the cut-off? Depending on what exactly?

Ranking attributes doesn't make much sense in conjunction with the AttributeSelectedClassifier. You'd apply it separately and in your own code. You use the ranking to add attributes until the performance degrades, for instance.
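The "add attributes until the performance degrades" idea can be sketched as a greedy loop over the ranking. The score callback is hypothetical: in practice it would be something like the cross-validated accuracy of your classifier trained on just those attributes, computed in your own code. The ranking prefix comes from the posted output; the scores in the demo are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

final class ForwardByRank {
    // Walks the ranking from best to worst, adding one attribute at a time,
    // and stops as soon as the score drops below the best score seen so far.
    // `score` is a hypothetical callback, e.g. cross-validated accuracy of a
    // classifier trained only on the given attribute indices.
    static List<Integer> select(List<Integer> ranking, ToDoubleFunction<List<Integer>> score) {
        List<Integer> best = new ArrayList<>();
        double bestScore = Double.NEGATIVE_INFINITY;
        List<Integer> current = new ArrayList<>();
        for (int attr : ranking) {
            current.add(attr);
            double s = score.applyAsDouble(new ArrayList<>(current));
            if (s < bestScore) break; // performance degraded: stop here
            if (s > bestScore) {      // ties keep the smaller subset
                bestScore = s;
                best = new ArrayList<>(current);
            }
        }
        return best;
    }

    // Top of the posted ranking (attribute indices, best first).
    static final List<Integer> DEMO_RANKING = List.of(3, 2, 28, 16, 7);
    // Made-up scores for subsets of size 1..5, for illustration only.
    static final double[] DEMO_SCORES = {0.65, 0.70, 0.74, 0.72, 0.76};

    static double demoScore(List<Integer> attrs) {
        return DEMO_SCORES[attrs.size() - 1];
    }

    public static void main(String[] args) {
        System.out.println(select(DEMO_RANKING, ForwardByRank::demoScore));
    }
}
```

Stopping at the first drop is greedy: in the demo the fifth attribute would have scored higher again, so a more patient variant could tolerate a few non-improving steps before giving up.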

>2. I used it, but when I run all kinds of algorithms with this subset of
>attributes, none of them gives good accuracy, even though the dataset is
>cleansed, discretized, has missing values replaced, etc. The class
>is imbalanced; I applied SMOTE and undersampling using
>the FilteredClassifier, but that did not work either.

Your data might simply not have sufficient information to build a good model.

>3. If I use the select attribute evaluator, is it wrong to use the full
>training set option with the whole dataset?

Evaluating against the training set will give overly optimistic results: the model has already seen that data!
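To see why, here is a toy sketch in plain Java (not Weka code): a 1-nearest-neighbour classifier memorises its training data, so on data whose labels are pure noise it scores 100% on the training set while staying around chance on fresh data. The data sizes and the seed are arbitrary choices for illustration.

```java
import java.util.Random;

final class TrainingSetOptimism {
    // 1-nearest-neighbour prediction: label of the closest training point
    // (squared Euclidean distance).
    static int predict(double[][] trainX, int[] trainY, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < trainX.length; i++) {
            double d = 0.0;
            for (int j = 0; j < x.length; j++) {
                double diff = trainX[i][j] - x[j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        return trainY[best];
    }

    static double accuracy(double[][] trainX, int[] trainY, double[][] testX, int[] testY) {
        int correct = 0;
        for (int i = 0; i < testX.length; i++) {
            if (predict(trainX, trainY, testX[i]) == testY[i]) correct++;
        }
        return (double) correct / testX.length;
    }

    // Features and labels are drawn independently, so there is nothing real to learn.
    static double[][] randomX(Random rnd, int n, int dims) {
        double[][] x = new double[n][dims];
        for (double[] row : x) {
            for (int j = 0; j < dims; j++) row[j] = rnd.nextDouble();
        }
        return x;
    }

    static int[] randomY(Random rnd, int n) {
        int[] y = new int[n];
        for (int i = 0; i < n; i++) y[i] = rnd.nextInt(2);
        return y;
    }

    // Returns {accuracy on the training set, accuracy on unseen data}.
    static double[] run(long seed) {
        Random rnd = new Random(seed);
        double[][] trainX = randomX(rnd, 200, 5);
        int[] trainY = randomY(rnd, 200);
        double[][] testX = randomX(rnd, 200, 5);
        int[] testY = randomY(rnd, 200);
        return new double[] {
            accuracy(trainX, trainY, trainX, trainY),
            accuracy(trainX, trainY, testX, testY)
        };
    }

    public static void main(String[] args) {
        double[] acc = run(42L);
        System.out.printf("training set: %.2f, held-out: %.2f%n", acc[0], acc[1]);
    }
}
```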


Cheers, Peter