attribute selection


attribute selection

Katrin Tomanek
Hi,

I have a question about attribute selection. I have a dataset with about 15
attributes. I was thinking about performing some attribute selection, not for
the sake of dimensionality reduction (since complexity and speed don't seem
to be a problem) but to remove irrelevant and "bad" attributes and thereby
increase classifier effectiveness.

My question is: would you try to perform attribute selection on such a
rather small dataset, or would you leave it to the classifier to find and use
the best attributes?

And are there classifiers in Weka that already do attribute selection
(can the linear regression module do that?) as part of their learning and
optimization process (e.g. forward selection)?

Thanks for your help,
Katrin
--
1024D/2E3AEDE3 2005-03-20 Katrin Tomanek <[hidden email]>
Key fingerprint = 368D 2CCC F659 3F54 768E  A709 37BC 79E1 2E3A EDE3
public key available at http://www.katrintomanek.de/pubkey


RE: attribute selection

subrat
Hi,

I think that whether to use attribute selection procedures is a function of the desired accuracy, the size of the dataset (in terms of available cases), and the cost of misclassification, all taken together. Attribute selection is mostly favoured when the desired accuracy is extremely high, cases are available in ample supply (so that generalisation is possible and the domain/problem semantics can 'absorb' discarding some attributes!), and the cost of misclassification is high.

If your dataset is mostly continuous and it is hard to draw 'discrete rules' on attribute values for learning, then feature-space transformations (like PCA) may be worth trying. If you have a well-defined global measure of accuracy, then methods like forward or backward feature selection would make sense; see the sketch below.
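
A minimal sketch of forward selection through Weka's Java attribute-selection API, assuming a placeholder file mydata.arff and NaiveBayes as the base learner whose cross-validated accuracy serves as the global measure:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.GreedyStepwise;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;

public class ForwardSelectionDemo {
    public static void main(String[] args) throws Exception {
        // Load the dataset; assume the last attribute is the class.
        Instances data = new Instances(new BufferedReader(new FileReader("mydata.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // The 'global measure of accuracy': cross-validated accuracy of a base learner.
        WrapperSubsetEval evaluator = new WrapperSubsetEval();
        evaluator.setClassifier(new NaiveBayes());

        // Greedy stepwise search: false = forward selection, true = backward elimination.
        GreedyStepwise search = new GreedyStepwise();
        search.setSearchBackwards(false);

        AttributeSelection selector = new AttributeSelection();
        selector.setEvaluator(evaluator);
        selector.setSearch(search);
        selector.SelectAttributes(data);  // Weka capitalises this method name

        System.out.println(selector.toResultsString());
    }
}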

In any case you do need a 'ranking' of the features based on how well they separate the dataset, which can roughly be taken as maximizing the ratio of the information content in the retained attributes to that in all the attributes taken together. A good measure would be the information gain metric or the Gini index. This is available in Weka.
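
For instance, ranking by information gain might look like this (a sketch continuing from the loading code above; as far as I know, stock Weka ships an information-gain evaluator, while a Gini-based one would have to be supplied separately):

import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.InfoGainAttributeEval;
import weka.attributeSelection.Ranker;
import weka.core.Instances;

public class InfoGainRankingDemo {
    // 'data' is an Instances object with its class index already set.
    public static void printRanking(Instances data) throws Exception {
        AttributeSelection ranker = new AttributeSelection();
        ranker.setEvaluator(new InfoGainAttributeEval());  // information gain per attribute
        ranker.setSearch(new Ranker());                    // orders attributes by that score
        ranker.SelectAttributes(data);
        System.out.println(ranker.toResultsString());      // full ranking, best first
    }
}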

I hope this helps,

Subrat

Re: attribute selection

bthom
There is a wonderful summary-style paper on feature selection and its relationship to learning algorithm performance: Guyon and Elisseeff, "An Introduction to Variable and Feature Selection" (JMLR, 2003).

There were also two commonly-cited earlier papers (Blum was an author on one, Kohavi an author on the other), but those are less accessible in terms of pragmatically getting work done. The Guyon article has a nice list of "advised guidelines" to help you make progress in a situation where, generally speaking, no one knows what is best.

Ideally, you'd like the learning algorithm to do it all for you: determine which features are best, construct the most useful ones in the best-performing way, and so on. The obstacle seems to be a practical one: if only we had an unlimited amount of data and computing time, that grand desire might be realizable. In Weka, the closest thing is sketched below.
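
Weka's AttributeSelectedClassifier meta-learner comes close to this: it runs an attribute-selection step before training its base classifier, so the selection is learned from the training data only. A minimal sketch, where CfsSubsetEval, BestFirst, and J48 are just one plausible combination:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.attributeSelection.BestFirst;
import weka.attributeSelection.CfsSubsetEval;
import weka.classifiers.meta.AttributeSelectedClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class SelectThenLearnDemo {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("mydata.arff")));
        data.setClassIndex(data.numAttributes() - 1);

        // Meta-classifier: select attributes first, then train the base learner on them.
        AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
        asc.setEvaluator(new CfsSubsetEval());  // correlation-based subset merit
        asc.setSearch(new BestFirst());         // best-first search over attribute subsets
        asc.setClassifier(new J48());           // base learner (a C4.5-style decision tree)

        asc.buildClassifier(data);
        System.out.println(asc);
    }
}

Incidentally, this also bears on the original question about linear regression: Weka's LinearRegression performs its own attribute selection by default (the M5 method), though that applies to regression rather than classification tasks.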

Hope you find this helpful,
--b


Dr. Belinda Thom
Asst. Professor, Computer Science
Harvey Mudd College, 1241 Olin Hall
1250 Dartmouth Ave, Claremont, CA 91711
http://www.cs.hmc.edu/~bthom
909-607-9662 (fax 607-8364)

_______________________________________________
Wekalist mailing list
[hidden email]
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist