Bag of keywords for text classification

Daniel
Hi,

I'm having trouble improving my classifier; I can't get past the 70%
accuracy barrier. It's the project needed to complete my bachelor's degree,
and I was given a specific dataset. The dataset has all the files to use,
plus an extra file with the keywords to look for in each file when
classifying.

I have looked, but I haven't found anything in Weka that I could use for
this. I'm considering bringing in another tool such as OpenNLP, but maybe
someone here has an idea of how to use the keywords.
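
A minimal sketch of one way a keyword list could be turned into features, written in plain Python rather than Weka. It counts how often each keyword appears in a document, giving one numeric attribute per keyword. The keyword list, the sample log line, and the function name `keyword_features` are made up for illustration; the format of the real keyword file is not known here, and multi-word keywords would need a different matching step.

```python
import re
from collections import Counter


def keyword_features(text, keywords):
    """Return one count per keyword: how often it occurs as a token in text."""
    # Lower-case the text and split it into simple alphanumeric tokens.
    tokens = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    return [tokens[k.lower()] for k in keywords]


# Hypothetical keywords and log excerpt, for illustration only.
keywords = ["error", "timeout", "failed"]
log = "Build FAILED: connection timeout. Error: test failed after timeout."

features = keyword_features(log, keywords)
print(dict(zip(keywords, features)))
```

Each document then becomes a short, fixed-length vector that could be used alone or appended to the word-vector attributes before training.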



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Bag of keywords for text classification

rik ghosh
Well, you see, there is no one algorithm that fits all kinds of data (look up the "no free lunch" theorem). It depends on the data and the problem you have in hand. Is it a regression, classification, or clustering problem? Is there imbalance in the data? What is the size of the dataset? All of these matter when choosing an algorithm. Also, at times you might need to preprocess your data before applying an algorithm. Give some more details and people might be able to help you more efficiently.

Cheers 
- Rik

(In reply to Daniel's message of Sat, 20 Jun 2020, 08:10.)

Re: Bag of keywords for text classification

Daniel
Well, to start off, it's a classification problem. I could do clustering
too, I'm free in that respect, but I feel it would work worse.

There is a big class imbalance: {522, 1, 178, 59, 17, 12, 4, 2, 2}. The
labelling itself might not be correct at all, so clustering (with XMeans,
for example) might turn out to be the better option.
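
As a quick sanity check on these class counts, the sketch below works out what a classifier would score by always predicting the largest class. The variable names are illustrative only.

```python
# Class counts as listed in the post.
counts = [522, 1, 178, 59, 17, 12, 4, 2, 2]

total = sum(counts)
# Accuracy of a trivial classifier that always predicts the largest class.
baseline = max(counts) / total
# Classes with fewer than 10 instances are very hard to learn or evaluate.
rare = [c for c in counts if c < 10]

print(total)                      # 797
print(round(100 * baseline, 1))   # 65.5
print(len(rare))                  # 4
```

The counts sum to 797, and the majority-class baseline is about 65.5%, so an accuracy around 70% is only a few points above predicting the largest class every time. With this much imbalance, per-class precision and recall are likely to be more informative than overall accuracy.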

The dataset is 800 files, all travis logs of various sizes.

I've preprocessed the data a bit: first a train/test split (70/30), then a
StringToWordVector, then a RemoveByName to delete all non-UTF-8
characters, and finally an AttributeSelection filter, without much
success, as it has only improved the result by 3%.
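
With classes this small, a plain random 70/30 split can leave some classes entirely out of the training set. The sketch below shows a stratified split in plain Python, assuming the class counts listed above; it is not the split actually used here, and the function name `stratified_split` is hypothetical. Weka's supervised StratifiedRemoveFolds filter may offer something comparable.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split instance indices so each class keeps roughly the same proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Keep at least one instance of every class in the training set.
        cut = max(1, round(train_fraction * len(indices)))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Rebuild a label list from the class counts given in the post.
counts = [522, 1, 178, 59, 17, 12, 4, 2, 2]
labels = [cls for cls, n in enumerate(counts) for _ in range(n)]

train, test = stratified_split(labels)
print(len(train), len(test))
```

Even with stratification, the class with a single instance ends up only in the training set, so it can never be evaluated on the test set.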

For classification, I'm doing AdaBoost -> Bagging -> Stacking -> Voting
and getting 72% correctly classified instances on the test set.

Thanks


