outputwordcounts option in StringToWordVector

Bill Claster
Hello. Where are the word counts output when using the
StringToWordVector filter? Is there a way to get the word counts and perhaps
to select only those words that occur with frequency "n" or greater?
I am using Weka Explorer 3.7 on Windows Vista.

Thank you.

Bill

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: outputwordcounts option in StringToWordVector

Peter Reutemann-3
> Hello. Where are the output word counts outputted when using the
> StringToWordVector?

STRING attributes get converted into new word/term count attributes
(with word/term as attribute name).
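
As a rough illustration of that conversion (a plain-Java sketch, not Weka code):
each distinct word becomes a named column, and each document gets a row. The
class name, the whitespace tokenizer and the sample sentences are made up for
this example. The boolean flag mimics the outputWordCounts option: true stores
counts, false stores 0/1 presence.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Illustrative sketch only; names and tokenization are assumptions, not Weka's.
final class WordVectorSketch {

    // Splits on whitespace; a stand-in for a real tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Builds one row per document, keyed by term (the "attribute name").
    // outputWordCounts = true stores occurrence counts, false stores 0/1 presence.
    static List<Map<String, Integer>> vectorize(List<String> docs, boolean outputWordCounts) {
        // Collect the vocabulary over all documents, in sorted order.
        TreeSet<String> vocabulary = new TreeSet<>();
        for (String doc : docs) {
            vocabulary.addAll(tokenize(doc));
        }

        List<Map<String, Integer>> rows = new ArrayList<>();
        for (String doc : docs) {
            // Start every term at 0 so all rows share the same columns.
            Map<String, Integer> row = new LinkedHashMap<>();
            for (String term : vocabulary) {
                row.put(term, 0);
            }
            for (String token : tokenize(doc)) {
                row.put(token, outputWordCounts ? row.get(token) + 1 : 1);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the cat saw the dog", "the dog ran");
        System.out.println(vectorize(docs, true));   // counts per term
        System.out.println(vectorize(docs, false));  // presence per term
    }
}
```

The real filter has many more options (tokenizer choice, stemming, stopwords and
so on), so this only shows the basic shape of the output.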

> Is there a way to get the word counts and perhaps
> to select only those words that occur with frequency "n" or greater?

Use the minTermFreq option (-M on the command line); the default is 1.

> I am using Weka Explorer 3.7 on Windows Vista.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174

Re: outputwordcounts option in StringToWordVector

Jenna
Hello. Thank you. I see minTermFrequency as an option/parameter in
the StringToWordVector filter and I see that it is set to 1, but is it
possible to display in Explorer a list of only the words with a
minimum frequency of, say, 8? I looked at the main display (under the
buttons "all", "none", "invert", "pattern") but that list did not seem to
change. I also clicked the edit button, but that display did not seem
to change either.
Thank you.

On 12/14/09, Peter Reutemann <[hidden email]> wrote:

> > Hello. Where are the output word counts outputted when using the
>  > StringToWordVector?
>
>
> STRING attributes get converted into new word/term count attributes
>  (with word/term as attribute name).
>
>
>  > Is there a way to get the word counts and perhaps
>  > to select only those words that occur with frequency "n" or greater?
>
>
> minTermFreq(uency)/-M - the default is 1
>
>
>  > I am using Weka Explorer 3.7 on Windows Vista.
>
>
> Cheers, Peter


--
Best wishes,
Bill

Re: outputwordcounts option in StringToWordVector

Peter Reutemann-3
Please no top-posting; see the mailing list etiquette for why
(http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).

> Hello . Thank you. I see minTermFrequency as an option/parameter in
> the StringToWordVector filter and I see that it is set to 1 but is it
> possible to display in Explorer a list of only the words with a
> minimum frequency of say 8? I looked at the main display (under the
> buttons "all" "none" "invert" "pattern") but that list did not seem to
> change. I also clicked the edit button but that display did not seem
> to change either.

It works fine for me.

Did you not only change the parameter, but then also apply the
filter to the original dataset with the STRING attributes?

OK, here's what I've done:

1. minTermFreq=1
- loaded original test dataset
- selected StringToWordVector filter
- applied filter
=> 698 words as new attributes

2. minTermFreq=8
- loaded original test dataset
- selected StringToWordVector filter
- set minTermFreq to 8
- applied filter
=> 12 words as new attributes
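
The effect in the two runs above (a higher threshold leaves fewer words) can be
sketched in plain Java. This is not Weka's implementation: the class name,
method names and sample documents are invented for illustration, and the real
filter has other settings that also influence which words survive. The sketch
only counts whitespace-separated tokens over all documents and drops those
below the threshold.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; a simplified stand-in for a minimum term frequency cut-off.
final class MinTermFreqSketch {

    // Counts how often each whitespace-separated token occurs across all documents.
    static Map<String, Integer> countTerms(List<String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            for (String token : doc.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Keeps only the terms whose total count is at least minTermFreq.
    static Map<String, Integer> keepFrequent(Map<String, Integer> counts, int minTermFreq) {
        Map<String, Integer> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= minTermFreq) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("spam spam eggs", "spam ham", "eggs spam toast");
        Map<String, Integer> counts = countTerms(docs);
        System.out.println("threshold 1 keeps " + keepFrequent(counts, 1).keySet());
        System.out.println("threshold 2 keeps " + keepFrequent(counts, 2).keySet());
    }
}
```

As in the Explorer, the threshold only takes effect when the filtering step is
actually re-run on the original data; changing the number alone does nothing.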

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174

Re: outputwordcounts option in StringToWordVector

rexer24
This post has NOT been accepted by the mailing list yet.
In reply to this post by Bill Claster
Hi all, my solution is:
fill in the 'dictionaryFileToSaveTo' field in the StringToWordVector config.
You can also set the outputWordCounts field to true and use a tokenizer without delimiters.