Obtaining the top N words

Edward Wiskers
Hi Weka list, 

In document classification, how can one obtain the top N words that describe each class (category)?

Thank you.
Edward

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Obtaining the top N words

Eibe Frank-3
You mean the N most frequent words? Use the RemoveWithValues filter to delete all instances not pertaining to the class you are interested in and then run the StringToWordVector filter with an appropriate value for the wordsToKeep parameter.
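
A rough illustration of the idea, in plain Java rather than Weka code: keep only the documents of one class, count the words, and keep the N most frequent. The `Doc` class, the `topWords` helper and the simple non-word-character tokenisation are assumptions made for this sketch; they are not part of Weka, and StringToWordVector's own tokenisation and tie handling may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopWordsPerClass {

    /** A labeled document; a hypothetical stand-in for one instance. */
    static final class Doc {
        final String label;
        final String text;

        Doc(String label, String text) {
            this.label = label;
            this.text = text;
        }
    }

    /** Returns the n most frequent words among the documents labeled targetClass. */
    static List<String> topWords(List<Doc> docs, String targetClass, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc d : docs) {
            if (!d.label.equals(targetClass)) {
                continue; // drop instances that belong to the other classes
            }
            for (String w : d.text.toLowerCase().split("\\W+")) {
                if (!w.isEmpty()) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("sport", "goal goal match"),
                new Doc("sport", "match goal referee"),
                new Doc("politics", "vote election vote"));
        System.out.println(topWords(docs, "sport", 2)); // [goal, match]
    }
}
```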

Cheers,
Eibe

On Fri, Jan 6, 2017 at 7:01 AM, Edward Wiskers <[hidden email]> wrote:
Hi Weka list, 

In document classification, how can one obtain the top N words that describe each class (category)?

Thank you.
Edward


Re: Obtaining the top N words

Edward Wiskers

Hi Eibe,

Thanks for your answer.

By saying "Use the RemoveWithValues filter to delete all instances not pertaining to the class you are interested in", do you mean removing instances that may be outliers? If not, could you please provide an example that clarifies this point?

Thanks once again.
Edward

On 7 Jan 2017 3:29 pm, "Eibe Frank" <[hidden email]> wrote:
You mean the N most frequent words? Use the RemoveWithValues filter to delete all instances not pertaining to the class you are interested in and then run the StringToWordVector filter with an appropriate value for the wordsToKeep parameter.

Cheers,
Eibe

Re: Obtaining the top N words

Edward Wiskers
In reply to this post by Eibe Frank-3
In addition, what about the "minTermFreq" parameter of StringToWordVector: can it be used to select the top N words in each class?

Edward

On Sat, Jan 7, 2017 at 3:29 PM, Eibe Frank <[hidden email]> wrote:
You mean the N most frequent words? Use the RemoveWithValues filter to delete all instances not pertaining to the class you are interested in and then run the StringToWordVector filter with an appropriate value for the wordsToKeep parameter.

Cheers,
Eibe

Re: Obtaining the top N words

Eibe Frank-3
In reply to this post by Edward Wiskers
No. Your question was how to get the top-N words for each class. For each class, you can do that by deleting all instances not in the class of interest. This is done by applying RemoveWithValues to delete the instances belonging to the other classes.
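
To make the distinction concrete, here is a minimal sketch in plain Java (not the Weka API): the selection looks only at the class label of each instance, and no outlier detection is involved. The `keepClass` helper and the `{label, text}` array layout are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class KeepOneClass {

    /** Keeps the instances whose label (element 0) equals classOfInterest; all others are dropped. */
    static List<String[]> keepClass(List<String[]> instances, String classOfInterest) {
        return instances.stream()
                .filter(inst -> inst[0].equals(classOfInterest))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> data = Arrays.asList(
                new String[] {"spam", "win money now"},
                new String[] {"ham", "meeting at noon"},
                new String[] {"spam", "cheap money offer"});
        // Only the two "spam" instances remain; the "ham" instance is removed.
        for (String[] inst : keepClass(data, "spam")) {
            System.out.println(inst[0] + ": " + inst[1]);
        }
    }
}
```

Repeating the call once per class value gives one subset per class, and the words can then be counted within each subset.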

Cheers,
Eibe

On Sat, Jan 7, 2017 at 10:55 PM, Edward Wiskers <[hidden email]> wrote:

Hi Eibe,

Thanks for your answer.

By saying "Use the RemoveWithValues filter to delete all instances not pertaining to the class you are interested in", do you mean removing instances that may be outliers? If not, could you please provide an example that clarifies this point?

Thanks once again.
Edward

Re: Obtaining the top N words

Eibe Frank-3
In reply to this post by Edward Wiskers
This parameter determines the minimum frequency of a term. Only terms that occur at least minTermFreq times will be turned into attributes in the transformed data. All terms with a lower frequency will be discarded.

You can use this parameter to find all terms that occur at least N times, where N is the parameter value.
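
The effect of such a threshold can be sketched in plain Java (not Weka code): every term that reaches the minimum count is kept, so the number of terms kept is not fixed in advance. The `termsWithMinFreq` helper and the pre-tokenised input are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

public class MinTermFreqSketch {

    /** Returns the terms that occur at least minFreq times in tokens, in alphabetical order. */
    static SortedSet<String> termsWithMinFreq(List<String> tokens, int minFreq) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        SortedSet<String> kept = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minFreq) {
                kept.add(e.getKey()); // term reaches the threshold, so it is kept
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("vote", "election", "vote", "poll", "election", "vote");
        System.out.println(termsWithMinFreq(tokens, 2)); // [election, vote]
        System.out.println(termsWithMinFreq(tokens, 3)); // [vote]
    }
}
```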

Cheers,
Eibe

On Sat, Jan 7, 2017 at 11:18 PM, Edward Wiskers <[hidden email]> wrote:
In addition, what about the "minTermFreq" parameter of StringToWordVector: can it be used to select the top N words in each class?

Edward
