Word Embeddings in Weka

joncarv
Hi, All!

I’m trying to figure out what word embeddings are...

Is it possible to use the dense feature representation generated by this technique with learning algorithms such as SVMs, or only with neural networks?

Does Weka support word embeddings?

Thanks!
Cheers!
Jonnathan Carvalho
PhD Student
Machine Learning Group
Universidade Federal Fluminense (Brazil)
http://www.ic.uff.br
Re: Word Embeddings in Weka

Felipe Bravo
Hi,
Yes, you can get a document-level representation from pre-trained embeddings using the AffectiveTweets package (https://github.com/felipebravom/AffectiveTweets), or you can even train your own embeddings using the WekaDeepLearning4j package (https://deeplearning.cms.waikato.ac.nz/).
Cheers,
Felipe

Re: Word Embeddings in Weka

joncarv
Thanks a lot, Felipe!

Cheers!

Re: Word Embeddings in Weka

joncarv
Hi Felipe,

As you suggested, I used the AffectiveTweets package to get word embeddings for tweets in Weka, using its default parameters, but I couldn’t understand what the generated dimensions (embedding-0 to embedding-99) mean...

Do you recommend any reading?

Thanks a lot!

Cheers,
Jonnathan.

Re: Word Embeddings in Weka

Felipe Bravo
Hi Jonnathan,
Word embeddings map discrete words to dense, relatively low-dimensional vectors, with the aim of preserving word meaning in the embedding space. More details here: http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
The TweetToEmbeddingsFeatureVector filter creates a sentence-level representation by aggregating the embedding values of the words within a sentence. The aggregation can be done by averaging, adding, or concatenating the word vectors. The default configuration of the filter uses pre-trained word vectors of 100 dimensions and averages the word vectors within a sentence. That is why you are getting 100 attributes (embedding-0, embedding-1, etc.).
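For intuition, here is a toy sketch of the default averaging aggregation in plain Java (illustrative only, not the package's actual code; the lookup table and dimension are placeholders):

import java.util.Map;

public class EmbeddingAverageDemo {

    // Average the embedding vectors of the words in a sentence, producing one
    // fixed-length feature vector (attributes embedding-0 ... embedding-(dim-1)).
    public static double[] average(String[] tokens, Map<String, double[]> embeddings, int dim) {
        double[] sum = new double[dim];
        int found = 0;
        for (String token : tokens) {
            double[] vec = embeddings.get(token); // words missing from the table are skipped
            if (vec == null) continue;
            for (int i = 0; i < dim; i++) sum[i] += vec[i];
            found++;
        }
        if (found > 0) for (int i = 0; i < dim; i++) sum[i] /= found;
        return sum;
    }
}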
You can also train your own word embeddings using the filters provided by the WekaDeepLearning4j package.
I hope this helps.
Cheers,
Felipe

Re: Word Embeddings in Weka

joncarv
Hi Felipe,

Thanks a lot for the explanation!!

Considering that when we extract word n-grams from a corpus the vocabulary is based on the words that appear in the training instances, how does this work when we are using word embeddings?
More specifically, I have one dataset of tweets divided into training and test instances, and I need to extract word embeddings to train an SVM classifier.
How can I achieve this using the TweetToEmbeddingsFeatureVector? Any example using Java code?

Cheers!
Jonnathan

Re: Word Embeddings in Weka

Felipe Bravo
Hi Jonnathan,
Word embeddings are usually trained from large corpora. The TweetToEmbeddingsFeatureVector filter will calculate features from pre-trained word embeddings in CSV format. Words from the training and testing sets that are not included in the embeddings file will be discarded. The AffectiveTweets package provides embeddings trained from a big corpus of tweets, which can be downloaded from the following link: https://github.com/felipebravom/AffectiveTweets/releases/download/1.0.0/w2v.twitter.edinburgh10M.400d.csv.gz

You can also train your own embeddings using the WekaDeepLearning4j package.
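Since you asked for Java code, here is a rough, untested sketch of the full train/test workflow. The ARFF file names, the position of the text attribute, and the class-attribute name are placeholders, and the two import paths for the AffectiveTweets classes are my assumption about the package layout:

import java.io.File;

import affective.core.CSVEmbeddingHandler; // assumed package path
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import weka.filters.unsupervised.attribute.TweetToEmbeddingsFeatureVector; // assumed package path

public class EmbeddingsSVMExample {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff");
        Instances test = DataSource.read("test.arff");

        // Configure the filter fully before setting its input format.
        TweetToEmbeddingsFeatureVector filter = new TweetToEmbeddingsFeatureVector();
        CSVEmbeddingHandler handler = new CSVEmbeddingHandler();
        handler.setEmbeddingsFile(new File("w2v.twitter.edinburgh10M.400d.csv.gz"));
        filter.setEmbeddingHandler(handler);
        filter.setInputFormat(train);

        // Using the same filter object for both sets yields an identical attribute space.
        Instances newTrain = Filter.useFilter(train, filter);
        Instances newTest = Filter.useFilter(test, filter);

        // Drop the raw tweet text (SMO cannot handle string attributes);
        // here it is assumed to be the first attribute.
        Remove remove = new Remove();
        remove.setAttributeIndices("1");
        remove.setInputFormat(newTrain);
        newTrain = Filter.useFilter(newTrain, remove);
        newTest = Filter.useFilter(newTest, remove);

        newTrain.setClass(newTrain.attribute("class")); // adjust to your class attribute's name
        newTest.setClass(newTest.attribute("class"));

        SMO svm = new SMO(); // Weka's SVM implementation
        svm.buildClassifier(newTrain);
        System.out.println(svm.classifyInstance(newTest.instance(0)));
    }
}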

Cheers,
Felipe

Re: Word Embeddings in Weka

joncarv
Hi Felipe,

So, I can use the TweetToEmbeddingsFeatureVector to calculate features for the tweets in the test set independently of the training set...
Is that correct?

Thanks a lot!

Cheers,
Jonnathan

Re: Word Embeddings in Weka

Felipe Bravo
Yes, you should get the same attributes on both the training and testing sets, since the attribute space is determined by the embeddings file rather than by the data being filtered.
Cheers

Re: Word Embeddings in Weka

joncarv
Hi Felipe,

Could you please help me one more time?

I'm extracting word embedding vectors using the pre-trained embeddings from w2v.twitter.edinburgh10M.400d.csv.gz, as recommended in the AffectiveTweets documentation:

...
// Configure the filter fully before calling setInputFormat(),
// as is the usual convention for Weka filters.
TweetToEmbeddingsFeatureVector weFilter = new TweetToEmbeddingsFeatureVector();
CSVEmbeddingHandler handler = new CSVEmbeddingHandler();
handler.setEmbeddingsFile(new File("/home/joncarv/test-embeddings/w2v.twitter.edinburgh10M.400d.csv.gz"));
weFilter.setEmbeddingHandler(handler);
weFilter.setInputFormat(instances);
instances = Filter.useFilter(instances, weFilter);
...

Is this correct?

Thanks a lot!

Re: Word Embeddings in Weka

Felipe Bravo
Hi Jonnathan,
It looks good to me.
Cheers,
Felipe

Re: Word Embeddings in Weka

joncarv
Hi, Felipe!

Hope you can help me one more time!

When using the embedding vectors (extracted from pre-trained embeddings) to train an SVM classifier, do you think it is necessary or correct to normalise the data?

Thanks a lot!
Jonnathan.

Re: Word Embeddings in Weka

Felipe Bravo
Hi Jonnathan,
It may be worth running a comparison of both schemes to find out. My guess is that the results will be very similar.
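One easy way to run that comparison in Weka is through SMO's built-in preprocessing switch: -N 0 normalizes the training data (the default), -N 1 standardizes it, and -N 2 applies neither. A minimal, untested sketch, where the file name and class index are placeholders:

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NormalizationComparison {
    public static void main(String[] args) throws Exception {
        // Assumes the embedding features have already been extracted and the
        // dataset contains only numeric attributes plus a nominal class.
        Instances data = DataSource.read("filtered.arff");
        data.setClassIndex(data.numAttributes() - 1); // adjust if the class is elsewhere

        for (String n : new String[] {"0", "1", "2"}) { // normalize / standardize / neither
            SMO svm = new SMO();
            svm.setOptions(new String[] {"-N", n});
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(svm, data, 10, new Random(1));
            System.out.println("-N " + n + ": " + eval.pctCorrect() + "% correct");
        }
    }
}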
Cheers,
Felipe

Re: Word Embeddings in Weka

mcbenly
Hi Felipe,
I see the AffectiveTweets package comes with 100-dimensional embeddings. Do you know where I could find a 300- or 500-dimensional embeddings file to use with the AffectiveTweets package in Weka?

Thanks, Ben
Re: Word Embeddings in Weka

Felipe Bravo
Hi,
The package allows using any pre-trained word embeddings file as long as it is in CSV format.
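For instance, once you have a higher-dimensional embeddings file in that format (gzipped CSV also works, as in the snippet earlier in this thread), you point the handler at it exactly as before. A sketch, with a hypothetical placeholder for your own 300-dimensional file:

CSVEmbeddingHandler handler = new CSVEmbeddingHandler();
handler.setEmbeddingsFile(new File("my.embeddings.300d.csv.gz")); // placeholder file name
TweetToEmbeddingsFeatureVector filter = new TweetToEmbeddingsFeatureVector();
filter.setEmbeddingHandler(handler);
// The filter will then emit attributes embedding-0 ... embedding-299.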
Cheers,
Felipe
