Character Level CNN

matveyryabov
Hello,

I am trying to do character-level text classification in Weka, and I plan on
using wekadeeplearning4j. From what I found, I could use the
.CnnTextEmbeddingInstanceCreator to feed strings as input to convolution
layers. However, I cannot figure out how to get it to work with characters
rather than with words or word embeddings. I was wondering whether anyone has
experience with character-level classification in Weka and would be able to
explain it or point me in the right direction. I am still relatively
new to the Weka environment, and to data science in general, so I might be
missing something that is right in front of my nose.



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Character Level CNN

Eibe Frank-2
Administrator
The most flexible way of processing sequences using WekaDeeplearning4j is to use the RelationalInstancesIterator mentioned close to the bottom of the page at

  https://deeplearning.cms.waikato.ac.nz/user-guide/data/

For example, you could make a one-dimensional “time series” from your text by taking the sequence of ASCII codes in the text and providing it to WEKA as a value of a relation-valued attribute. (Note that this may not be a particularly sensible approach.) Or you could embed each character into a vector and make a multivariate “time series” (again, providing this multivariate time series as a value of a corresponding relation-valued attribute).
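As a rough illustration of the second option, the sketch below embeds each character as a one-hot vector, so that a text becomes a multivariate sequence with one row per character. This is only one possible embedding; the alphabet (lowercase letters plus space) and the name `one_hot_series` are assumptions made for this example and are not part of WEKA or WekaDeeplearning4j.

```python
import string

# Assumed alphabet for this sketch: 26 lowercase letters plus space.
ALPHABET = string.ascii_lowercase + " "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def one_hot_series(text):
    """Return one one-hot vector per character, in text order.

    Characters outside the assumed alphabet are skipped.
    """
    series = []
    for ch in text.lower():
        if ch in INDEX:
            vec = [0] * len(ALPHABET)
            vec[INDEX[ch]] = 1
            series.append(vec)
    return series


series = one_hot_series("ab")
print(len(series), len(series[0]))
```

Each inner list would correspond to one "time step" of the series, and each position in the list to one numeric attribute inside the relation.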

Relational attributes were originally introduced in WEKA to represent multiple instance data, but they are now also used to represent (multi-variate) time series. The only difference is in the interpretation of the data in a relational value: in the latter case, the individual instances in the relation are assumed to be ordered by “time”. An example multi-instance dataset containing only two relation-valued instances can be found at

  https://weka.8497.n7.nabble.com/Relational-attributes-td8117.html
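
A minimal sketch of the first option (one ASCII code per "time step") might look as follows. It writes an ARFF file in which each document is a single value of a relation-valued attribute, with the rows of the relation separated by a literal `\n` inside a quoted string. The attribute names `sequence` and `code`, the relation name, and the helper functions are assumptions chosen for this example.

```python
def text_to_relational_value(text):
    # One row per character; rows are joined by a literal backslash-n,
    # and the whole relational value is wrapped in double quotes.
    codes = [str(ord(ch)) for ch in text if ord(ch) < 128]
    return '"' + "\\n".join(codes) + '"'


def build_arff(docs, labels, relation="char_sequences"):
    """Build ARFF text with one relation-valued attribute and a nominal class."""
    classes = sorted(set(labels))
    lines = [
        f"@relation {relation}",
        "@attribute sequence relational",
        "  @attribute code numeric",
        "@end sequence",
        "@attribute class {" + ",".join(classes) + "}",
        "@data",
    ]
    for text, label in zip(docs, labels):
        lines.append(f"{text_to_relational_value(text)},{label}")
    return "\n".join(lines) + "\n"


arff = build_arff(["Hi", "ok"], ["pos", "neg"])
print(arff)
```

Whether raw character codes are a useful numeric input is a separate question, as noted above; the sketch only shows the file layout.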

Cheers,
Eibe


Re: Character Level CNN

matveyryabov
Hello,

Thanks for the reply! I ended up using the convolutional instance iterator. Each row of the ARFF file represents a document and each column represents a character position in that document; the character at that position is encoded as a number from 1 to 37. By making the convolutional instance iterator interpret each row as an image (1 row x 3740 columns), I was able to feed the input into a convolutional layer.

However, I now face another problem. My computer is a little weak for the task at hand, so I have been trying to get Weka to work on the Google Cloud Platform. I was able to get Weka up and running there, and could run the default configuration of the DL4J multilayer perceptron on the data. That was not very useful in itself; it was more a proof of concept. I am now wondering (I am not sure whether I should start a separate email thread for this) why Weka hangs after seemingly loading BLAS MKL. I understand that BLAS is the mathematical library behind wekadeeplearning4j, but I am unsure whether it is having issues with a dataset that large. Usually I get a message such as "Building model on training set", which can take a second, but in this case it just hangs at the line "Blas Vendor: [MKL]". I guess I am wondering whether this is what happens with large datasets, or whether there is a log I can check to make sure it is not another issue. If anyone knows, any help would be greatly appreciated.
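
For reference, the fixed-width character encoding described in this post could be generated along the following lines. This is a sketch under assumptions: the post does not say which 37 symbols were used, so the alphabet here (26 lowercase letters, 10 digits, and space) is a guess, as are the padding value 0, the attribute names, and the helper names `encode` and `to_arff`.

```python
import string

# Assumed 37-symbol alphabet: 26 lowercase letters, 10 digits, and space.
ALPHABET = string.ascii_lowercase + string.digits + " "
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}  # values 1..37
PAD = 0  # assumed value for padding and for characters outside the alphabet


def encode(text, width):
    """Map a document to exactly `width` integers, truncating or padding."""
    indices = [CHAR_TO_INDEX.get(ch, PAD) for ch in text.lower()[:width]]
    return indices + [PAD] * (width - len(indices))


def to_arff(docs, labels, width, relation="char_grid"):
    """One numeric attribute per character position, plus a nominal class."""
    classes = sorted(set(labels))
    header = [f"@relation {relation}"]
    header += [f"@attribute pos{i} numeric" for i in range(width)]
    header.append("@attribute class {" + ",".join(classes) + "}")
    header.append("@data")
    rows = [
        ",".join(map(str, encode(text, width))) + "," + label
        for text, label in zip(docs, labels)
    ]
    return "\n".join(header + rows) + "\n"


print(to_arff(["ab 1", "hello"], ["x", "y"], width=6))
```

With `width=3740`, each data row would have the 3740 numeric columns that the iterator is then told to treat as a 1 x 3740 image.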

Thanks,

Matvey Ryabov 
