Weka deep learning with Arabic NLP text


aalmhirat
Hi,
I'm a beginner in Weka. My research is text classification of Arabic tweets using CNN and RNN deep learning, and I have already installed the WekaDeeplearning4j package in Weka.
My settings:
1- (Dataset) I collected a dataset of about 10,000 Arabic-language tweets from Twitter for multi-label text classification into 4 classes (c, r, d, o).
2- (Preprocess tab) I don't know which option to use in the Preprocess tab, especially which filter (Dl4jStringToWord2Vec or the AffectiveTweets TweetToEmbeddingsFeatureVector), before going to the classifier, and which embedding file to load in the embedding handler for Arabic-language text.
3- (Classify tab) I chose Dl4jMlpClassifier.
 3.1 (Layer specification) I chose 4 layers, in order:
 Convolution layer (activation function: ActivationReLU, convolution mode: Same)
 Convolution layer (activation function: ActivationReLU, convolution mode: Same)
 Pooling layer
 Output layer (output: 4 labels)
3.2 (Instance iterator) I chose sequences/text/CnnTextEmbeddingInstanceIterator.
For this option's "location of word vectors" I don't know where to download a file for the Arabic language; for now I chose
GoogleNews-vectors-negative300.bin.gz. Is there a list of Arabic embeddings like GoogleNews?
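For reference, the GoogleNews model is English-only; one commonly used source of pretrained Arabic vectors (a suggestion on my part, not something from the WekaDeeplearning4j docs) is the fastText cc.ar.300.vec.gz file from fasttext.cc, which is distributed in word2vec text format. A quick way to sanity-check any candidate file before pointing Weka at it is to verify that it starts with the `<vocab_size> <dimensions>` header that word2vec text files carry. The helper and file names below are hypothetical:

```python
# Minimal sketch: check whether an embedding file is in word2vec text
# format, i.e. its first line is a "<vocab_size> <dimensions>" header.
# GloVe-format files lack this header, which trips up format guessing.

def looks_like_word2vec_text(path):
    """Return True if the first line is a 'vocab_size dim' header."""
    with open(path, encoding="utf-8") as f:
        first = f.readline().split()
    return len(first) == 2 and all(tok.isdigit() for tok in first)

if __name__ == "__main__":
    # Write a tiny word2vec-style file: header, then "word v1 v2 v3".
    with open("tiny.vec", "w", encoding="utf-8") as f:
        f.write("2 3\n")
        f.write("\u0643\u062a\u0627\u0628 0.1 0.2 0.3\n")  # Arabic word
        f.write("\u0642\u0644\u0645 0.4 0.5 0.6\n")        # Arabic word
    print(looks_like_word2vec_text("tiny.vec"))  # → True
```

A file that fails this check (or a binary .bin file from another toolkit) may be the reason the loader cannot guess the format.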
       
THE PROBLEM:
1- Scenario one: when I apply Dl4jStringToWord2Vec in the Preprocess tab, then go to the Classify tab and set the CnnTextEmbeddingInstanceIterator option, the Start button is immediately hidden.
2- Scenario two: when I skip the preprocessing and go directly to the Classify tab, set the options as mentioned above, and choose the word embeddings as follows:
 2.1 polyglot-ar.csv.gz from https://sites.google.com/site/rmyeid/projects/polyglot:
      the error message shown is: Problem evaluating classifier: null.
2.2 GoogleNews-vectors-negative300.bin.gz: no error, but the status bar at the bottom stays at
       "Building model on training data"
and the console gives me:
Caused by: java.lang.OutOfMemoryError: Physical memory usage is too high: physicalBytes (1789M) > maxPhysicalBytes (1789M)
        at org.bytedeco.javacpp.Pointer.deallocator(Pointer.java:682)
        at org.bytedeco.javacpp.Pointer.init(Pointer.java:127)
        at org.bytedeco.javacpp.FloatPointer.allocateArray(Native Method)
        at org.bytedeco.javacpp.FloatPointer.<init>(FloatPointer.java:80)
        ... 17 more
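For context on this error: the GoogleNews model stores roughly 3 million 300-dimensional vectors, so the raw float data alone far exceeds the 1789M limit shown in the trace. A rough estimate (the word count and float width are assumed figures, before any JVM/DL4J overhead):

```python
# Back-of-envelope memory estimate for loading the GoogleNews vectors:
# ~3,000,000 words x 300 dimensions x 4 bytes per float32.
words = 3_000_000
dims = 300
bytes_per_float = 4

total_bytes = words * dims * bytes_per_float
total_gib = total_bytes / 2**30
print(f"{total_gib:.2f} GiB")  # → 3.35 GiB, before any overhead
```

If you do want to load a model this large, the usual route is to raise the JVM heap (e.g. `-Xmx6g`) and the JavaCPP limits (`-Dorg.bytedeco.javacpp.maxbytes` / `-Dorg.bytedeco.javacpp.maxphysicalbytes`, the properties checked by the `org.bytedeco.javacpp.Pointer` class in the trace) in Weka's launch settings; using a smaller, Arabic-specific embedding file avoids the issue entirely.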
2.3 glove.twitter.27B.zip from https://nlp.stanford.edu/projects/glove/:
      The error message shown is: Unable to guess input file format.
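One plausible cause of this last error (my assumption, not confirmed by the Weka docs): the .txt files inside glove.twitter.27B.zip are in GloVe format, which has no header line, so a loader expecting word2vec text format cannot guess what it is reading. Prepending a `<vocab_size> <dimensions>` header converts the file. The file names below are hypothetical; unzip the archive first and point `src` at one of the .txt files inside it:

```python
# Minimal sketch: convert a GloVe-format text file (no header line)
# into word2vec text format by prepending a "<vocab_size> <dim>" header.

def glove_to_word2vec(src, dst):
    with open(src, encoding="utf-8") as f:
        lines = f.readlines()
    vocab = len(lines)
    dim = len(lines[0].split()) - 1  # tokens on a line minus the word itself
    with open(dst, "w", encoding="utf-8") as f:
        f.write(f"{vocab} {dim}\n")
        f.writelines(lines)
    return vocab, dim

if __name__ == "__main__":
    # Tiny stand-in for an unzipped GloVe file (hypothetical contents).
    with open("tiny_glove.txt", "w", encoding="utf-8") as f:
        f.write("hello 0.1 0.2\nworld 0.3 0.4\n")
    print(glove_to_word2vec("tiny_glove.txt", "tiny_glove.w2v.txt"))  # → (2, 2)
```

gensim ships a `glove2word2vec` conversion script that does essentially the same thing, if you would rather not hand-roll it.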