Text Classification on DistributedWekaSpark is failing

Dear Weka authors,

I'm struggling to get my classifier running on a dataset that is not very
large in terms of records (*198,000*) but has a huge feature set, *ranging
from 11,000 to 2 million* features depending on the N-grams. I downloaded
DistributedWekaSpark through the package manager and tried to run two
classifiers in the Knowledge Flow, but I keep getting the well-known error
<<Size exceeds Integer.MAX_VALUE>>. I tried changing minInputslices from 4
to 6, and also raising the memory fraction from 0.6 to 1, but no luck :(
I'm very familiar with WEKA but new to the Spark world!

Any guidance would be greatly appreciated.

Sent from: https://weka.8497.n7.nabble.com/
Wekalist mailing list -- [hidden email]