Text classification on DistributedWekaSpark is failing
Dear Weka authors,
I'm struggling to get my classifier running on a dataset that is not very
large (*198,000* records) but has a huge feature set, *ranging from 11,000 to
2 million* features depending on the N-gram settings. I downloaded
DistributedWekaSpark through the package manager and tried to run two
classifiers in the Knowledge Flow, but I keep getting the commonly reported
error <<Size exceeds Integer.MAX_VALUE>>.
I tried increasing minInputslices from 4 to 6, and I also tried raising the
memory fraction from 0.6 to 1, but no luck :(
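For a rough sense of scale, here is a back-of-envelope sketch. As far as I understand, this error comes from Spark being unable to handle a single block larger than Integer.MAX_VALUE bytes (about 2 GB). The sketch assumes a dense layout of one 8-byte double per feature value and an even split across slices. Both are my assumptions: sparse N-gram data, Java object overhead and serialization will all change the real numbers, and `minSlices` is a hypothetical helper, not part of Weka or Spark.

```java
public class SliceEstimate {
    // Hypothetical helper: the smallest number of evenly sized slices such that
    // each slice holds at most Integer.MAX_VALUE bytes, ASSUMING a dense layout
    // of 8-byte doubles (ignores sparsity, object overhead and serialization).
    static long minSlices(long rows, long features) {
        long bytes = rows * features * Double.BYTES;
        long limit = Integer.MAX_VALUE;
        return (bytes + limit - 1) / limit; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(minSlices(198_000L, 11_000L));    // smallest feature set
        System.out.println(minSlices(198_000L, 2_000_000L)); // largest feature set
    }
}
```

Under these assumptions, even the 11,000-feature case works out to 9 slices and the 2-million-feature case to 1,476, so moving from 4 to 6 slices may simply not be enough.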
I'm very familiar with WEKA but new to the Spark world!