
Weka 3.9 - DistributedWekaSpark


Weka 3.9 - DistributedWekaSpark

Desai Ankit
Hi Weka Group,

How do I run distributedWekaSpark jobs on an EMR YARN cluster?

What should the values of master and port be?

It works fine on local[*]. 

Does distributedWekaSpark support Spark 2.0.1? 

Any help is appreciated. 

Thanks in advance. 
--
Ankit Desai

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Weka 3.9 - DistributedWekaSpark

Mark Hall
I haven't tried running distributedWekaSpark under AWS/EMR, but I have on YARN-based Cloudera and Hortonworks clusters. You need to replace all the Spark libraries in ~/wekafiles/packages/distributedWekaSpark/lib with the Spark assembly jar from the cluster that you want to run against. In order for client-side Spark to pick up important Hadoop cluster settings (such as the resource manager host), you will also need the config dir of the Hadoop cluster to be in Weka's CLASSPATH.

distributedWekaSpark only supports yarn-client mode for running on YARN clusters. This is the mode where the driver program executes on the local machine and the YARN resource manager is simply used to provision worker nodes in the Hadoop cluster for Spark to use. So, you would enter "yarn-client" in the master property when configuring the Weka job. The port field can be left blank (from memory), as Spark picks up all pertinent settings from the Hadoop config files.

Depending on how much pain is involved with opening ports/services of AWS hosts to the outside world (and Spark uses quite a few for comms), you would probably be best off installing Weka on an AWS node and running it from the command line.
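As a rough illustration of the two setup steps above (checking which jars sit in the package's lib directory, and putting the Hadoop config dir on Weka's CLASSPATH), here is a minimal Python sketch. The helper names and the example paths /opt/weka/weka.jar and /etc/hadoop/conf are assumptions made for illustration only; substitute whatever locations your own cluster node uses. It only builds strings and lists files, it does not launch Weka or Spark.

```python
import os
from pathlib import Path


def list_package_jars(lib_dir):
    """Return the sorted names of the jar files found in lib_dir.

    Useful for confirming that the bundled Spark jars have been swapped
    for the cluster's Spark assembly jar. Returns [] if lib_dir is missing.
    """
    return sorted(p.name for p in Path(lib_dir).glob("*.jar"))


def build_weka_classpath(weka_jar, hadoop_conf_dir):
    """Join weka.jar and the Hadoop config dir into one CLASSPATH string."""
    return os.pathsep.join([str(weka_jar), str(hadoop_conf_dir)])


if __name__ == "__main__":
    # Package lib directory as given in the post.
    lib_dir = Path.home() / "wekafiles" / "packages" / "distributedWekaSpark" / "lib"
    print("Jars in package lib dir:", list_package_jars(lib_dir))

    # Hypothetical locations; adjust to your installation.
    classpath = build_weka_classpath("/opt/weka/weka.jar", "/etc/hadoop/conf")
    print("CLASSPATH=" + classpath)
```

The printed CLASSPATH value could then be exported in the shell before starting Weka, so that client-side Spark can find the cluster's Hadoop settings.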

distributedWekaSpark does not support Spark 2.x yet. There are breaking API changes between Spark 1 and Spark 2 that will require a fair amount of work (and probably a separate Weka package) to support.

Cheers,
Mark.


