Distributed K-means Spark


Distributed K-means Spark

Desai Ankit
Hi Mark,

I am trying to run the distributed K-Means sample provided with distributedWekaSpark.

I get the following error:

java.lang.UnsupportedOperationException
    at java.util.AbstractList.set(AbstractList.java:132)
    at weka.distributed.spark.KMeansClustererSparkJob.initializeWithRandomCenters(KMeansClustererSparkJob.java:1577)
    at weka.distributed.spark.KMeansClustererSparkJob.initializeWithKMeansParallel(KMeansClustererSparkJob.java:1209)
    at weka.distributed.spark.KMeansClustererSparkJob.buildClusterer(KMeansClustererSparkJob.java:896)
    at weka.distributed.spark.KMeansClustererSparkJob.runJobWithContext(KMeansClustererSparkJob.java:1702)
    at weka.knowledgeflow.steps.AbstractSparkJob.runJob(AbstractSparkJob.java:264)
    at weka.knowledgeflow.steps.AbstractSparkJob.processIncoming(AbstractSparkJob.java:232)
    at weka.knowledgeflow.StepManagerImpl.processIncoming(StepManagerImpl.java:1045)
    at weka.knowledgeflow.BaseExecutionEnvironment$6.run(BaseExecutionEnvironment.java:493)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)

Can you please help me get rid of this?

Thanks.
--

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Distributed K-means Spark

Mark Hall
I'm guessing that you are running against a Spark cluster using a recent version of Spark 1.x. At some point, Spark's JavaRDD.takeSample() method started returning a list implementation that does not support the set() operation. A work-around for this is implemented in KMeansClustererSparkJob in the new distributedWekaSparkDev package, but I should probably back-port it to distributedWekaSpark and make a new release of that package too.

Cheers,
Mark.
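A minimal Java sketch of the failure mode described above, under stated assumptions: `ReadOnlyView` is a hypothetical stand-in for whatever list the Java-side sampling call returns. Any `java.util.AbstractList` subclass that does not override `set()` inherits the default implementation, which throws `UnsupportedOperationException` (the `AbstractList.set` frame at the top of the stack trace). Copying into an `ArrayList` before mutating is one common work-around; it is not necessarily the fix used in distributedWekaSparkDev.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

public class SampleListCopy {

    // Hypothetical stand-in for a read-only list returned by a library call.
    // It implements only get() and size(), so set() falls through to
    // AbstractList.set(), which throws UnsupportedOperationException.
    static final class ReadOnlyView<T> extends AbstractList<T> {
        private final T[] items;

        ReadOnlyView(T[] items) {
            this.items = items;
        }

        @Override
        public T get(int index) {
            return items[index];
        }

        @Override
        public int size() {
            return items.length;
        }
    }

    // Tries list.set(0, value); returns true if the list rejects it.
    static <T> boolean setIsUnsupported(List<T> list, T value) {
        try {
            list.set(0, value);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> sampled = new ReadOnlyView<>(new String[] {"c1", "c2", "c3"});

        // Writing into the view directly reproduces the exception in the trace.
        System.out.println("view rejects set(): " + setIsUnsupported(sampled, "x"));

        // Work-around: copy into an ArrayList, which does support set().
        List<String> centers = new ArrayList<>(sampled);
        centers.set(0, "replacement");
        System.out.println("copy after set(): " + centers);
    }
}
```

Running `main` prints `view rejects set(): true` followed by `copy after set(): [replacement, c2, c3]`. The copy costs one pass over the sampled centers, which is negligible for the small k used to seed k-means.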

On 2/08/17, 7:19 PM, "Desai Ankit" <[hidden email] on behalf of [hidden email]> wrote:
    [...]

Re: Distributed K-means Spark

Desai Ankit
My apologies, I should have mentioned that I am running this sample with local[*], so, if I am not wrong, it uses the default Spark libraries that come bundled with the Weka package.

On Wed, Aug 2, 2017 at 8:58 PM, Mark Hall <[hidden email]> wrote:
    [...]

Re: Distributed K-means Spark

Mark Hall
Hmm, given that the default Spark dependency (1.1) hasn't changed, and it works fine on my machines, I'm not sure what is going wrong in your case. Which JVM are you using? Are you sure that the libraries in ~/wekafiles/packages/distributedWekaSpark/lib are the ones that came with the package?

Cheers,
Mark.
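One way to answer "are these the libraries that came with the package?" concretely is to fingerprint them. A minimal sketch, assuming you can run a small Java program on the affected machine: it computes a SHA-256 digest for every `*.jar` in a directory (by default the `~/wekafiles/packages/distributedWekaSpark/lib` path named above) and prints the results sorted by name, in the same `digest  name` layout as `sha256sum`. The class and method names are hypothetical; nothing here is part of Weka.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

public class JarDigests {

    // Lower-case hex encoding of a byte array.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // SHA-256 of a file, read in chunks so large jars are not loaded whole.
    static String sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha.update(buf, 0, n);
            }
        }
        return hex(sha.digest());
    }

    // Maps every *.jar in dir to its digest, sorted by file name.
    static Map<String, String> digests(Path dir) throws IOException, NoSuchAlgorithmException {
        Map<String, String> result = new TreeMap<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                result.put(jar.getFileName().toString(), sha256(jar));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Default directory is the one named in the thread; override with args[0].
        Path dir = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"),
                        "wekafiles", "packages", "distributedWekaSpark", "lib");
        digests(dir).forEach((name, digest) -> System.out.println(digest + "  " + name));
    }
}
```

Running it on the failing machine and on a known-good install, then diffing the two listings, would show whether any jar differs, which is more reliable than comparing file names alone.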

On 3/08/17, 7:08 PM, "Desai Ankit" <[hidden email] on behalf of [hidden email]> wrote:
    [...]

Re: Distributed K-means Spark

Desai Ankit
My Java is Oracle:

java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)

I am using Ubuntu 14.04 LTS.

wekafiles/packages/distributedWekaSpark/lib contains all the jars that come by default with the package install (screenshot attached).



On Thu, Aug 3, 2017 at 8:12 AM, Mark Hall <[hidden email]> wrote:
    [...]

Attachment: weka libs.png (103K)

Re: Distributed K-means Spark

Mark Hall
I've just tested the distributedWekaSpark k-means clustering template flow, without any problems occurring, under Ubuntu 15.10 using Weka 3.8.1 and Oracle Java 1.8.0_91.

Are you sure that there are not any other Spark-related libraries in your CLASSPATH? Take a look at the java.class.path entry in GUIChooser-->Help-->SystemInfo.

Cheers,
Mark.
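The CLASSPATH check above can also be scripted. A minimal sketch, assuming you only want to eyeball the entries: it splits the `java.class.path` property on the platform's `File.pathSeparator` and flags entries whose file name contains "spark". The name-based match is a heuristic of this sketch, not something Weka does, and it will miss jars that bundle Spark classes under another name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class ClassPathScan {

    // Splits a class-path string into entries using the platform separator
    // (':' on Linux/macOS, ';' on Windows), dropping empty segments.
    static List<String> entries(String classPath) {
        List<String> result = new ArrayList<>();
        for (String part : classPath.split(Pattern.quote(File.pathSeparator))) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    // Heuristic: keep entries whose file name contains "spark" (case-insensitive).
    static List<String> sparkLike(List<String> entries) {
        List<String> flagged = new ArrayList<>();
        for (String entry : entries) {
            String name = new File(entry).getName().toLowerCase(Locale.ROOT);
            if (name.contains("spark")) {
                flagged.add(entry);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        List<String> cp = entries(System.getProperty("java.class.path", ""));
        for (String e : cp) {
            System.out.println("entry: " + e);
        }
        for (String e : sparkLike(cp)) {
            System.out.println("possible Spark jar: " + e);
        }
    }
}
```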

On 3/08/17, 8:41 PM, "Desai Ankit" <[hidden email] on behalf of [hidden email]> wrote:
    [...]

Re: Distributed K-means Spark

Desai Ankit
java.class.path contains only one entry, i.e. weka.jar.
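`java.class.path` only lists what the JVM was started with; it does not show jars that are made available later through another class loader. A sketch that asks where a specific class was actually loaded from, via `Class.getProtectionDomain().getCodeSource()`. Assumptions: the default class name `org.apache.spark.api.java.JavaRDD` is only an example, and the result depends on the `ClassLoader` passed in. Run standalone, it sees only the JVM's own class path; from inside a running application you would pass the loader that loaded the Spark classes.

```java
import java.security.CodeSource;

public class WhereFrom {

    // Returns the location (jar or directory URL) the named class was loaded
    // from via the given loader. Returns "no location recorded" when there is
    // no code source (typical for core JDK classes) and "not found" when the
    // loader cannot see the class at all.
    static String origin(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return "no location recorded";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Example class name only; pass any other fully qualified name as args[0].
        String name = args.length > 0 ? args[0] : "org.apache.spark.api.java.JavaRDD";
        ClassLoader loader = WhereFrom.class.getClassLoader();
        System.out.println(name + " -> " + origin(name, loader));
    }
}
```

Passing `false` as the second argument to `Class.forName` loads the class without running its static initializers, so probing a class this way has no side effects beyond loading it.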

On Thu, Aug 3, 2017 at 9:46 PM, Mark Hall <[hidden email]> wrote:
    [...]

Re: Distributed K-means Spark

Mark Hall
I'm out of suggestions now, I'm afraid. I can't reproduce the problem using the libraries that ship with distributedWekaSpark. However, as I said, I did see this issue when using a later version of the Spark 1.x libraries, and the new distributedWekaSparkDev package has a fix for it.

Cheers,
Mark.

On 4/08/17, 6:52 PM, "Desai Ankit" <[hidden email] on behalf of [hidden email]> wrote:

    java.class.path contains only one entry... i.e. weka.jar
   
    On Thu, Aug 3, 2017 at 9:46 PM, Mark Hall <[hidden email]> wrote:
   
    I've just tested the distributedWekaSpark k-means clustering template flow, without any problems occurring, under Ubuntu 15.10 using Weka 3.8.1 and Oracle Java 1.8.0_91.
   
    Are you sure that there are not any other Spark-related libraries in your CLASSPATH? Take a look at the java.class.path entry in GUIChooser-->Help-->SystemInfo.
   
    Cheers,
    Mark.
   
    On 3/08/17, 8:41 PM, "Desai Ankit" <[hidden email] on behalf of [hidden email]> wrote:
   
        My Java is Oracle:
        java version "1.8.0_131"
        Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
        Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
   
        I'm using Ubuntu 14.04 LTS.
   
        wekafiles/packages/distributedWekaSpark/lib contains all the jars that come by default with the package install (snapshot attached).
   
        On Thu, Aug 3, 2017 at 8:12 AM, Mark Hall <[hidden email]> wrote:
   
        Hmm, given that the default Spark dependency (1.1) hasn't changed, and it works fine on my machines, I'm not sure what is going wrong in your case. Which JVM are you using? Are you sure that the libraries in ~/wekafiles/packages/distributedWekaSpark/lib are the ones that came with the package?
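To compare the lib directory against a fresh package install, the jar names can be listed with a short sketch. The class name ListPackageJars is hypothetical, and the default path simply mirrors the ~/wekafiles location mentioned above; pass a different directory as the first argument if the packages live elsewhere.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: prints the sorted names of the jars in a package lib directory. */
public class ListPackageJars {

    /** Returns the sorted file names of all *.jar files directly inside dir. */
    static List<String> jarNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        // The glob filter keeps only entries whose name ends in ".jar"
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Default location taken from the thread; override with args[0]
        Path lib = args.length > 0
                ? Paths.get(args[0])
                : Paths.get(System.getProperty("user.home"),
                        "wekafiles", "packages", "distributedWekaSpark", "lib");
        for (String name : jarNames(lib)) {
            System.out.println(name);
        }
    }
}
```

Sorting the names makes the output easy to diff against the listing from a clean install.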
   
        Cheers,
        Mark.
   
        On 3/08/17, 7:08 PM, "Desai Ankit" <[hidden email] on behalf of [hidden email]> wrote:
   
            My apologies. I should have mentioned that I am running this sample on local[*], so, if I'm not mistaken, it uses the default Spark dependency that ships with the Weka package.
   
            On Wed, Aug 2, 2017 at 8:58 PM, Mark Hall <[hidden email]> wrote:
   
            I'm guessing that you are running against a Spark cluster using a recent version of Spark 1.x. At some point, Spark's JavaRDD.takeSample() method started returning a list implementation that does not support the set() operation. A work-around for this is implemented in KMeansClustererSparkJob in the new distributedWekaSparkDev package, but I should probably back-port that to distributedWekaSpark and make a new release of that package too.
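The failure mode described above can be reproduced without Spark. In this minimal sketch, ReadOnlyView is a hypothetical stand-in for the list returned by the Spark sampling call (the real Spark class is not shown): it extends java.util.AbstractList and implements only get() and size(), so set() falls through to AbstractList.set(), which throws the same UnsupportedOperationException seen at the top of the stack trace. Copying into a java.util.ArrayList is one plausible work-around; it is not necessarily how distributedWekaSparkDev implements its fix.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

public class ReadOnlyListDemo {

    /** Hypothetical read-only view: only get() and size() are implemented,
        so AbstractList's default set() throws UnsupportedOperationException. */
    static final class ReadOnlyView extends AbstractList<Integer> {
        private final int[] data;
        ReadOnlyView(int... data) { this.data = data; }
        @Override public Integer get(int i) { return data[i]; }
        @Override public int size() { return data.length; }
    }

    /** Returns true if calling set() on the list throws UnsupportedOperationException. */
    static boolean setFails(List<Integer> list) {
        try {
            list.set(0, list.get(0));
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> sample = new ReadOnlyView(3, 1, 4);
        System.out.println("set() on view fails: " + setFails(sample));

        // Work-around: copy into a mutable ArrayList before replacing elements
        List<Integer> mutable = new ArrayList<>(sample);
        mutable.set(0, 42);
        System.out.println("set() on copy fails: " + setFails(mutable)
                + ", first = " + mutable.get(0));
    }
}
```

The copy costs one extra pass over the sample, which is negligible for the handful of initial centers k-means needs.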
   
            Cheers,
            Mark.
   
            On 2/08/17, 7:19 PM, "Desai Ankit" <[hidden email] on behalf of [hidden email]> wrote:
   
                Hi Mark,
                I am trying to run a sample provided under distributedWekaSpark for distributed K-Means.
   
                I end up with the following error.
   
                 java.lang.UnsupportedOperationException
                    at java.util.AbstractList.set(AbstractList.java:132)
                    at weka.distributed.spark.KMeansClustererSparkJob.initializeWithRandomCenters(KMeansClustererSparkJob.java:1577)
                    at weka.distributed.spark.KMeansClustererSparkJob.initializeWithKMeansParallel(KMeansClustererSparkJob.java:1209)
                    at weka.distributed.spark.KMeansClustererSparkJob.buildClusterer(KMeansClustererSparkJob.java:896)
                    at weka.distributed.spark.KMeansClustererSparkJob.runJobWithContext(KMeansClustererSparkJob.java:1702)
                    at weka.knowledgeflow.steps.AbstractSparkJob.runJob(AbstractSparkJob.java:264)
                    at weka.knowledgeflow.steps.AbstractSparkJob.processIncoming(AbstractSparkJob.java:232)
                    at weka.knowledgeflow.StepManagerImpl.processIncoming(StepManagerImpl.java:1045)
                    at weka.knowledgeflow.BaseExecutionEnvironment$6.run(BaseExecutionEnvironment.java:493)
                    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
                    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
                    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
                    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
                    at java.lang.Thread.run(Thread.java:748)
   
                Can you please help to get rid of this?
   
                Thanks.
                --
                Ankit Desai
   
                desaiankitb.tk
                http://ankitbdesai.blogspot.in/
   
   
            --
            Ankit Desai
   
            desaiankitb.tk
            http://ankitbdesai.blogspot.in/
   
        --
        Ankit Desai
   
        desaiankitb.tk
        http://ankitbdesai.blogspot.in/
   
    --
    Ankit Desai
   
    desaiankitb.tk
    http://ankitbdesai.blogspot.in/ 
   