Weka-Spark


MOHAMMED KAMAL

Hi all,

I built this Spark job on my dataset to train a classifier and test its performance. It writes two output files: one for the learned model and one for the classifier's performance. I have some questions:


1-  How can I draw an ROC curve from the output? (The dataset is multiclass.)

2-  Can I draw ROC curves for more than one classifier (if I add another Weka classifier Spark job for a second classifier)?

3-  In the job above, how can I apply sampling (SMOTE, class balancer or subsample)?
When I used it with a FilteredClassifier inside the Weka Spark classifier job, I got an error.
Please tell me exactly where I can use sampling, and with which component in the Knowledge Flow interface.
Cheers,

M.Kamal





_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

spark.jpg (95K) Download Attachment

Re: Weka-Spark

Mark Hall
Unfortunately, there is not a way to plot an ROC curve from the output generated by the WekaClassifierEvaluationSparkJob. Additional functionality could be added to the job to write an ARFF file that contains the same information that gets plotted when you visualize a threshold curve in the Explorer's Classify panel. I'll add this to the to-do list for that package.

What error did you receive when using the SMOTE filter? Note that you will need to enter "SMOTE" (without quotes) into the "wekaPackages" field in the "Spark configuration" tab of the ArffHeaderSparkJob. This ensures that the SMOTE package's jar files get into the classpath for the Spark workers. If you are using an Aggregateable base classifier (such as NaiveBayes), then you will also need to ensure that "forceBatchLearningForUpdateableClassifiers" and "forceVotedEnsembleCreation" are set to True in the WekaClassifierSparkJob and WekaClassifierEvaluationSparkJob. You will also want to ensure that a RandomlyShuffleDataSpark job is used first, before the classifier or evaluation job, so that the data is stratified (with respect to the class). As a result, the SMOTE process has approximately the same effect on the data within each partition of the RDD.

Cheers,
Mark.
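
For readers wondering what "threshold curve" data looks like, here is a minimal sketch in plain Python. It is not part of the Spark package and does not read the job's output. It assumes you already have, for each test instance, a predicted probability for one class and the true label; the function names and sample values are hypothetical. For a multiclass problem the curve is computed one class versus the rest.

```python
def roc_points(scores, labels, positive):
    """Return (false positive rate, true positive rate) points for one class vs. the rest."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(1 for _, label in ranked if label == positive)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative instance")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Instances with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == positive:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical predicted probabilities for class "a" on six test instances.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
labels = ["a", "b", "a", "a", "b", "c"]
curve = roc_points(scores, labels, positive="a")
print(curve)
print(area_under_curve(curve))
```

Calling roc_points once per class, and once per classifier, gives one curve for each combination, which is one way to compare several classifiers on a multiclass dataset.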


Re: Weka-Spark

MOHAMMED KAMAL
In reply to this post by MOHAMMED KAMAL


Hi Eibe,

Thanks for the support; it is working now.
I tried it with class balancer, resample and SMOTE.

I think setting "forceBatchLearningForUpdateableClassifiers" and "forceVotedEnsembleCreation" to True solved the problem, because it works both with and without the package name entered in the ARFF header job configuration.

There is still a problem, though. When I select SMOTE and configure it to act on class no. 4 with a 500% increase,
the evaluation shows only a 100% increase: the original dataset has 4 instances in this class, and I found only 8 in the output confusion matrix, where I expected 24.
Cheers,

Mohammed Kamal
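
The counts in the post above can be checked with a few lines of arithmetic. This is only a sketch: the function name is hypothetical, and rounding the number of synthetic instances down is an assumption (it does not matter for the whole numbers used here).

```python
def count_after_oversampling(original_count, percentage):
    """Class size after adding `percentage` percent synthetic instances."""
    # Assumption: the number of synthetic instances is rounded down.
    synthetic = (original_count * percentage) // 100
    return original_count + synthetic


for original, pct in [(4, 500), (4, 100)]:
    print(original, pct, count_after_oversampling(original, pct))
```

A 500% increase on 4 instances gives the expected 24, while the observed 8 is what a 100% increase would produce.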






From: [hidden email] <[hidden email]> on behalf of [hidden email] <[hidden email]>
Sent: Monday, May 22, 2017 3:00 AM
To: [hidden email]
Subject: Wekalist Digest, Vol 171, Issue 83
 
Today's Topics:

   1. Re: Exception with package percentageErrorMetrics Evaluation
      metrics - Weka 3.8.1 (Eibe Frank)
   2. Re: Weka-Spark (Mark Hall)
   3. Re: Output modification (Eibe Frank)


----------------------------------------------------------------------

Message: 1
Date: Mon, 22 May 2017 09:52:22 +1200
From: Eibe Frank <[hidden email]>
To: "Weka machine learning workbench list."
        <[hidden email]>
Subject: Re: [Wekalist] Exception with package percentageErrorMetrics
        Evaluation metrics - Weka 3.8.1
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=utf-8

Make sure that you have percentageErrorMetrics version 1.0.2 (the latest version). Some changes were necessary to make it work with WEKA 3.8.1/3.9.1.

Cheers,
Eibe

> On 22/05/2017, at 4:53 AM, Michael Hall <[hidden email]> wrote:
>
> Trying to classify some new data Explorer started failing.
>
> Checking weka.log I saw?
>
> Exception in thread "Thread-1198" java.lang.IllegalAccessError: tried to access field weka.classifiers.evaluation.Evaluation.m_WithClass from class weka.classifiers.evaluation.MeanAbsolutePercentageError
>        weka.classifiers.evaluation.MeanAbsolutePercentageError.getStatistic(MeanAbsolutePercentageError.java:127)
>        weka.classifiers.evaluation.MeanAbsolutePercentageError.toSummaryString(MeanAbsolutePercentageError.java:113)
>        weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2873)
>        weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2757)
>        weka.classifiers.Evaluation.toSummaryString(Evaluation.java:1115)
>        weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1871)
>
>        at weka.classifiers.evaluation.MeanAbsolutePercentageError.getStatistic(MeanAbsolutePercentageError.java:127)
>        at weka.classifiers.evaluation.MeanAbsolutePercentageError.toSummaryString(MeanAbsolutePercentageError.java:113)
>        at weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2873)
>        at weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2757)
>        at weka.classifiers.Evaluation.toSummaryString(Evaluation.java:1115)
>        at weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1871)
>
> The same with weka.classifiers.evaluation.RootMeanSquarePercentageError after I turned off MAPE
>
> Michael Hall

------------------------------


Message: 3
Date: Mon, 22 May 2017 10:00:06 +1200
From: Eibe Frank <[hidden email]>
To: "Weka machine learning workbench list."
        <[hidden email]>
Subject: Re: [Wekalist] Output modification
Message-ID: <[hidden email]>
Content-Type: text/plain; charset="utf-8"

Yes, it's possible to add additional plug-in evaluation measures. However, there is currently no plug-in for the straight average of the precision/recall values across classes.

Note that WEKA outputs the *weighted* average (weighted by class size) by default. Here is the output for the iris data:

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0.980    0.000    1.000      0.980    0.990      0.985    0.990     0.987     Iris-setosa
                 0.940    0.030    0.940      0.940    0.940      0.910    0.952     0.880     Iris-versicolor
                 0.960    0.030    0.941      0.960    0.950      0.925    0.961     0.905     Iris-virginica
Weighted Avg.    0.960    0.020    0.960      0.960    0.960      0.940    0.968     0.924    

The six per-class precision and recall values are shown, along with their weighted average at the bottom. Because all classes are equally populous in the iris data, the weighted average is equal to the straight average in this case.

Cheers,
Eibe
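
To make the difference between the two averages concrete, here is a small sketch in plain Python using the per-class precision values from the table above. The iris data has 50 instances per class; the skewed class sizes are hypothetical and only there to show when the two averages diverge. The function names are illustrative, not WEKA API.

```python
def weighted_average(values, class_sizes):
    """Average of per-class values, weighted by the number of instances per class."""
    total = sum(class_sizes)
    return sum(v * n for v, n in zip(values, class_sizes)) / total


def straight_average(values):
    """Unweighted (macro) average of per-class values."""
    return sum(values) / len(values)


precision = [1.000, 0.940, 0.941]  # setosa, versicolor, virginica (from the table)
iris_sizes = [50, 50, 50]          # iris has 50 instances per class
skewed_sizes = [100, 25, 25]       # hypothetical imbalanced class sizes

print(round(weighted_average(precision, iris_sizes), 3))
print(round(straight_average(precision), 3))
print(round(weighted_average(precision, skewed_sizes), 3))
```

With equal class sizes the two averages agree; with the skewed sizes the weighted average moves toward the precision of the largest class.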


> On 22/05/2017, at 2:30 AM, Alexander Osherenko <[hidden email]> wrote:
>
> I wonder, is it possible to modify the classifier output presented on the WEKA console. For instance, is it possible to add the classwise precision P and recall R values after the calculated confusion matrix
>
> <image.png>
> where N is the number of outcome values of classification.
>
> Best, Alexander

Re: Weka-Spark

Eibe Frank-2
Administrator
It sounds like you are modifying the test data as well as the training data. You should avoid doing that. Use the FilteredClassifier instead so that only the training data is modified.

Cheers,
Eibe
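
The principle in the reply above can be sketched in plain Python. This is an illustration only, not WEKA code: simple random duplication stands in for SMOTE (which creates new synthetic instances rather than copies), and the data, class names and function names are all hypothetical.

```python
import random
from collections import Counter


def oversample(rows, target_class, percentage, rng):
    """Return rows plus `percentage` percent extra copies of the target class.

    Random duplication is used here as a stand-in for SMOTE.
    """
    minority = [row for row in rows if row[1] == target_class]
    n_extra = len(minority) * percentage // 100
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return rows + extra


def class_counts(rows):
    return Counter(label for _, label in rows)


rng = random.Random(0)
data = [(i, "emergency") for i in range(10)] + [(i, "normal") for i in range(10, 100)]
rng.shuffle(data)
train, test = data[:80], data[80:]

# Only the training split is resampled; the test split is left untouched.
train_balanced = oversample(train, "emergency", 500, rng)

print("train before:", class_counts(train))
print("train after: ", class_counts(train_balanced))
print("test:        ", class_counts(test))
```

One consequence worth keeping in mind: a confusion matrix counts test instances, so when only the training data is resampled, the class totals in the evaluation stay at their original values even though the filter did its work.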


Re: Weka-Spark

MOHAMMED KAMAL
In reply to this post by MOHAMMED KAMAL

Hi Eibe,

Please see the attached files.

I tried using a FilteredClassifier for both training and testing; it gives results identical to using the classifier with an external filter.
Then I tried what you suggested, using the FilteredClassifier in the training stage only, and it gives me the same results as J48 without any filter.
I have also attached the job.
I also want to point out that the number of records is 35232 in all versions. It seems that SMOTE did not work: the minority class has 174 instances and I used 500%, so the emergency (minority) class should have 1044 instances, not the same 174.
Cheers,

M.Kamal






From: [hidden email] <[hidden email]> on behalf of [hidden email] <[hidden email]>
Sent: Thursday, May 25, 2017 3:00 AM
To: [hidden email]
Subject: Wekalist Digest, Vol 171, Issue 90
 
Send Wekalist mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        https://list.waikato.ac.nz/mailman/listinfo/wekalist


or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wekalist digest..."


Today's Topics:

   1. Re: Weka-Spark (Eibe Frank)


----------------------------------------------------------------------

Message: 1
Date: Thu, 25 May 2017 10:38:13 +1200
From: Eibe Frank <[hidden email]>
To: "Weka machine learning workbench list."
        <[hidden email]>
Subject: Re: [Wekalist] Weka-Spark
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=iso-8859-1

It sounds like you are modifying the test data as well as the training data. You should avoid doing that. Use the FilteredClassifier instead so that only the training data is modified.

Cheers,
Eibe

> On 24/05/2017, at 8:31 PM, ENGMohammed kamal <[hidden email]> wrote:
>
>
> Hi Ebie,
> Thanks for support , it is working now
> i tried it with class balancer, resample and SMOTE
> i think 2 true setting for forceBatchLearningForUpdateableClassifiers" and "forceVotedEnsembleCreation" solve the problem because it's working without and with write the name of backage in arff config.
> but there's a problem  that when i select SMOTE and customized to work on class no.4  +   500%   increase
> i found in the evaluation it gives 100% only   (  original  dataset 4 instances in this class )     i found only 8 in o/p  confusion matrix , i ought to have 24 instances
> cheers,
> Mohammed Kamal
>
>
>
> From: [hidden email] <[hidden email]> on behalf of [hidden email]<[hidden email]>
> Sent: Monday, May 22, 2017 3:00 AM
> To: [hidden email]
> Subject: Wekalist Digest, Vol 171, Issue 83

> Send Wekalist mailing list submissions to
>         [hidden email]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://list.waikato.ac.nz/mailman/listinfo/wekalist


> Wekalist Info Page - University of Waikato
> list.waikato.ac.nz
> The Weka mailing list ([hidden email]) is for discussions pertaining to the use of the Weka machine learning workbench. Weka is a collection of machine ...
>
>
> or, via email, send a message with subject or body 'help' to
>         [hidden email]
>
> You can reach the person managing the list at
>         [hidden email]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wekalist digest..."
>
>
> Today's Topics:
>
>    1. Re: Exception with package percentageErrorMetrics Evaluation
>       metrics - Weka 3.8.1 (Eibe Frank)
>    2. Re: Weka-Spark (Mark Hall)
>    3. Re: Output modification (Eibe Frank)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 22 May 2017 09:52:22 +1200
> From: Eibe Frank <[hidden email]>
> To: "Weka machine learning workbench list."
>         <[hidden email]>
> Subject: Re: [Wekalist] Exception with package percentageErrorMetrics
>         Evaluation metrics - Weka 3.8.1
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset=utf-8
>
> Make sure that you have percentageErrorMetrics version 1.0.2 (the latest version). Some changes were necessary to make it work with WEKA 3.8.1/3.9.1.
>
> Cheers,
> Eibe
>
> > On 22/05/2017, at 4:53 AM, Michael Hall <[hidden email]> wrote:
> >
> > Trying to classify some new data Explorer started failing.
> >
> > Checking weka.log I saw?
> >
> > Exception in thread "Thread-1198" java.lang.IllegalAccessError: tried to access field weka.classifiers.evaluation.Evaluation.m_WithClass from class weka.classifiers.evaluation.MeanAbsolutePercentageError
> >        weka.classifiers.evaluation.MeanAbsolutePercentageError.getStatistic(MeanAbsolutePercentageError.java:127)
> >        weka.classifiers.evaluation.MeanAbsolutePercentageError.toSummaryString(MeanAbsolutePercentageError.java:113)
> >        weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2873)
> >        weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2757)
> >        weka.classifiers.Evaluation.toSummaryString(Evaluation.java:1115)
> >        weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1871)
> >
> >        at weka.classifiers.evaluation.MeanAbsolutePercentageError.getStatistic(MeanAbsolutePercentageError.java:127)
> >        at weka.classifiers.evaluation.MeanAbsolutePercentageError.toSummaryString(MeanAbsolutePercentageError.java:113)
> >        at weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2873)
> >        at weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2757)
> >        at weka.classifiers.Evaluation.toSummaryString(Evaluation.java:1115)
> >        at weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1871)
> >
> > The same with weka.classifiers.evaluation.RootMeanSquarePercentageError after I turned off MAPE
> >
> > Michael Hall
> >
> >
> >
> > _______________________________________________
> > Wekalist mailing list
> > Send posts to: [hidden email]
> > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 22 May 2017 09:56:58 +1200
> From: Mark Hall <[hidden email]>
> To: "Weka machine learning workbench list."
>         <[hidden email]>
> Subject: Re: [Wekalist] Weka-Spark
> Message-ID: <[hidden email]>
> Content-Type: text/plain;       charset="UTF-8"
>
> Unfortunately, there is not a way to plot an ROC curve from the output generated by the WekaClassifierEvaluationSparkJob. Additional functionality could be added to the job to write an ARFF file that contains the same information that gets plotted when you visualize a threshold curve in the Explorer's Classify panel. I'll add this to the to-do list for that package.
>
> What error did you receive when using the SMOTE filter? Note that you will need to enter "SMOTE" (without quotes) into the "wekaPackages" field in the "Spark configuration" tab of the ArffHeaderSparkJob. This ensures that the SMOTE package's jar files get into the classpath for the Spark workers. If you are using an Aggregateable base classifier (such as NaiveBayes) then you will also need to ensure that "forceBatchLearningForUpdateableClassifiers" and "forceVotedEnsembleCreation" are set to True in the WekaClassifierSparkJob and WekaClassifierEvaluationSparkJob. You will also want to ensure that a RandomlyShuffleDataSpark job is used first, before the classifier or evaluation job, so that the data is stratified (with respect to the class), resulting in the SMOTE process having approximately the same effect on the data within each partition of the RDD.
>
> Cheers,
> Mark.
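
To illustrate why stratifying before the classifier job matters, here is a minimal Java sketch. It is not the RandomlyShuffleDataSpark implementation: it simply deals the instances of each class round-robin across partitions, which is one way to obtain a stratified split. The class name, the method name and the lumping of all non-minority classes into one are assumptions made for illustration. The figures (35232 instances, 174 in the minority class, 4 partitions) are the ones mentioned elsewhere in this thread.

```java
public class StratifiedPartitionSketch {

    /**
     * Deals the instances of each class round-robin across the partitions.
     * Returns counts[partition][classIndex].
     */
    static int[][] partitionCounts(int[] classCounts, int numPartitions) {
        int[][] counts = new int[numPartitions][classCounts.length];
        int next = 0; // partition that receives the next instance
        for (int c = 0; c < classCounts.length; c++) {
            for (int i = 0; i < classCounts[c]; i++) {
                counts[next][c]++;
                next = (next + 1) % numPartitions;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed simplification: all non-minority classes are lumped together.
        int[] classCounts = {35232 - 174, 174};
        int[][] counts = partitionCounts(classCounts, 4);
        for (int p = 0; p < counts.length; p++) {
            System.out.println("partition " + p + ": other=" + counts[p][0]
                + ", minority=" + counts[p][1]);
        }
    }
}
```

With this kind of split every partition holds 43 or 44 minority instances, so a filter applied separately inside each partition sees roughly the same class ratio everywhere.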
>
> On 21/05/17, 8:43 AM, "ENGMohammed kamal" <[hidden email] on behalf of [hidden email]> wrote:
>
>     Hi all,
>     I built this Spark job on my dataset to learn a classifier and test its performance. The output goes to two files: one for the learned model and one for the classifier's performance. I have some questions:
>    
>    
>     1- How can I draw an ROC curve from the output (multiclass dataset)?
>     2- Can I draw ROC curves for more than one classifier (if I add another Weka classifier Spark job for another classifier)?
>     3- In the above job, how can I use sampling (SMOTE, class balancer or subsample)?
>     When I used it with a FilteredClassifier inside the Weka Spark classifier job, I got an error.
>     Please tell me exactly where I can use sampling, and with which component in the Knowledge Flow interface.
>     Cheers,
>     M.Kamal
>    
>    
>     ________________________________________
>    
>    
>    
>    
>    
>    
>    
>     _______________________________________________
>     Wekalist mailing list
>     Send posts to: [hidden email]
>     List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
>     List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
>
>    
>
>
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 22 May 2017 10:00:06 +1200
> From: Eibe Frank <[hidden email]>
> To: "Weka machine learning workbench list."
>         <[hidden email]>
> Subject: Re: [Wekalist] Output modification
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> Yes, it's possible to add additional plug-in evaluation measures. However, there is currently no plug-in for the straight average of the precision/recall values across classes.
>
> Note that WEKA outputs the *weighted* average (weighted by class size) by default. Here is the output for the iris data:
>
> === Detailed Accuracy By Class ===
>
>                  TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
>                  0.980    0.000    1.000      0.980    0.990      0.985    0.990     0.987     Iris-setosa
>                  0.940    0.030    0.940      0.940    0.940      0.910    0.952     0.880     Iris-versicolor
>                  0.960    0.030    0.941      0.960    0.950      0.925    0.961     0.905     Iris-virginica
> Weighted Avg.    0.960    0.020    0.960      0.960    0.960      0.940    0.968     0.924    
>
> The six per-class precision and recall values are shown, along with their weighted average at the bottom. Because all classes are equally populous in the iris data, the weighted average is equal to the straight average in this case.
>
> Cheers,
> Eibe
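
The difference between the two averages can be checked with a short Java sketch. It uses the per-class precision values from the table above and the known class sizes of the iris data (50 instances per class); the class name, the method names and the skewed class sizes in the last call are hypothetical and only there for illustration. Nothing here calls WEKA.

```java
public class AverageSketch {

    /** Straight (unweighted) average of per-class values. */
    static double straightAverage(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Average of per-class values, weighted by class size. */
    static double weightedAverage(double[] values, int[] classSizes) {
        double sum = 0.0;
        long total = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * classSizes[i];
            total += classSizes[i];
        }
        return sum / total;
    }

    public static void main(String[] args) {
        double[] precision = {1.000, 0.940, 0.941}; // per-class precision from the iris table
        int[] irisSizes = {50, 50, 50};             // iris has 50 instances per class
        System.out.println("straight: " + straightAverage(precision));
        System.out.println("weighted: " + weightedAverage(precision, irisSizes));

        // Hypothetical unequal class sizes: the two averages no longer agree.
        int[] skewedSizes = {100, 30, 20};
        System.out.println("weighted (skewed): " + weightedAverage(precision, skewedSizes));
    }
}
```

With equal class sizes both methods return the same value, which is why the "Weighted Avg." row matches the straight average for iris; with unequal sizes the larger classes dominate the weighted figure.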
>
>
> > On 22/05/2017, at 2:30 AM, Alexander Osherenko <[hidden email]> wrote:
> >
> > I wonder whether it is possible to modify the classifier output presented on the WEKA console. For instance, is it possible to add the classwise precision P and recall R values after the calculated confusion matrix?
> >
> > <image.png>
> > where N is the number of outcome values of classification.
> >
> > Best, Alexander
> > _______________________________________________
> > Wekalist mailing list
> > Send posts to: [hidden email]
> > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
>
>
>
> ------------------------------
>
> _______________________________________________
> Wekalist mailing list
> [hidden email]
> https://list.waikato.ac.nz/mailman/listinfo/wekalist
>
>
>
>
> End of Wekalist Digest, Vol 171, Issue 83
> *****************************************
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist


> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html





------------------------------

_______________________________________________
Wekalist mailing list
[hidden email]
https://list.waikato.ac.nz/mailman/listinfo/wekalist




End of Wekalist Digest, Vol 171, Issue 90
*****************************************

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

evaluation J48.txt (2K) Download Attachment
evaluation J48 FC Train only.txt (2K) Download Attachment
evaluation J48 classifier and filters both train test .txt (2K) Download Attachment
evaluation J48 Filtered classifier both train test.txt (2K) Download Attachment
IHCAMjob.kf (5K) Download Attachment

Re: Weka-Spark

Eibe Frank-2
Administrator
Yes, it looks like it's not necessary to specify FilteredClassifier when you use the -filter option in distributed WEKA:

"normal Weka filters to be used for pre-processing in each map (the task takes care of using various special subclasses of FilteredClassifier for wrapping the base classifier and filters depending on whether the base learner is Aggregateable and/or incremental)" (http://markahall.blogspot.co.nz/2013/10/weka-and-hadoop-part-1.html)

The filter should not modify the test data, so the minority class should stay at 174 instances in your evaluation. This behaviour is correct: when used correctly, SMOTE only affects the training set.

Cheers,
Eibe
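
The expected counts can be worked out with a small Java sketch. It assumes that the SMOTE percentage gives the number of synthetic instances to create relative to the original class size, which is the reading behind the 1044 figure in the quoted message below. The class and method names are illustrative only, and the code does not call WEKA.

```java
public class SmoteCountSketch {

    /**
     * Size of the minority class in the TRAINING data after oversampling,
     * assuming 'percentage' percent of the original count is added as synthetic instances.
     */
    static int trainingCountAfterSmote(int originalCount, double percentage) {
        int synthetic = (int) Math.round(originalCount * percentage / 100.0);
        return originalCount + synthetic;
    }

    /** The test data is not filtered, so the count seen in the evaluation is unchanged. */
    static int evaluationCount(int originalCount) {
        return originalCount;
    }

    public static void main(String[] args) {
        int minority = 174;
        System.out.println("training set after SMOTE 500%: " + trainingCountAfterSmote(minority, 500));
        System.out.println("evaluation (test data):        " + evaluationCount(minority));
    }
}
```

Under that assumption the training data seen by the classifier holds 174 + 870 = 1044 minority instances, while the confusion matrix, which is computed on unfiltered test data, still reports 174.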

> On 26 May 2017, at 06:24, ENGMohammed kamal <[hidden email]> wrote:
>
> Hi Eibe,
> Please see the attached files.
> I tried applying the FilteredClassifier in both the training and the test job; the results are identical to using the classifier with an external filter.
> Then I tried what you suggested and used the FilteredClassifier in the training stage only; it gives me the same results as J48 without any filter.
> I have also attached the job.
> I also see that the number of records is 35232 in all versions. It seems that SMOTE didn't work: the minority class has 174 instances and I used 500%, so the emergency (minority) class should have 1044 instances, not the same 174.
> Cheers,
> M.Kamal
>
>
>
> From: [hidden email] <[hidden email]> on behalf of [hidden email]<[hidden email]>
> Sent: Thursday, May 25, 2017 3:00 AM
> To: [hidden email]
> Subject: Wekalist Digest, Vol 171, Issue 90
>  
> Send Wekalist mailing list submissions to
>         [hidden email]
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://list.waikato.ac.nz/mailman/listinfo/wekalist
>
>
> or, via email, send a message with subject or body 'help' to
>         [hidden email]
>
> You can reach the person managing the list at
>         [hidden email]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Wekalist digest..."
>
>
> Today's Topics:
>
>    1. Re: Weka-Spark (Eibe Frank)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 25 May 2017 10:38:13 +1200
> From: Eibe Frank <[hidden email]>
> To: "Weka machine learning workbench list."
>         <[hidden email]>
> Subject: Re: [Wekalist] Weka-Spark
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset=iso-8859-1
>
> It sounds like you are modifying the test data as well as the training data. You should avoid doing that. Use the FilteredClassifier instead so that only the training data is modified.
>
> Cheers,
> Eibe
>
> > On 24/05/2017, at 8:31 PM, ENGMohammed kamal <[hidden email]> wrote:
> >
> >
> > Hi Eibe,
> > Thanks for the support, it is working now.
> > I tried it with the class balancer, resample and SMOTE.
> > I think setting both "forceBatchLearningForUpdateableClassifiers" and "forceVotedEnsembleCreation" to true solved the problem, because it works both with and without the package name written in the ARFF config.
> > But there is a problem: when I select SMOTE and configure it to work on class no. 4 with a 500% increase,
> > the evaluation shows only a 100% increase. The original dataset has 4 instances in this class and I found only 8 in the output confusion matrix, where I should have 24 instances.
> > Cheers,
> > Mohammed Kamal
> <evaluation J48.txt><evaluation J48 FC Train only.txt><evaluation J48 classifier and filters both train test .txt><evaluation J48 Filtered classifier both train test.txt><IHCAMjob.kf>
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Weka-Spark

MOHAMMED KAMAL
In reply to this post by MOHAMMED KAMAL

Hi Eibe,

When I put the sampling technique only in the Weka Spark classifier job and remove it from the Weka Spark evaluation job, the values do not change. Why?

I have three more questions:

1- I have 10 classifiers and I want to draw their ROC curves together, at high resolution, for comparison. You told me before that this is not available in the Knowledge Flow, so how can I do it using the Explorer or the Experimenter? Note: I tried the normal WEKA output, but it is not clear; most of the lines overlap and are drawn with symbols like + and *.


2- I worked with a voted ensemble (Vote) in Weka Spark. The output model for JRip contains 4 models, as I sliced my dataset into 4 partitions (please see the attached file). How can I find out which of the 4 models wins the vote?


3- If I reference Weka-Spark in a research paper for the experiments I ran with it, are these two references enough, or do I need more, specifically for Weka-Spark?

Hall, Mark, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. "The WEKA data mining software: an update." ACM SIGKDD Explorations Newsletter 11, no. 1 (2009): 10-18.


Frank, Eibe, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.


Cheers,

Yours,

M.Kamal








From: ENGMohammed kamal <[hidden email]>
Sent: Thursday, May 25, 2017 9:24 PM
To: [hidden email]
Subject: Re: Weka-Spark
 

Hi Eibe,

please see attached files

I tried to put filtered classifier to both train and test give the same result identically to use classifier with external filter ...  
Then i tried what you suggest to use filtered classifier in train stage only, it gives me the same results as J48 without any filter..
i attach also the job 
i want to see also that no or records are 35232 in all versions , it seems that SMOTE didn't work because minority class 174 , and i use 500% , therefore instances in emergency class (minority) should be 1044 not the same 174 
cheers

M.Kamal






From: [hidden email] <[hidden email]> on behalf of [hidden email] <[hidden email]>
Sent: Thursday, May 25, 2017 3:00 AM
To: [hidden email]
Subject: Wekalist Digest, Vol 171, Issue 90
 
Send Wekalist mailing list submissions to
        [hidden email]

To subscribe or unsubscribe via the World Wide Web, visit
        https://list.waikato.ac.nz/mailman/listinfo/wekalist
list.waikato.ac.nz
The Weka mailing list ([hidden email]) is for discussions pertaining to the use of the Weka machine learning workbench. Weka is a collection of machine ...


or, via email, send a message with subject or body 'help' to
        [hidden email]

You can reach the person managing the list at
        [hidden email]

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Wekalist digest..."


Today's Topics:

   1. Re: Weka-Spark (Eibe Frank)


----------------------------------------------------------------------

Message: 1
Date: Thu, 25 May 2017 10:38:13 +1200
From: Eibe Frank <[hidden email]>
To: "Weka machine learning workbench list."
        <[hidden email]>
Subject: Re: [Wekalist] Weka-Spark
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=iso-8859-1

It sounds like you are modifying the test data as well as the training data. You should avoid doing that. Use the FilteredClassifier instead so that only the training data is modified.

Cheers,
Eibe

> On 24/05/2017, at 8:31 PM, ENGMohammed kamal <[hidden email]> wrote:
>
>
> Hi Ebie,
> Thanks for support , it is working now
> i tried it with class balancer, resample and SMOTE
> i think 2 true setting for forceBatchLearningForUpdateableClassifiers" and "forceVotedEnsembleCreation" solve the problem because it's working without and with write the name of backage in arff config.
> but there's a problem  that when i select SMOTE and customized to work on class no.4  +   500%   increase
> i found in the evaluation it gives 100% only   (  original  dataset 4 instances in this class )     i found only 8 in o/p  confusion matrix , i ought to have 24 instances
> cheers,
> Mohammed Kamal
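
The figures in this report can be checked with a little arithmetic. The sketch below assumes that a SMOTE percentage of P adds about P/100 synthetic instances for each original minority-class instance; the class and method names are hypothetical, and nothing here calls Weka itself.

```java
class SmoteCountSketch {

    // Assumption: a percentage of P adds roughly P/100 synthetic instances
    // per original minority-class instance.
    static int expectedMinorityCount(int originalCount, double percentage) {
        int synthetic = (int) Math.floor(originalCount * percentage / 100.0);
        return originalCount + synthetic;
    }

    public static void main(String[] args) {
        // 4 original instances with a 500% increase.
        System.out.println(expectedMinorityCount(4, 500.0));
        // 4 original instances with a 100% increase.
        System.out.println(expectedMinorityCount(4, 100.0));
    }
}
```

Under that assumption, 4 instances at 500% come to 24, while a count of 8 is what a 100% increase would give.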
>
>
>
> From: [hidden email] <[hidden email]> on behalf of [hidden email]<[hidden email]>
> Sent: Monday, May 22, 2017 3:00 AM
> To: [hidden email]
> Subject: Wekalist Digest, Vol 171, Issue 83

>
> Today's Topics:
>
>    1. Re: Exception with package percentageErrorMetrics Evaluation
>       metrics - Weka 3.8.1 (Eibe Frank)
>    2. Re: Weka-Spark (Mark Hall)
>    3. Re: Output modification (Eibe Frank)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 22 May 2017 09:52:22 +1200
> From: Eibe Frank <[hidden email]>
> To: "Weka machine learning workbench list."
>         <[hidden email]>
> Subject: Re: [Wekalist] Exception with package percentageErrorMetrics
>         Evaluation metrics - Weka 3.8.1
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset=utf-8
>
> Make sure that you have percentageErrorMetrics version 1.0.2 (the latest version). Some changes were necessary to make it work with WEKA 3.8.1/3.9.1.
>
> Cheers,
> Eibe
>
> > On 22/05/2017, at 4:53 AM, Michael Hall <[hidden email]> wrote:
> >
> > While trying to classify some new data, the Explorer started failing.
> >
> > Checking weka.log I saw...
> >
> > Exception in thread "Thread-1198" java.lang.IllegalAccessError: tried to access field weka.classifiers.evaluation.Evaluation.m_WithClass from class weka.classifiers.evaluation.MeanAbsolutePercentageError
> >        weka.classifiers.evaluation.MeanAbsolutePercentageError.getStatistic(MeanAbsolutePercentageError.java:127)
> >        weka.classifiers.evaluation.MeanAbsolutePercentageError.toSummaryString(MeanAbsolutePercentageError.java:113)
> >        weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2873)
> >        weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2757)
> >        weka.classifiers.Evaluation.toSummaryString(Evaluation.java:1115)
> >        weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1871)
> >
> >        at weka.classifiers.evaluation.MeanAbsolutePercentageError.getStatistic(MeanAbsolutePercentageError.java:127)
> >        at weka.classifiers.evaluation.MeanAbsolutePercentageError.toSummaryString(MeanAbsolutePercentageError.java:113)
> >        at weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2873)
> >        at weka.classifiers.evaluation.Evaluation.toSummaryString(Evaluation.java:2757)
> >        at weka.classifiers.Evaluation.toSummaryString(Evaluation.java:1115)
> >        at weka.gui.explorer.ClassifierPanel$18.run(ClassifierPanel.java:1871)
> >
> > The same happens with weka.classifiers.evaluation.RootMeanSquarePercentageError after I turned off MAPE.
> >
> > Michael Hall
>
> ------------------------------
>
> Message: 2
> Date: Mon, 22 May 2017 09:56:58 +1200
> From: Mark Hall <[hidden email]>
> To: "Weka machine learning workbench list."
>         <[hidden email]>
> Subject: Re: [Wekalist] Weka-Spark
> Message-ID: <[hidden email]>
> Content-Type: text/plain;       charset="UTF-8"
>
> Unfortunately, there is not a way to plot an ROC curve from the output generated by the WekaClassifierEvaluationSparkJob. Additional functionality could be added to the job to write an ARFF file that contains the same information that gets plotted when you visualize a threshold curve in the Explorer's Classify panel. I'll add this to the to-do list for that package.
>
> What error did you receive when using the SMOTE filter? Note that you will need to enter "SMOTE" (without quotes) into the "wekaPackages" field in the "Spark configuration" tab of the ArffHeaderSparkJob. This ensures that the SMOTE package's jar files get into the classpath for the Spark workers. If you are using an Aggregateable base classifier (such as NaiveBayes) then you will also need to ensure that "forceBatchLearningForUpdateableClassifiers" and "forceVotedEnsembleCreation" are set to True in the WekaClassifierSparkJob and WekaClassifierEvaluationSparkJob. You will also want to ensure that a RandomlyShuffleDataSpark job is used first, before the classifier or evaluation job, so that the data is stratified (with respect to the class), resulting in the SMOTE process having approximately the same effect on the data within each partition of the RDD.
>
> Cheers,
> Mark.
>
> On 21/05/17, 8:43 AM, "ENGMohammed kamal" <[hidden email] on behalf of [hidden email]> wrote:
>
>     Hi all,
>     I built this Spark job on my dataset to learn a classifier and test its performance. I write the output to two files: one for the learned model and one for the classifier performance. I have some questions:
>    
>    
>     1- How can I draw an ROC curve for the output? (multiclass dataset)
>     2- Can I draw ROC curves for more than one classifier (if I add another Weka classifier Spark job for another classifier)?
>     3- Given the above job, how can I use sampling here (SMOTE, class balancer or subsample)?
>     I ask because when I used it with a filtered classifier inside the Weka Spark classifier job, I got an error.
>     Please tell me exactly where I can use sampling, and with which component in the Knowledge Flow interface.
>     Cheers,
>     M.Kamal
>
> ------------------------------
>
> Message: 3
> Date: Mon, 22 May 2017 10:00:06 +1200
> From: Eibe Frank <[hidden email]>
> To: "Weka machine learning workbench list."
>         <[hidden email]>
> Subject: Re: [Wekalist] Output modification
> Message-ID: <[hidden email]>
> Content-Type: text/plain; charset="utf-8"
>
> Yes, it's possible to add additional plug-in evaluation measures. However, there is currently no plug-in for the straight average of the precision/recall values across classes.
>
> Note that WEKA outputs the *weighted* average (weighted by class size) by default. Here is the output for the iris data:
>
> === Detailed Accuracy By Class ===
>
>                  TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
>                  0.980    0.000    1.000      0.980    0.990      0.985    0.990     0.987     Iris-setosa
>                  0.940    0.030    0.940      0.940    0.940      0.910    0.952     0.880     Iris-versicolor
>                  0.960    0.030    0.941      0.960    0.950      0.925    0.961     0.905     Iris-virginica
> Weighted Avg.    0.960    0.020    0.960      0.960    0.960      0.940    0.968     0.924    
>
> The six per-class precision and recall values are shown, along with their weighted average at the bottom. Because all classes are equally populous in the iris data, the weighted average is equal to the straight average in this case.
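
The equality of the two averages can be checked from the precision column of the table above. This is a minimal sketch, assuming the standard iris data with 50 instances per class (the class sizes are not shown in the table itself); the class and method names are illustrative only.

```java
import java.util.Locale;

class AveragePrecisionSketch {

    // Straight (unweighted) average: every class counts equally.
    static double straightAverage(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Weighted average: every class counts in proportion to its size.
    static double weightedAverage(double[] values, int[] classSizes) {
        double weightedSum = 0.0;
        int total = 0;
        for (int c = 0; c < values.length; c++) {
            weightedSum += values[c] * classSizes[c];
            total += classSizes[c];
        }
        return weightedSum / total;
    }

    public static void main(String[] args) {
        // Per-class precision values copied from the table above.
        double[] precision = {1.000, 0.940, 0.941};
        // Assumed class sizes for the iris data.
        int[] classSizes = {50, 50, 50};
        System.out.println(String.format(Locale.ROOT, "straight=%.3f weighted=%.3f",
                straightAverage(precision), weightedAverage(precision, classSizes)));
    }
}
```

With unequal class sizes the two functions would diverge, which is when a separate plug-in for the straight average would matter.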
>
> Cheers,
> Eibe
>
>
> > On 22/05/2017, at 2:30 AM, Alexander Osherenko <[hidden email]> wrote:
> >
> > I wonder, is it possible to modify the classifier output presented on the WEKA console? For instance, is it possible to add the classwise precision P and recall R values after the calculated confusion matrix?
> >
> > <image.png>
> > where N is the number of outcome values of classification.
> >
> > Best, Alexander
>
> ------------------------------
>
>
> End of Wekalist Digest, Vol 171, Issue 83
> *****************************************

------------------------------

End of Wekalist Digest, Vol 171, Issue 90
*****************************************

jrip smote model.docx (17K) Download Attachment

Re: Weka-Spark

Eibe Frank-2
Administrator

> On 7/06/2017, at 8:34 AM, ENGMohammed kamal <[hidden email]> wrote:
>
> When I put the sampling techniques only in the Weka Spark job and remove them from the Weka Spark evaluation job, there is no change in the values. Why?

Can you be more specific about what you did?

> I have three more questions.
> 1- If I have 10 classifiers and I want to draw ROC curves for them for comparison (at the highest possible resolution): you told me before that this is not available in the Knowledge Flow, so how can I do this using the Explorer or Experimenter? Note: I tried the normal Weka output; it is not clear, and most of the lines overlap, drawn with symbols like + and *.

The Explorer uses the same code as the KnowledgeFlow. Yes, you can use the KnowledgeFlow to plot ROC curves. You can also save the data for a plot into a file, using the Save… button in the plotting window. Then you can use an external tool to make a high-resolution plot.
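
If the external tool needs raw curve points rather than a saved plot file, they can also be derived from per-instance class probabilities. The following is a sketch of the textbook threshold sweep for one class treated as positive against the rest, not Weka's own implementation; the class name, scores and labels are made-up illustration values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RocSketch {

    // Returns ROC points as {falsePositiveRate, truePositiveRate} pairs,
    // lowering the decision threshold one distinct score at a time.
    // Assumes at least one positive and one negative instance.
    static List<double[]> rocPoints(double[] scores, boolean[] positive) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort instance indices by descending score.
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        int totalPos = 0;
        for (boolean p : positive) {
            if (p) totalPos++;
        }
        int totalNeg = positive.length - totalPos;

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0;
        int fp = 0;
        for (int k = 0; k < order.length; k++) {
            int i = order[k];
            if (positive[i]) tp++; else fp++;
            // Tied scores cross the threshold together: one point per distinct score.
            boolean lastOfTie = k == order.length - 1 || scores[order[k + 1]] != scores[i];
            if (lastOfTie) {
                points.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
            }
        }
        return points;
    }

    // Trapezoidal area under the points returned by rocPoints.
    static double areaUnderCurve(List<double[]> points) {
        double area = 0.0;
        for (int k = 1; k < points.size(); k++) {
            double[] prev = points.get(k - 1);
            double[] cur = points.get(k);
            area += (cur[0] - prev[0]) * (cur[1] + prev[1]) / 2.0;
        }
        return area;
    }

    public static void main(String[] args) {
        // Hypothetical probabilities for the positive class and the true labels.
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.4};
        boolean[] positive = {true, true, false, true, false, false};
        // Print "fpr,tpr" lines that a plotting tool can read as CSV.
        for (double[] p : rocPoints(scores, positive)) {
            System.out.println(p[0] + "," + p[1]);
        }
    }
}
```

To compare several classifiers, the same routine could be run once per classifier (and once per class for a multiclass dataset) and the resulting point lists overlaid in one plot.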

> 2- I worked with the voted ensemble in Weka Spark. The output model for JRip contains 4 models, as I sliced my dataset into 4 partitions (please see the attached file), but how can I know which of the 4 models wins the vote?

Voting will use all the models. It does not use a single model for prediction.
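
To illustrate why there is no single winning model, here is a sketch of one common combination rule, averaging the class probabilities of all models. Which rule the Spark job actually applies is not stated in this thread, so treat the rule, the class name and the numbers as assumptions.

```java
class VotedEnsembleSketch {

    // Each row is one model's class-probability distribution for the same instance.
    static double[] averageProbabilities(double[][] perModel) {
        int numClasses = perModel[0].length;
        double[] average = new double[numClasses];
        for (double[] distribution : perModel) {
            for (int c = 0; c < numClasses; c++) {
                average[c] += distribution[c] / perModel.length;
            }
        }
        return average;
    }

    // Index of the largest entry, i.e. the predicted class.
    static int argMax(double[] values) {
        int best = 0;
        for (int c = 1; c < values.length; c++) {
            if (values[c] > values[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical outputs of four models, one per data partition.
        double[][] perModel = {
            {0.70, 0.30},
            {0.40, 0.60},
            {0.80, 0.20},
            {0.45, 0.55}
        };
        double[] combined = averageProbabilities(perModel);
        System.out.println("predicted class index: " + argMax(combined));
    }
}
```

In this example two models favour each class, yet the combined prediction is still well defined because every model contributes to the average.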

> 3- If I reference my work in a research paper for experiments done with Weka-Spark, are these two references enough, or do I need more, especially for Weka-Spark?
> Hall, Mark, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. "The WEKA data mining software: an update." ACM SIGKDD Explorations Newsletter 11, no. 1 (2009): 10-18.
>
> [69] Eibe Frank, Mark A. Hall, and Ian H. Witten (2016). The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition, 2016.

That sounds fine.

Cheers,
Eibe
