Question on getting predictions from saved models from Weka


SanjayPethe
Hello,
First of all, thank you to all the Weka developers for creating this product. It has helped me tremendously in climbing the steep learning curve in this area. I am working on a text classification problem, and have models running in both the Explorer and Knowledge Flow with an InputMappedClassifier using J48. I intend to experiment with other classifiers soon.
 
One area that I have found difficult in Weka is making predictions from saved models. I have done this in the Explorer, but have not had much success doing so in Knowledge Flow. The process, however, is arcane and really non-intuitive. (I am referring to the methods described at https://weka.wikispaces.com/Saving+and+loading+models and https://weka.wikispaces.com/Making+predictions.) This is such a basic step that I wonder whether I am missing some simpler way to do it. I tried the AddClassification filter mentioned in the second article and it did not work for me.
 
I would really like to know if there is a way to get this working in Knowledge Flow in particular, because that would allow me to perform other actions, such as removing attributes, as part of a single process. My initial trial set had about 30K instances that end up with about 8K-10K attributes after the StringToWordVector conversion. The dataset I eventually want to process (for prediction only) has about 4 million instances, and I expect about 10K attributes after the StringToWordVector conversion. The output I want is a CSV or text file with the ID, the predicted class and maybe some confidence measure, but certainly not all the 8K-10K attributes representing the words. Unfortunately, the only way I know to get this is to use the Explorer to load a saved model, make the predictions, look at them in the visualizer and save all attributes to an ARFF file. Then I open the ARFF file, remove the unwanted attributes and save it as CSV. These ARFF files will be very large because of the thousands of extra word attributes.
 
I could use the output attributes option and copy and paste from the classifier output, if I could find a way to include the instance ID in the output. However, is there a way to save this file directly instead of copying and pasting? I don't know whether copy and paste will run into size limits with this volume of data.
 
Is there a better way to accomplish this than what I have outlined here? I am using Weka 3.8 on Windows 7.
 
Regards,
Sanjay Pethe
 

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Question on getting predictions from saved models from Weka

Manoj Agrawal

Hi Sanjay,


I am not an expert on this but here is what I had done for my data.


1. Choose FilteredClassifier as the main classifier.

2. Click on the FilteredClassifier entry and choose AttributeSelectedClassifier as its classifier and StringToWordVector as its filter.

3. Click on the AttributeSelectedClassifier entry and choose 'NaiveBayesMultinomial' as the classifier, 'InfoGainAttributeEval' as the evaluator and 'Ranker' as the search method.

4. Choose your supplied test set and execute.


Please let me know if I understood the problem correctly and if it helps.


regards,


Manoj Agrawal



From: [hidden email] <[hidden email]> on behalf of Pethe, Sanjay <[hidden email]>
Sent: Tuesday, February 14, 2017 11:23:10 PM
To: [hidden email]
Subject: [Wekalist] Question on getting predictions from saved models from Weka
 

Re: Question on getting predictions from saved models from Weka

Mark Hall
In reply to this post by SanjayPethe
I'm not too sure why this is proving difficult for you. Here are two examples:

1. Command line. Assuming I have a serialized model called test.model trained on the iris data

java weka.Run .AddClassification -serialized test.model -i ~/datasets/UCI/iris.arff -c last

Add the -distribution flag if you want to see probability distributions instead of predicted labels.

2. In the KnowledgeFlow

ArffLoader -dataset-> ClassAssigner -dataset-> TestSetMaker -testSet-> <classifier step that matches the saved classifier, configured with the path to test.model in "Classifier model to load" under "Additional options"> -batchClassifier-> PredictionAppender -testSet-> ArffSaver/CSVSaver/TextViewer etc.

The PredictionAppender can be configured to output probability distributions instead of labels too.
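
As a rough illustration of what the distribution option gives you: each prediction becomes one probability per class, and the predicted label plus a confidence value can be recovered by taking the largest entry. The sketch below is plain Java with no Weka dependency. The class name DistributionToLabel, the instance ID and the probabilities are made up for illustration; only the three iris class labels come from the dataset itself.

```java
import java.util.Locale;

public class DistributionToLabel {

    /** Returns the index of the largest probability; ties go to the first maximum. */
    public static int argMax(double[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) {
            if (dist[i] > dist[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Formats one output row as: id, predicted label, confidence. */
    public static String toCsvRow(String id, String[] labels, double[] dist) {
        int k = argMax(dist);
        // Locale.ROOT keeps the decimal separator a dot on every machine.
        return String.format(Locale.ROOT, "%s,%s,%.3f", id, labels[k], dist[k]);
    }

    public static void main(String[] args) {
        String[] labels = {"Iris-setosa", "Iris-versicolor", "Iris-virginica"};
        double[] dist = {0.02, 0.91, 0.07}; // made-up probabilities for one instance
        System.out.println(toCsvRow("42", labels, dist));
    }
}
```

A row of this shape (ID, label, confidence) is the kind of compact output asked for in the original question, without any of the word attributes.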

If you are using lots of filters for preprocessing, then you need to wrap these up with your chosen base classifier in a FilteredClassifier when building and saving your model. More than one filter can be conveniently specified by using a MultiFilter.
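
The reason for wrapping the filters with the classifier can be seen with a toy example. A word-vector filter such as StringToWordVector derives its dictionary from the data it is given, so filtering the test data separately produces attributes that do not line up with the ones the model was trained on. The sketch below is not Weka code. It is a plain-Java stand-in with hypothetical names (VocabularyMismatchDemo, buildVocabulary, vectorize) and made-up documents, and it assumes a simple whitespace tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

public class VocabularyMismatchDemo {

    /** Collects the sorted set of distinct lower-cased words in the documents. */
    public static List<String> buildVocabulary(List<String> docs) {
        TreeSet<String> words = new TreeSet<>();
        for (String doc : docs) {
            words.addAll(Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+")));
        }
        return new ArrayList<>(words);
    }

    /** Counts how often each vocabulary word occurs in one document. */
    public static int[] vectorize(String doc, List<String> vocabulary) {
        int[] counts = new int[vocabulary.size()];
        for (String word : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
            int idx = vocabulary.indexOf(word);
            if (idx >= 0) {
                counts[idx]++; // words outside the vocabulary are dropped
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> train = Arrays.asList("cheap pills now", "meeting at noon");
        List<String> test = Arrays.asList("cheap meeting room");

        List<String> trainVocab = buildVocabulary(train);
        List<String> testVocab = buildVocabulary(test);

        System.out.println("training vocabulary: " + trainVocab);
        System.out.println("test vocabulary:     " + testVocab);
        System.out.println("test doc, training vocabulary: "
                + Arrays.toString(vectorize(test.get(0), trainVocab)));
        System.out.println("test doc, its own vocabulary:  "
                + Arrays.toString(vectorize(test.get(0), testVocab)));
    }
}
```

With the training vocabulary, the test document maps onto the same six columns the model saw. With its own vocabulary it gets three unrelated columns. A FilteredClassifier avoids this because the fitted filter is saved as part of the model and reused at prediction time.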

Cheers,
Mark.


Re: [EXTERNAL]Re: Question on getting predictions from saved models from Weka

SanjayPethe
Mark,
Thank you for the prompt response. I had not noticed it because it had been sent to spam, and it only just occurred to me to look there.

I have not tried command-line operations in the past, but may give that a shot. I have tried something similar to what you suggest for the KnowledgeFlow, but without a FilteredClassifier: I did the filtering separately up front and then used an InputMappedClassifier, and this did not work. I'll try what you recommend with the FilteredClassifier and let you know whether it works.

Regards,
Sanjay Pethe

-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Mark Hall
Sent: Wednesday, February 15, 2017 2:06 AM
To: Weka machine learning workbench list. <[hidden email]>
Subject: [EXTERNAL]Re: [Wekalist] Question on getting predictions from saved models from Weka


Re: Question on getting predictions from saved models from Weka

SanjayPethe
In reply to this post by Manoj Agrawal

Thanks Manoj.

 

I have not tried this approach, but I have tried what Mark suggested, and that seems to be working. I use InputMappedClassifier as the main classifier, with FilteredClassifier as its classifier and the actual classifier I want to use specified inside that. The filter in the FilteredClassifier is a MultiFilter containing the NominalToString and StringToWordVector filters. I am using SMO as the classifier; NaiveBayes gave poor results for me.

 

I am currently evaluating the results. I have been getting some odd CSV output, which I have described in another post. I have never used some of the components you mention, so I will try to give them a shot (hopefully soon) to learn more about them.

 

Regards,

Sanjay Pethe

 

From: [hidden email] [mailto:[hidden email]] On Behalf Of Manoj Agrawal
Sent: Tuesday, February 14, 2017 11:01 PM
To: [hidden email]
Subject: [EXTERNAL]Re: [Wekalist] Question on getting predictions from saved models from Weka

 
