Hopefully simple question


Hopefully simple question

Bradley E Harris

Hi, everyone.

It seems like the answer should be simple, but I can't seem to get it right.

My goal is to train a classifier on an ARFF file that has one string attribute and a class attribute, save the model,
and then apply that model to an ARFF file that comes from TextDirectoryLoader with the file name included.
I need to know which file each prediction comes from when I evaluate with -p.
This means that I need to filter out the file name when I test against the model I built, since the model only knows about the single string attribute.
So far, I've trained a model on the ARFF file with just one string attribute and a class.
I've prepared a test ARFF file from TextDirectoryLoader, commenting out the class attribute and replacing the class it inserts from the directory name with a ?
So, in theory, this is ready for testing.
The only difference, as I mentioned, is that the test file also contains the file name of the source of each string.
I tried using FilteredClassifier, loading the model with -l and the test ARFF with -T, and passing -F with "weka.unsupervised.attribute.Remove -R 2",
but I get an exception telling me that -F is an illegal option.

I searched the list and came across a similar question where the poster was using FilteredClassifier to train the model. My model is built from my classifier directly, since I didn't need to filter out any attributes in the training set.
Is training with FilteredClassifier what I have to do anyway, and then just leave out the -F? The poster in that question was removing attributes from the training set as well.
Should I put a dummy attribute in the training set and then remove that second attribute when training the model with FilteredClassifier?
Or is there another way to do what I'm hoping for?

I also ask because it might become necessary to use StringToWordVector and I'd like to know how to filter out the filename attribute from that call as well.

I have to work on the command line because my server is headless, which could further complicate things.

Thanks in advance to anybody who might be able to help,
Brad


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


Re: Hopefully simple question

Eibe Frank-3
As you have indicated, a possible way to proceed is to add the directory name attribute to the training data as well and then use the FilteredClassifier in conjunction with the Remove filter to remove it before building (and applying) the "actual" classifier.

The contents of the directory name attribute in the training data would not matter because it would be removed anyway before the actual classifier is applied.
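
The padding step described above could be scripted. What follows is only a minimal sketch in Python, not part of Weka: it assumes a dense ARFF file whose last attribute is the class, class labels that contain no commas, and a test file whose extra attribute is called `filename` (use whatever name appears in the test file's header). The function name `add_placeholder_attribute` and the placeholder value `unknown` are made up for illustration.

```python
def add_placeholder_attribute(arff_text, name="filename", placeholder="unknown"):
    """Insert a string attribute just before the class (last) attribute.

    Assumptions: dense ARFF, class attribute declared last, and class
    labels without commas (each data row is split at its last comma).
    """
    out = []
    attr_idx = []      # positions in `out` of the @attribute lines seen so far
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                attr_idx.append(len(out))
            elif lower.startswith("@data"):
                if not attr_idx:
                    raise ValueError("no @attribute lines found before @data")
                # Declare the new attribute right before the class attribute.
                out.insert(attr_idx[-1], f"@attribute {name} string")
                in_data = True
            out.append(line)
        elif not stripped or stripped.startswith("%"):
            # Keep blank lines and comments in the data section unchanged.
            out.append(line)
        else:
            # Everything before the last comma is kept as-is; the
            # placeholder goes between it and the class label.
            features, label = stripped.rsplit(",", 1)
            out.append(f"{features},'{placeholder}',{label}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "@relation demo\n"
        "@attribute text string\n"
        "@attribute class {pos,neg}\n"
        "@data\n"
        "'good, really good',pos\n"
        "'bad',neg\n"
    )
    print(add_placeholder_attribute(sample))
```

The padded training file then has the placeholder as its second attribute, which is the position the Remove filter with -R 2 would target inside the FilteredClassifier.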

Another option might be to wrap your classifier into an InputMappedClassifier.

Cheers,
Eibe





Re: Hopefully simple question

Bradley E Harris

Hi, Eibe.
Thanks for the suggestion. I didn't think of the InputMappedClassifier! I've been using it with StringToWordVector ARFF files, but it never clicked that I could use it for this as well.

Thanks so much for the advice,
Brad

