Re: text to arff

Re: text to arff

Ashraf M. Kibriya
As long as that text file represents a single instance/example, yes it
would work. If it contains multiple examples then you might want to
split it into multiple files or write a piece of your own code to
convert it into arff.

An arff containing text should look somewhat like this:

@relation <some name>

@attribute contents string
@attribute class {class1, class2, class3, ..}
@data
'<some text string>', class1
'<some other text string>', class2
.
.
.
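
For the "write a piece of your own code" route, here is a minimal sketch in plain Java (no WEKA classes needed). It assumes, purely for illustration, that the text file holds one example per line in the form label<TAB>text, and that labels are single tokens without spaces or commas. The class name TextToArff and the attribute names are placeholders taken from the layout above. Strings are wrapped in single quotes with backslash escapes, which to my understanding is what WEKA's ARFF reader expects; check it against your WEKA version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: turns "label<TAB>text" lines into ARFF text.
final class TextToArff {

    // Wrap a value in single quotes, escaping backslashes and single quotes.
    static String quote(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Build the ARFF document; lines without a tab are skipped.
    static String toArff(String relation, List<String> lines) {
        Set<String> labels = new LinkedHashSet<>(); // keeps first-seen order
        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                continue; // does not match the assumed layout
            }
            String label = line.substring(0, tab).trim();
            String text = line.substring(tab + 1).trim();
            labels.add(label);
            rows.add(quote(text) + "," + label);
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute contents string\n");
        sb.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        sb.append("@data\n");
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "class1\tsome text string",
                "class2\tsome other text, with a comma");
        System.out.print(toArff("example", lines));
    }
}
```

To use it on a real file, read the lines with java.nio.file.Files.readAllLines and write the returned string to a file ending in .arff.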

You can also download and look at the text datasets available in the
downloads section of the WEKA page on SourceForge:
http://sourceforge.net/projects/weka


Regards,
Ashraf



Vanessa Pacheco wrote:

> Hi,
>  
> I came across an e-mail from you Subject: RE: TextDirectoryToArff
> (Mena Badieh)
> <http://news.gmane.org/find-root.php?message_id=%3c400C7B9F.7070503%40cs.waikato.ac.nz%3e> (http://article.gmane.org/gmane.comp.ai.weka/1943/match=textdirectorytoarff)
>  
> I am trying to train datasets with a huge number of dimensions
> (attributes) using WEKA.
>  
> The first task is to create an arff file from a text file (not a
> directory). Would the program listed do this job for me ?
>  
> Please could you help
>  
> Thank you,
> ~Vanessa
>  



_______________________________________________
Wekalist mailing list
[hidden email]
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: Fwd: text to arff

Ashraf M. Kibriya
Hi,
I'm not an expert on JBuilder, but depending on which version you
are using you might need to mount the weka.jar file in the JBuilder
filesystem in order for JBuilder to recognise the classes in weka.

I'm also not much of an expert on the CSV file format, so I can't offer
much advice regarding that either. You might have already noticed that
your text documents can also contain commas and tabs, that is, the
characters that are used as delimiters in CSV files. I don't really know
how the CSV format deals with such characters, but I'm guessing that you
would probably have to enclose text strings in " " or ' ' in order for
them to be processed properly (WEKA seems to prefer single quotes ' ').
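
On the quoting guess above: one widely used CSV convention (the one described in RFC 4180) is to wrap a field in double quotes when it contains a delimiter, and to double any double quote inside it. The sketch below shows only that convention. The class name CsvField is hypothetical, and whether your version of WEKA's CSVLoader accepts this exact form is not confirmed anywhere in this thread, so try it on a small file first.

```java
// Hypothetical helper illustrating RFC 4180 style quoting of CSV fields.
final class CsvField {

    // Quote a field only if it contains a delimiter, a quote or a line break.
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\t")
                || field.contains("\"") || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded double quotes are doubled, then the field is wrapped.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join already-unquoted fields into one comma-separated row.
    static String row(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(fields[i]));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(row("a document, with a comma", "class1"));
    }
}
```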


Hope this helps.

Regards,
Ashraf

Vanessa Pacheco wrote:

>
>
> ---------- Forwarded message ----------
> From: *Vanessa Pacheco* <[hidden email]
> <mailto:[hidden email]>>
> Date: Mar 15, 2006 3:42 PM
> Subject: Re: text to arff
> To: "Ashraf M. Kibriya" <[hidden email] <mailto:[hidden email]>>
>
> Hi Ashraf,
>
> Thanks for the quick reply. I have another question.
>
> I created a new Java project within JBuilder, then copied and pasted
> your code. However, it cannot find the Instances and Vector classes.
>
> Since we use import weka.core.*, do I need to put the weka.jar file in
> the library folder for the compiler to recognize the package?
>
> Also, I found the following command useful for creating an .arff file
> from a CSV file:
>
>     java -classpath /usr/local/weka-3-4-7/weka.jar weka.core.converters.CSVLoader iris.csv > iris.arff
>
> The only problem I faced was creating a CSV file from a .dat file; it
> didn't do this properly. Would you have any suggestions?
>
> Thanks
> ~Vanessa




Re: text to arff

Yasmina
In reply to this post by Ashraf M. Kibriya
Hello,

I'm working on text categorization. I know how to convert a directory into ARFF format, but I want to know how to convert a single text file into ARFF format.

Thanks in advance

Re: text to arff

amine ameur
Hi all,
You must use the TextDirectoryLoader:
1. Run Weka and open the Explorer.
2. In the Preprocess panel, click "Open file".
3. Go to the directory that contains your text files, organised into subdirectories (one set of texts per subdirectory).
4. Click "Open".
5. You will see the message "cannot determine the file loader automatically......". Click OK.
6. In the next panel, click the "Choose" button and select "TextDirectoryLoader".
7. Done.

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: text to arff

Yasmina
Thanks very much.
It works, but the file is empty :( It didn't pick up the data!

Do you have a solution?

Best Regards
Yasmin

Re: text to arff

amine ameur
You must put the textual data in files (one text per file) and each set of text files in its own subdirectory; Weka treats the names of the subdirectories as the classes.
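
To make that layout concrete, here is a rough plain-Java sketch that walks such a directory and produces ARFF text, using one instance per file and the subdirectory name as the class. It is not WEKA's TextDirectoryLoader and its output may differ from the loader's. The class name DirectoryToArff and the attribute names are placeholders. It assumes UTF-8 files and subdirectory names without spaces or commas.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in that mimics the directory layout described above.
final class DirectoryToArff {

    // Single-quote a value; line breaks are flattened to spaces so that
    // each instance stays on one line of the @data section.
    static String quote(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'")
                .replace("\r", " ").replace("\n", " ") + "'";
    }

    // List the entries of a directory in a stable, sorted order.
    static List<Path> sortedChildren(Path dir) throws IOException {
        List<Path> children = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                children.add(p);
            }
        }
        Collections.sort(children);
        return children;
    }

    // root/<class name>/<any file> becomes one row: 'file contents',<class name>
    static String toArff(Path root) throws IOException {
        List<String> classes = new ArrayList<>();
        List<String> rows = new ArrayList<>();
        for (Path sub : sortedChildren(root)) {
            if (!Files.isDirectory(sub)) {
                continue; // files directly under root carry no class label
            }
            String label = sub.getFileName().toString();
            classes.add(label);
            for (Path file : sortedChildren(sub)) {
                if (Files.isRegularFile(file)) {
                    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    rows.add(quote(text.trim()) + "," + label);
                }
            }
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(quote(root.getFileName().toString())).append("\n\n");
        sb.append("@attribute contents string\n");
        sb.append("@attribute class {").append(String.join(",", classes)).append("}\n\n");
        sb.append("@data\n");
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DirectoryToArff <directory>");
            return;
        }
        System.out.print(toArff(Paths.get(args[0])));
    }
}
```

With this layout, a file placed directly in the top-level directory gets no class at all, which is one way to end up with missing data.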


Re: text to arff

Yasmina
Yes, I did that for the training data, but now I want to test the classifier on a single txt file; it's like spam detection.
So is there no way to convert a single file into ARFF, or am I going about this the wrong way?

Thanks
Yasmin

Re: text to arff

amine ameur


You can load a single file with the TextDirectoryLoader and then click Save in the Preprocess panel. You can later use the saved file as an ARFF file.


Re: text to arff

Yasmin Anwar
Thanks so much, amine, for your replies.
It works if I put the txt file into a directory and load the directory that contains the text files.
I'll move on to the second problem, how to predict new instances :-), as the classifier doesn't accept data which doesn't match!




Best Regards
Yasmina
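
On the mismatch problem: one likely cause is that the header of the file with the new instances differs from the header of the training file. A possible way around it, sketched below in plain Java, is to write the new texts under the same attribute declarations as the training file and put ? (the ARFF marker for a missing value) where the class would go. The class name UnlabeledArff is hypothetical, and the attribute names are placeholders; copy the real names, order and class list from your own training file's header. If the training data was also filtered (for example into word vectors), the new data needs the same treatment, which this sketch does not cover.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes unlabeled texts under a training-style header.
final class UnlabeledArff {

    // Wrap a value in single quotes, escaping backslashes and single quotes.
    static String quote(String text) {
        return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // trainingClasses must list the same labels, in the same order,
    // as the class attribute of the training file.
    static String toArff(String relation, List<String> trainingClasses, List<String> texts) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute contents string\n");
        sb.append("@attribute class {").append(String.join(",", trainingClasses)).append("}\n\n");
        sb.append("@data\n");
        for (String text : texts) {
            sb.append(quote(text)).append(",?\n"); // class unknown, to be predicted
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toArff("new_messages",
                Arrays.asList("class1", "class2"),
                Arrays.asList("text of the message to classify")));
    }
}
```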


On Fri, Feb 25, 2011 at 5:20 PM, amine ameur <[hidden email]> wrote:

> You can load a single file with the TextDirectoryLoader and then click Save in the Preprocess panel. You can later use the saved file as an ARFF file.


Re: text to arff

iqraameer133
This post has NOT been accepted by the mailing list yet.
In reply to this post by amine ameur
Hope you are doing great.
I have a problem with my ARFF file. I created it as you suggested using the TextDirectoryLoader in Weka. I have two subdirectories, MALE and FEMALE, as I have two classes. But in my ARFF file only one class (MALE) appears at the end of the instances, although I have text files labeled both MALE and FEMALE in my directories. Could you please tell me where I am going wrong?
I look forward to hearing from you soon.
Thank you so much.