Single quotes in ARFF file

hugoejara7
I'm building a classifier model in Weka using Multilayer Perceptron.
My original data is in CSV format and I use Weka to convert my files from
CSV to ARFF.
However, I've noticed that Weka adds single quotes around strings that contain
spaces or other special characters, for instance:

CSV file:
  Brown Rice
ARFF file:
  'Brown Rice'

I've read in the Weka documentation that this is how it handles spaces or
special characters.
I'm not sure whether the classifier gets confused by this, i.e. whether the
model would treat 'Rice' as different from Rice when handling new data.

Does anybody know if the single quotes affect the performance or results of
the classifier?
Is it possible to add single quotes to all strings in my ARFF file even if
they don't have spaces?

Thanks in advance



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Single quotes in ARFF file

Peter Reutemann
> I'm building a classifier model in Weka using Multilayer Perceptron.
> My original data is in CSV format and I use Weka to convert my files from
> CSV to ARFF.
> However, I've noticed that Weka adds single quotes around strings that contain
> spaces or other special characters, for instance:
>
> CSV file:
>   Brown Rice
> ARFF file:
>   'Brown Rice'
>
> I've read in the Weka documentation that this is how it handles spaces or
> special characters.
> I'm not sure whether the classifier gets confused by this, i.e. whether the
> model would treat 'Rice' as different from Rice when handling new data.
>
> Does anybody know if the single quotes affect the performance or results of
> the classifier?
> Is it possible to add single quotes to all strings in my ARFF file even if
> they don't have spaces?

Nominal values or strings only need to be quoted (single or double) if
they contain a blank (or special character):
https://waikato.github.io/weka-wiki/formats_and_processing/arff_stable/

This quoting is necessary for parsing the ARFF file. But these (outer)
quotes won't confuse your classifier as they get discarded.
For example:
  'Rice'
will become
  Rice

However
  '\"Rice\"'
will become
  "Rice"
as only the outer quotes get discarded (and the inner ones have their
backslash escapes removed).
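
To make the two examples above concrete, here is a minimal sketch of that
unquoting step in plain Java. It is not Weka's actual code: the class name
UnquoteSketch and its unquote method are made up for illustration, and it
assumes that a backslash simply protects the character that follows it
(sequences such as \n are not expanded to control characters here).

```java
final class UnquoteSketch {

    /**
     * Strips one pair of matching outer quotes (single or double) and
     * drops the backslash in front of any escaped character inside.
     * Strings without matching outer quotes are returned unchanged.
     */
    static String unquote(String s) {
        if (s.length() < 2) {
            return s;
        }
        char first = s.charAt(0);
        char last = s.charAt(s.length() - 1);
        boolean quoted = (first == '\'' || first == '"') && first == last;
        if (!quoted) {
            return s;
        }
        String inner = s.substring(1, s.length() - 1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < inner.length(); i++) {
            char c = inner.charAt(i);
            if (c == '\\' && i + 1 < inner.length()) {
                // Keep the escaped character, drop the backslash itself.
                out.append(inner.charAt(++i));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unquote("'Rice'"));         // Rice
        System.out.println(unquote("'\\\"Rice\\\"'")); // "Rice"
        System.out.println(unquote("Rice"));           // Rice
    }
}
```

With this, 'Rice' and Rice end up as the same value, which is why the
outer quotes make no difference to the classifier.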

Here is the method that does the quoting (appropriately named "quote"):
https://github.com/Waikato/weka-3.8/blob/master/weka/src/main/java/weka/core/Utils.java#L691
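
For readers who do not want to dig through the source, the idea behind that
method can be sketched roughly as follows. This is a simplified stand-in, not
a copy of Weka's implementation: the class name QuoteSketch, the method
quoteIfNeeded and the exact set of characters that trigger quoting are
assumptions made for illustration.

```java
final class QuoteSketch {

    /**
     * Returns the value unchanged if it is safe to write as-is, otherwise
     * wraps it in single quotes and backslash-escapes quotes and backslashes.
     * The trigger characters below are an assumed, simplified set.
     */
    static String quoteIfNeeded(String s) {
        boolean needsQuotes = s.isEmpty();
        StringBuilder escaped = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '\'' || c == '"' || c == '\\') {
                // Characters that must be escaped inside the quoted value.
                escaped.append('\\');
                needsQuotes = true;
            } else if (c == ' ' || c == ',' || c == '{' || c == '}' || c == '%') {
                // Characters that would otherwise break ARFF parsing.
                needsQuotes = true;
            }
            escaped.append(c);
        }
        return needsQuotes ? "'" + escaped + "'" : s;
    }

    public static void main(String[] args) {
        System.out.println(quoteIfNeeded("Brown Rice")); // 'Brown Rice'
        System.out.println(quoteIfNeeded("Rice"));       // Rice
        System.out.println(quoteIfNeeded("\"Rice\""));   // '\"Rice\"'
    }
}
```

So a value like Rice is written without quotes simply because it does not
need them, not because it is treated as a different value.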

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/