Select records from an ARFF file

Gaetano
Hello, I have a dataset with this structure:

@relation temp
@attribute idsito numeric
@attribute data DATE "yyyy-MM-dd"
@attribute x numeric
@attribute anno numeric

I would like to query the ARFF file based on the attribute "data" (e.g., select all tuples with data<'2005-01-01') and save the query result to another file. How can I do this?

Re: Select records from an ARFF file

Gaetano
Given an input date, I want to select the instances in the file that have that date; these instances will form the test set. The training set should consist of the instances from the 30 days before that date. How can I implement this?

Re: Select records from an ARFF file

Eibe Frank-2
Administrator
Example for filtering instances by date

Input data (stored in a file called test.arff):

@relation test

@attribute D date dd/MM/yyyy

@data
18/02/2014
19/02/2014
20/02/2014

Command-line WEKA command:

java -cp ~/weka-3-8-1/weka.jar weka.Run .SubsetByExpression -E 'a1<=java("weka.core.Utils","double dateToMillis(String,String)","19/02/2014","dd/MM/yyyy")' -V < test.arff

Output:

@relation 'test-weka.filters.unsupervised.instance.SubsetByExpression-Ea1<=java(\"weka.core.Utils\",\"double dateToMillis(String,String)\",\"19/02/2014\",\"dd/MM/yyyy\")'

@attribute D date dd/MM/yyyy

@data

18/02/2014
19/02/2014
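
To see why the first two rows are kept: WEKA stores date attribute values internally as milliseconds since the epoch, so the expression boils down to a numeric comparison. The following is a minimal sketch of that comparison using only the JDK, not WEKA itself. The names DateCutoffSketch, toMillis and keepUpTo are hypothetical, and the sketch assumes that dateToMillis parses the string with the given SimpleDateFormat pattern.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DateCutoffSketch {

    // Parse a date string with the given pattern and return milliseconds since the epoch.
    static double toMillis(String date, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject impossible dates such as 31/02/2014
        try {
            return format.parse(date).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Cannot parse date: " + date, e);
        }
    }

    // Keep every date whose millisecond value is less than or equal to the cutoff's value.
    static List<String> keepUpTo(List<String> dates, String cutoff, String pattern) {
        double limit = toMillis(cutoff, pattern);
        List<String> kept = new ArrayList<>();
        for (String date : dates) {
            if (toMillis(date, pattern) <= limit) {
                kept.add(date);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("18/02/2014", "19/02/2014", "20/02/2014");
        // Prints [18/02/2014, 19/02/2014]
        System.out.println(keepUpTo(rows, "19/02/2014", "dd/MM/yyyy"));
    }
}
```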

Cheers,
Eibe

> On 5/05/2017, at 7:32 AM, Gaetano <[hidden email]> wrote:
>
> Given an input date, I want to select the instances in the file that have
> that date; these instances will form the test set. The training set should
> consist of the instances from the 30 days before that date. How can I
> implement this?

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Select records from an ARFF file

Gaetano
Hi Eibe,
In which directory must I put test.arff?
Thanks

Re: Select records from an ARFF file

Eibe Frank-2
Administrator
That’s arbitrary; you just need to provide the correct path to the file. In my case, test.arff was in the current folder, so I did not need to provide a path.

Note that this command also works in the GUIs, e.g., the Preprocess panel of the Explorer. You just need to configure the parameters of SubsetByExpression appropriately.

Cheers,
Eibe

> On 8/05/2017, at 2:18 AM, Gaetano <[hidden email]> wrote:
>
> Hi Eibe,
> In which directory must I put test.arff?
> Thanks

Re: Select records from an ARFF file

Gaetano
OK, thanks.
How do I convert the same command to Java using the WEKA API?

I tried this:

public static void main(String[] args) throws Exception {
   DataSource source = new DataSource("C:/Users/user/Documenti/pv_italy(ordered+removeAtt1).arff");
   Instances dataset = source.getDataSet();
   String[] options = new String[2];
   options[0]="-E";
   options[1]="2";
   SubsetByExpression filter = new SubsetByExpression();
   String expression = "ATT2<=2013-05-05";
   filter.setOptions(options);
   filter.setInputFormat(dataset);
   filter.setExpression(expression);
   Instances newData = SubsetByExpression.useFilter(dataset, filter);
   System.out.println(newData);
}

But the resulting set of instances is empty.

Re: Select records from an ARFF file

Eibe Frank-2
Administrator
Try something like this:

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.SubsetByExpression;

public class Test {

    public static void main(String[] args) throws Exception {
        // Load the dataset from the ARFF file
        DataSource source = new DataSource("test.arff");
        Instances dataset = source.getDataSet();

        // Keep the instances whose first attribute is on or before 19/02/2014
        String[] options = new String[2];
        options[0] = "-E";
        options[1] = "ATT1<=java(\"weka.core.Utils\",\"double dateToMillis(String,String)\",\"19/02/2014\",\"dd/MM/yyyy\")";

        // Set the options before setInputFormat, then apply the filter
        SubsetByExpression filter = new SubsetByExpression();
        filter.setOptions(options);
        filter.setInputFormat(dataset);
        Instances newData = Filter.useFilter(dataset, filter);
        System.out.println(newData);
    }
}

The backslashes are used to escape the quotation marks.
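
When the cutoff date comes from a variable, the same option string can be assembled by a small helper. This is only a sketch: ExpressionBuilder and dateExpression are hypothetical names that are not part of WEKA, and only the text of the java(...) macro is taken from the example above.

```java
class ExpressionBuilder {

    // Assemble an expression of the form
    // ATT1<=java("weka.core.Utils","double dateToMillis(String,String)","19/02/2014","dd/MM/yyyy")
    static String dateExpression(String attribute, String operator, String date, String pattern) {
        return attribute + operator
            + "java(\"weka.core.Utils\",\"double dateToMillis(String,String)\",\""
            + date + "\",\"" + pattern + "\")";
    }

    public static void main(String[] args) {
        // The printed text contains plain quotation marks;
        // the backslashes exist only in the Java source.
        System.out.println(dateExpression("ATT1", "<=", "19/02/2014", "dd/MM/yyyy"));
    }
}
```

The returned string would then be used as options[1] in place of the hard-coded literal.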

Cheers,
Eibe

> On 9 May 2017, at 20:56, Gaetano <[hidden email]> wrote:
>
> OK, thanks.
> How do I convert the same command to Java using the WEKA API?

Re: Select records from an ARFF file

Gaetano
A last question.
Since I need to pass a Date variable as input to the function dateToMillis, how should I modify the following line?
options1[1]="ATT2<java(\"weka.core.Utils\",\"double dateToMillis(String,String)\",\"2013-05-05\",\"yyyy-MM-dd\")";

My code is:
public static void main(String[] args) throws Exception {
    String s;
    do {
        // obtain the date as a string in SHORT format
        System.out.println("Enter the date [yyyy-MM-dd]: ");
        Scanner in = new Scanner(System.in);
        s = in.nextLine();
        try {
            // convert the date string into an object of class Date
            SimpleDateFormat formato = new SimpleDateFormat("yyyy-MM-dd");
            Date d = formato.parse(s);
            //System.out.println("OUTPUT: " + formato.format(d));
            break; // exit the loop
        } catch (ParseException e) {
            System.out.println("Invalid date format.");
        }
    } while (true);
    DataSource source = new DataSource("C:/Users/user/Documenti/pv_italy(ordered+removeAtt1).arff");
    Instances dataset = source.getDataSet();
    String[] options = new String[2];
    options[0]="-E";
    options[1]="ATT2=java(\"weka.core.Utils\",\"double dateToMillis(Date,String)\",\"formato.format(d)\",\"yyyy-MM-dd\")";
    SubsetByExpression filter = new SubsetByExpression();
    filter.setOptions(options);
    filter.setInputFormat(dataset);
    Instances testSet = SubsetByExpression.useFilter(dataset, filter);
    System.out.println(testSet);
    ArffSaver saver = new ArffSaver();
    saver.setInstances(testSet);
    saver.setFile(new File("C:/Users/user/Documenti/testSet.arff"));
    saver.writeBatch();
}

I tried this, but it gives me the following error:
weka.core.expressionlanguage.common.JavaMacro$InvalidSignature: Invalid function signature in java macro (Expected type, got 'Date' instead)

Thanks, and sorry to bother you again.
Cheers

Re: Select records from an ARFF file

Eibe Frank-2
Administrator
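Maybe try something like this:

"ATT2=java(\"weka.core.Utils\",\"double dateToMillis(String,String)\",\"" + formato.format(d) + "\",\"yyyy-MM-dd\")";

The signature has to stay dateToMillis(String,String), since the formatted date is passed to the macro as a string. Also, formato and d need to be declared outside the try block so that they are in scope on this line.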

Cheers,
Eibe
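
Coming back to the earlier question about a test set for a given date and a training set covering the 30 days before it, one possible approach is to derive both expressions from the parsed date. This is a sketch under several assumptions: it uses java.time instead of SimpleDateFormat, it assumes the date attribute is ATT2 with format yyyy-MM-dd as in the file above, and it assumes SubsetByExpression accepts parenthesised comparisons joined with "and". WindowExpressions and its methods are hypothetical names, not part of WEKA.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

class WindowExpressions {

    // Text of the java(...) macro that converts the given date to milliseconds.
    static String millis(LocalDate date) {
        return "java(\"weka.core.Utils\",\"double dateToMillis(String,String)\",\""
            + date.format(DateTimeFormatter.ISO_LOCAL_DATE) + "\",\"yyyy-MM-dd\")";
    }

    // Test set: instances whose date equals the given date.
    static String testExpression(LocalDate date) {
        return "ATT2=" + millis(date);
    }

    // Training set: instances from the `days` days before the given date (date itself excluded).
    static String trainExpression(LocalDate date, int days) {
        return "(ATT2>=" + millis(date.minusDays(days)) + ") and (ATT2<" + millis(date) + ")";
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.parse("2013-05-05"); // throws DateTimeParseException on bad input
        System.out.println(testExpression(date));
        System.out.println(trainExpression(date, 30));
    }
}
```

Each string would be passed as the -E option to its own SubsetByExpression filter, and each result saved with a separate ArffSaver.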


> On 10/05/2017, at 4:10 AM, Gaetano <[hidden email]> wrote:
>
> A last question.