Class index is negative

Class index is negative

JP de Vooght
Hello,
I am still new to Weka and started going through the MOOC while
experimenting with the Kaggle Titanic data set.

After applying ZeroR, I am trying an RF classification, eliminating a few
variables from the training set.
Once trained, I choose "Supplied test set" from the test options, select
the test.csv file and indicate that it has no class. Under "More options",
I request PlainText output as well.
Then I right-click on my RF result and ask Weka to use the supplied test
set, but it returns right away complaining about a negative class
index... although the class is not part of the test.csv file.

I was trying to reproduce the steps outlined in
http://www.cc.uah.es/drg/courses/datamining/ClassifyingNewDataWeka.pdf

What have I missed? Should I create a dummy attribute? When I do, it's
treated as string and yields another kind of error...
TIA
JP

PS: I preprocess the Kaggle train.csv so that it has a suitable class
attribute, and I manually select the class in the GUI.
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Class index is negative

Peter Reutemann
The test set (or any dataset you want predictions for) has to have the
same data structure as the training set, i.e., it also needs a class
attribute. The class attribute can contain missing values or just dummy
categorical values; they get removed before making predictions anyway.
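
To make "same data structure" concrete, here is a minimal sketch that compares the attribute declarations of two ARFF headers. It is not Weka's own compatibility check, and the attribute names (Pclass, Sex, Age, Survived) are only illustrative assumptions about a reduced Titanic dataset. The parser is deliberately naive and does not handle quoted attribute names containing spaces.

```python
def parse_arff_attributes(header_text):
    """Return [(name, type_spec), ...] from the @attribute lines of an ARFF header."""
    attributes = []
    for line in header_text.splitlines():
        stripped = line.strip()
        if stripped.lower().startswith("@attribute"):
            # Split into the keyword, the attribute name, and the type spec.
            _, name, type_spec = stripped.split(None, 2)
            # Drop spaces so "{0, 1}" and "{0,1}" compare equal.
            attributes.append((name, type_spec.replace(" ", "")))
    return attributes


def same_structure(header_a, header_b):
    """True if both headers declare the same attributes in the same order."""
    return parse_arff_attributes(header_a) == parse_arff_attributes(header_b)


# Hypothetical headers, for illustration only.
TRAIN_HEADER = """\
@relation titanic-train
@attribute Pclass numeric
@attribute Sex {male,female}
@attribute Age numeric
@attribute Survived {0,1}
"""

TEST_HEADER_NO_CLASS = """\
@relation titanic-test
@attribute Pclass numeric
@attribute Sex {male,female}
@attribute Age numeric
"""

TEST_HEADER_WITH_CLASS = TEST_HEADER_NO_CLASS + "@attribute Survived {0,1}\n"

print(same_structure(TRAIN_HEADER, TEST_HEADER_NO_CLASS))    # False
print(same_structure(TRAIN_HEADER, TEST_HEADER_WITH_CLASS))  # True
```

The test header only matches once the class attribute is declared, even though its values are unknown.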

To avoid the problem of incompatible datasets, simply use ARFF files,
as they define the attribute types in the header. This is not possible
with CSV files, as their attribute types get determined on the fly
based on the values encountered.
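
The following toy example shows why on-the-fly typing can make two CSV files incompatible. The inference rule here is a simplified assumption, not Weka's actual CSV loading logic, and the Ticket and Fare values are made up for illustration.

```python
import csv
import io


def infer_type(values):
    """Toy rule: numeric if every non-empty value parses as a float, else nominal."""
    present = [v for v in values if v != ""]
    try:
        for v in present:
            float(v)
        return "numeric"
    except ValueError:
        return "nominal"


def infer_column_types(csv_text):
    """Infer a type per column from the values found in the CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {name: infer_type([row[name] for row in rows]) for name in rows[0]}


# Hypothetical data: the training file happens to contain a non-numeric ticket.
TRAIN_CSV = "Ticket,Fare\nA/5 21171,7.25\n113803,53.1\n"
TEST_CSV = "Ticket,Fare\n330911,7.83\n363272,7.0\n"

train_types = infer_column_types(TRAIN_CSV)
test_types = infer_column_types(TEST_CSV)

print(train_types)  # {'Ticket': 'nominal', 'Fare': 'numeric'}
print(test_types)   # {'Ticket': 'numeric', 'Fare': 'numeric'}
```

The same column ends up with two different types because the two files happen to contain different values. An explicit type declaration in an ARFF header avoids this.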

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Class index is negative

JP de Vooght
Roger that.

I saved the files as ARFF and added the class attribute to the test set,
with '?' as its value for every instance.

Worked as expected from there on.
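
For anyone who wants to script this step instead of editing the file by hand, here is a rough sketch of the conversion. The function name, the attribute list, and the sample rows are hypothetical. It assumes simple values with no commas, spaces, or quotes, which would need quoting in a real ARFF file.

```python
import csv
import io


def csv_to_unlabeled_arff(csv_text, relation, attribute_types, class_name, class_values):
    """Render CSV rows as ARFF text, appending a class attribute set to '?'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    lines = ["@relation " + relation, ""]
    for name, type_spec in attribute_types:
        lines.append("@attribute " + name + " " + type_spec)
    # Declare the class as a nominal attribute, matching the training set.
    lines.append("@attribute " + class_name + " {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for row in rows:
        # Empty CSV cells become '?', the ARFF marker for a missing value.
        values = [row[name] if row[name] != "" else "?" for name, _ in attribute_types]
        values.append("?")  # the class is unknown for every test instance
        lines.append(",".join(values))
    return "\n".join(lines) + "\n"


# Hypothetical test rows and attribute declarations, for illustration only.
TEST_CSV = "Pclass,Sex,Age\n3,male,34.5\n1,female,\n"
ATTRIBUTES = [("Pclass", "numeric"), ("Sex", "{male,female}"), ("Age", "numeric")]

arff_text = csv_to_unlabeled_arff(TEST_CSV, "titanic-test", ATTRIBUTES, "Survived", ["0", "1"])
print(arff_text)
```

The attribute declarations should be copied from the training ARFF header so that both files have the same structure.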

Thanks Peter!


On 02/08/2017 08:41 PM, Peter Reutemann wrote:

> The test set (or dataset to make predictions with) has to have the
> same data structure, i.e., also a class attribute. The class attribute
> can have missing values or just dummy categorical values - they get
> removed before making predictions anyway.
>
> To avoid the problem of incompatible datasets, simply use ARFF files,
> as they define the attribute type in the header. This is not possible
> with CSV files, as their attribute types get determined on the fly
> based on the values encountered.
>
> Cheers, Peter
