Use of Classification Algorithms, some doubts.

JC
Hi everyone, in my free time I'm trying to learn WEKA using Java. I have been
working on a project for a few months, but I have some doubts...

The *structure of my dataset* is the following:

@attribute 'ID Patient' numeric
@attribute 'Date' {'2019-04-11 11:09:23','2019-04-11 11:08:52','2019-04-02 11:17:15','2019-04-02 10:38:30'}
@attribute Temp numeric
@attribute 'SPO2 Min' numeric
@attribute 'SPO2 Max' numeric
@attribute 'BPM Min' numeric
@attribute 'BPM Max' numeric
@attribute 'BPM Avg' numeric
@attribute SYS numeric
@attribute DIA numeric
@attribute 'EDA Min' numeric
@attribute 'EDA Max' numeric
@attribute 'EDA Avg' numeric
@attribute 'Disease' {'IRA', 'ITU', 'PB'}

*ID Patient* is the identifier of the patient, the next ones are the
*attributes of the patient* and the last one, *Disease*, is the disease of
the patient.
I would like to enter a new patient and try to predict the disease
that this patient would develop.

So, *my class Index has to be ID Patient, right?*

And if I only want to predict the disease using a RandomTree, would I have
to preprocess the data using InfoGainAttributeEval or other attribute
evaluators?

And the last doubt: *when I use RandomTree I always get a subtree like
this, with only Date and Disease. What could be wrong?* :

|   |   |   Date = 26-Aug-2020 08:13:00 : IRA (0/0)
|   |   |   Date = 12-Aug-2020 07:50:00 : IRA (0/0)
|   |   |   Date = 22-Aug-2020 10:24:00 : IRA (0/0)
|   |   |   Date = 01-Sep-2019 13:45:00 : PB (1/0)
|   |   |   Date = 01-Sep-2019 13:48:00 : PB (1/0)
|   |   |   Date = 03-Sep-2019 12:38:00 : PB (1/0)

Thanks a lot in advance!



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Use of Classification Algorithms, some doubts.

Michael Hall


> On Nov 14, 2019, at 4:47 AM, JC <[hidden email]> wrote:
> So, *my class Index has to be ID Patient, right?*

The class index should point at the attribute you want to predict, which here is Disease, not ID Patient. I also don’t think you want to include the patient ID for training or prediction at all. If it is more or less a random unique number assigned to the patient, it has no predictive power, right? However, a classifier might decide that whenever it sees this ID, the patient always gets this disease. It will overfit; you might want to read up on overfitting. If the same ID is present in the test data, the model will base its classification on that ID, which you don’t want. If it isn’t, a model trained on the ID will simply perform poorly.
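
To make the overfitting argument concrete, here is a minimal sketch in plain Java (no Weka involved). `IdLookupClassifier` is a hypothetical toy that does nothing but memorise which disease each training ID had; the IDs, labels and fallback class are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy "classifier" that only memorises patient ID -> disease (illustration only). */
final class IdLookupClassifier {
    private final Map<Integer, String> byId = new HashMap<>();
    private final String fallback;

    /** ids[i] is the patient ID and labels[i] the disease of training row i. */
    IdLookupClassifier(int[] ids, String[] labels, String fallback) {
        for (int i = 0; i < ids.length; i++) {
            byId.put(ids[i], labels[i]);
        }
        this.fallback = fallback;
    }

    /** Returns the memorised disease, or the fallback for an ID never seen in training. */
    String predict(int id) {
        return byId.getOrDefault(id, fallback);
    }

    /** Fraction of rows whose prediction equals the given label. */
    double accuracy(int[] ids, String[] labels) {
        int correct = 0;
        for (int i = 0; i < ids.length; i++) {
            if (predict(ids[i]).equals(labels[i])) {
                correct++;
            }
        }
        return (double) correct / ids.length;
    }

    public static void main(String[] args) {
        int[] trainIds = {1, 2, 3, 4};
        String[] trainLabels = {"IRA", "ITU", "PB", "ITU"};
        int[] testIds = {5, 6, 7};               // patients never seen in training
        String[] testLabels = {"PB", "ITU", "IRA"};

        IdLookupClassifier c = new IdLookupClassifier(trainIds, trainLabels, "ITU");
        System.out.println("train accuracy = " + c.accuracy(trainIds, trainLabels));
        System.out.println("test accuracy  = " + c.accuracy(testIds, testLabels));
    }
}
```

The training accuracy is perfect because every ID is unique, while new patients can only get the fallback guess. A real tree learner is not this extreme, but a unique ID gives it the same shortcut.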
 

>
> And if I only want to predict the disease using a RandomTree, would I have
> to preprocess the data using InfoGainAttributeEval or other attribute
> evaluators?
>
> And the last doubt: *when I use RandomTree I always get a subtree like
> this, with only Date and Disease. What could be wrong?* :
>
> |   |   |   Date = 26-Aug-2020 08:13:00 : IRA (0/0)
> |   |   |   Date = 12-Aug-2020 07:50:00 : IRA (0/0)
> |   |   |   Date = 22-Aug-2020 10:24:00 : IRA (0/0)
> |   |   |   Date = 01-Sep-2019 13:45:00 : PB (1/0)
> |   |   |   Date = 01-Sep-2019 13:48:00 : PB (1/0)
> |   |   |   Date = 03-Sep-2019 12:38:00 : PB (1/0)

For the same reason I would suggest you omit the time from your date-time stamp; it will lead to overfitting. Your Date attribute is nominal with close to one value per instance, so a split on it isolates individual instances, which I think is what your subtree shows. Also, time stamps lead to time series, which I think is usually a different type of problem from plain classification. So I would definitely omit the times, and possibly omit the entire Date field unless you actually want to get into time series. I would do more reading there; the advanced Weka MOOCs could also be useful.

If I understood correctly that you are discretizing the disease, why?


Re: Use of Classification Algorithms, some doubts.

JC
Hi Michael, thank you for your response.

So the options are to delete Date from the attributes or to use a classification
algorithm for time series. Can you recommend a classification
algorithm for time series?

My data has samples from different patients, for example 8 samples of patient
1. I would like to evaluate his 8 samples against the trained model, and the date
would of course be important here.

Thanks again!!!



Re: Use of Classification Algorithms, some doubts.

Michael Hall


> On Nov 14, 2019, at 6:29 AM, JC <[hidden email]> wrote:
>
> Hi Michael, thank you for your response.
>
> So the options are to delete Date from the attributes or to use a classification
> algorithm for time series. Can you recommend a classification
> algorithm for time series?
>
> My data has samples from different patients, for example 8 samples of patient
> 1. I would like to evaluate his 8 samples against the trained model, and the date
> would of course be important here.
>
> Thanks again!
>

So the ID sort of provides a grouping for a set of tests. Then the question is whether the classifier learns the predicted outcome for this kind of grouping of tests or, still overfitting, learns the likely outcome for this particular ID with this set of tests. You might do some testing with and without the ID to see which way gives you training that best holds up against test. A serious drop in accuracy from training to test might still indicate that this feature is just causing overfitting.

I haven’t done much with time series myself. As I said, it was covered in the advanced Weka MOOC. A simple Google search for “weka time series” could give you better things to look at than I can.

On the time. You could maybe split that out to a separate attribute if you think it might have some bearing. Maybe something nominal like early/late morning/afternoon/evening.
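
A minimal sketch of that idea in plain Java, assuming the timestamps look like the ones in the ARFF header (`yyyy-MM-dd HH:mm:ss`). The class name `TimeOfDay` and the hour boundaries are assumptions chosen for illustration, not anything Weka prescribes.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Turns a full timestamp into a coarse nominal "time of day" value. */
final class TimeOfDay {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Maps e.g. "2019-04-11 11:09:23" to night, morning, afternoon or evening. */
    static String bucket(String timestamp) {
        int hour = LocalDateTime.parse(timestamp, FORMAT).getHour();
        if (hour < 6) {
            return "night";       // 00:00 to 05:59 (assumed boundary)
        }
        if (hour < 12) {
            return "morning";     // 06:00 to 11:59
        }
        if (hour < 18) {
            return "afternoon";   // 12:00 to 17:59
        }
        return "evening";         // 18:00 to 23:59
    }

    public static void main(String[] args) {
        System.out.println(bucket("2019-04-11 11:09:23"));
        System.out.println(bucket("2020-01-01 15:45:00"));
    }
}
```

The resulting column could be declared as a nominal attribute with four values, which gives a tree far fewer ways to isolate single instances than one value per timestamp.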

Re: Use of Classification Algorithms, some doubts.

JC
Thanks again Michael, I will leave the time series for later.

*Now I am going to focus on making a simple prediction from the data*. I am
splitting it into 90% training and 10% test, and I get these results:

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class
               ?        0,290    0,000      ?       ?          ?    ?         ?         IRA
               ?        0,710    0,000      ?       ?          ?    ?         ?         ITU
               0,000    ?        ?          0,000   ?          ?    ?         1,000     PB
Weighted Avg.  0,000    ?        ?          0,000   ?          ?    ?         1,000

=== Confusion matrix ===
   a   b   c   <-- classified as
   0   0   0 |   a = IRA
   0   0   0 |   b = ITU
 164 401   0 |   c = PB

** Random Tree Evaluation with Datasets **

Correctly Classified Instances           0                0      %
Incorrectly Classified Instances       565              100      %
Kappa statistic                          0    
Mean absolute error                      0.5678
Root mean squared error                  0.6405
Relative absolute error                105.2136 %
Root relative squared error            111.7653 %
Total Number of Instances              565    

*How is this possible?*



Re: Use of Classification Algorithms, some doubts.

Michael Hall


>
>
> === Confusion matrix ===
>   a   b   c   <-- classified as
>   0   0   0 |   a = IRA
>   0   0   0 |   b = ITU
> 164 401   0 |   c = PB
>

0% accuracy isn’t something I’ve run into a lot. With three classes, random guessing should get around 33%.
If I’m understanding the confusion matrix correctly, all 565 test instances are actually PB, and none of them is classified as PB: 164 are classified as IRA and 401 as ITU.
Off-hand I’m not sure how that would be possible. I think there are ways to look at the decision tree, as you showed earlier.
Are there any hints there?
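
For reference, a small plain-Java sketch of how the numbers in the evaluation output follow from the confusion matrix. It assumes the usual layout where rows are the actual classes and columns the predicted ones; `ConfusionMatrixStats` is a hypothetical helper, not a Weka class.

```java
/** Reads basic figures off a confusion matrix (rows = actual, columns = predicted). */
final class ConfusionMatrixStats {

    /** Overall accuracy: the diagonal (correct predictions) over all instances. */
    static double accuracy(int[][] m) {
        int correct = 0;
        int total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                total += m[i][j];
                if (i == j) {
                    correct += m[i][j];
                }
            }
        }
        return total == 0 ? Double.NaN : (double) correct / total;
    }

    /** Number of instances whose actual class is the given row. */
    static int actualCount(int[][] m, int row) {
        int sum = 0;
        for (int value : m[row]) {
            sum += value;
        }
        return sum;
    }

    /** Number of instances that were predicted as the given column. */
    static int predictedCount(int[][] m, int col) {
        int sum = 0;
        for (int[] row : m) {
            sum += row[col];
        }
        return sum;
    }

    public static void main(String[] args) {
        // The matrix from the post: a = IRA, b = ITU, c = PB.
        int[][] m = {{0, 0, 0}, {0, 0, 0}, {164, 401, 0}};
        System.out.println("accuracy       = " + accuracy(m));
        System.out.println("actual PB      = " + actualCount(m, 2));
        System.out.println("predicted as PB = " + predictedCount(m, 2));
    }
}
```

With this matrix every instance sits in the PB row and nothing sits in the PB column, so the model never outputs PB at all.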


Re: Use of Classification Algorithms, some doubts.

Eibe Frank-2
Administrator
It’s certainly possible. Most likely, the training data only contains IRA and ITU cases, and the test data only contains PB cases.

Cheers,
Eibe
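
One way to check this is to count the class labels in each slice before training. The sketch below is plain Java on a hypothetical toy list of labels that is sorted by class, as a file export might be; `SplitLabelCounts` is an illustrative name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Counts class labels in the head and tail slices of an ordered split. */
final class SplitLabelCounts {

    /** Counts how often each label occurs in labels[from, to). */
    static Map<String, Integer> count(List<String> labels, int from, int to) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels.subList(from, to)) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical toy data, ordered by class.
        List<String> labels = Arrays.asList(
                "IRA", "IRA", "IRA", "IRA", "ITU", "ITU", "ITU", "ITU", "ITU", "PB");
        int trainSize = (int) Math.round(labels.size() * 0.9);
        System.out.println("train: " + count(labels, 0, trainSize));
        System.out.println("test:  " + count(labels, trainSize, labels.size()));
    }
}
```

If the training counts contain no PB and the test counts contain only PB, a 0% result is exactly what one would expect, since the model cannot predict a class it has never seen.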

Re: Use of Classification Algorithms, some doubts.

JC
In reply to this post by Michael Hall
Thank you Michael and Eibe for your help!

That must be it; maybe the algorithm did not have enough instances labelled
as PB.

*Can I split the instances into 75% train and 25% test, picking the instances
randomly?* I am doing it this way:

                int trainSize = (int) Math.round(trainingSet.numInstances() * 0.75);
                int testSize = trainingSet.numInstances() - trainSize;
                Instances train = new Instances(trainingSet, 0, trainSize);
                Instances test = new Instances(trainingSet, trainSize, testSize);

But this way I am picking the first 75% of the instances for training and the last
25% for testing.
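
The usual remedy is to shuffle with a fixed seed and only then cut. The following is a plain-Java sketch of that idea on an ordinary list; `ShuffledSplit` is a hypothetical helper. With Weka itself, `Instances` has a `randomize(java.util.Random)` method that can be called before the two copy constructors above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Seeded shuffle followed by a head/tail cut. */
final class ShuffledSplit {

    /** Returns a two-element list: the training part first, then the test part. */
    static <T> List<List<T>> split(List<T> rows, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(rows);          // leave the caller's list untouched
        Collections.shuffle(shuffled, new Random(seed));   // same seed, same order
        int trainSize = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        List<List<Integer>> parts = split(rows, 0.75, 1L);
        System.out.println("train: " + parts.get(0));
        System.out.println("test:  " + parts.get(1));
    }
}
```

A plain shuffle does not guarantee that every class appears in both parts. For small or skewed datasets a stratified split is the safer choice.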

And the last question: *is there any way to ignore the Date attribute
without deleting it?* That is, to make the algorithm not evaluate it while it remains in
the dataset.

Thank you in advance!



Re: Use of Classification Algorithms, some doubts.

Michael Hall


> On Nov 15, 2019, at 4:20 AM, JC <[hidden email]> wrote:
>
> Thank you Michael and Eibe for your help!
>
> That must be it; maybe the algorithm did not have enough instances labelled
> as PB.
>
> *Can I split the instances into 75% train and 25% test, picking the instances
> randomly?* I am doing it this way:
>
> int trainSize = (int) Math.round(trainingSet.numInstances() * 0.75);
> int testSize = trainingSet.numInstances() - trainSize;
> Instances train = new Instances(trainingSet, 0, trainSize);
> Instances test = new Instances(trainingSet, trainSize, testSize);
>
> But this way I am picking the first 75% of the instances for training and the last
> 25% for testing.

I believe random is pretty much how Weka Explorer does it as indicated here…

This appears to be an example of how you could do it from Java…

The method used appears to still be present…

showc weka.core.Utils | grep splitOptions
System.in:49:   public static java.lang.String[]   splitOptions(java.lang.String) throws java.lang.Exception;
System.in:50:   public static java.lang.String[]   splitOptions(java.lang.String, java.lang.String[], char[]) throws java.lang.Exception;


This shows how you could do it ahead of time with Explorer and save the datasets…


> And the last question: *is there any way to ignore the Date attribute
> without deleting it?* That is, to make the algorithm not evaluate it while it remains in
> the dataset.

I think the Remove filter is what you would want. Some examples of using that…



Re: Use of Classification Algorithms, some doubts.

JC
Thank you Michael again, you are helping me a lot!

I am using this method to randomize the instances in the dataset, and it works
fine:

        trainingSet.randomize(new Random(1));

And *trying to remove the attribute Date* I'm doing this:

                Remove rm = new Remove();
                rm.setAttributeIndices("Date");
                rm.setInputFormat(trainingSet);

But in the last line I get the following error:

        java.lang.IllegalArgumentException: Invalid range list at Date
        at weka.core.Range.setFlags(Range.java:319)
        at weka.core.Range.setUpper(Range.java:91)
        at weka.filters.unsupervised.attribute.Remove.setInputFormat(Remove.java:203)

Why could this be happening?


And I think this is my last question, sorry, I know that I am asking
many questions... *If I use a FilteredClassifier with RandomTree as the
classifier, can I modify the maximum depth?*
With a plain RandomTree, without FilteredClassifier, I can set it, but
with FilteredClassifier I can't see the method setMaxDepth.

Thank you in advance!



Re: Use of Classification Algorithms, some doubts.

Michael Hall


> On Nov 15, 2019, at 9:16 AM, JC <[hidden email]> wrote:
>
> Thank you Michael again, you are helping me a lot!
>
> I am using this method to randomize the instances in the dataset, and it works
> fine:
>
> trainingSet.randomize(new Random(1));
>
> And *trying to remove the attribute Date* I'm doing this:
>
> Remove rm = new Remove();
> rm.setAttributeIndices("Date");
> rm.setInputFormat(trainingSet);
>
> But in the last line I get the following error:
>
> java.lang.IllegalArgumentException: Invalid range list at Date
> at weka.core.Range.setFlags(Range.java:319)
> at weka.core.Range.setUpper(Range.java:91)
> at weka.filters.unsupervised.attribute.Remove.setInputFormat(Remove.java:203)
>
> Why could this be happening?

I don’t believe the attribute name is a valid index.
You might want to refer to the Javadoc, which describes the argument as:

rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1. 
eg: first-3,5,6-last

“first” and “last” might, I think, be the only valid non-numeric range strings, with “-” to indicate ranges.
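
To illustrate how such a range list can be read, here is a rough plain-Java imitation. It is not Weka's own parser, only a sketch of the 1-based convention described in the Javadoc; `RangeListSketch` and its error message are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Rough imitation of a 1-based attribute range list such as "first-3,5,6-last". */
final class RangeListSketch {

    /** Expands the range list into 1-based attribute indices. */
    static List<Integer> expand(String rangeList, int numAttributes) {
        List<Integer> result = new ArrayList<>();
        for (String part : rangeList.split(",")) {
            String[] ends = part.trim().split("-");
            if (ends.length < 1 || ends.length > 2) {
                throw new IllegalArgumentException("Invalid range list at " + part);
            }
            int lo = parse(ends[0], numAttributes);
            int hi = parse(ends[ends.length - 1], numAttributes);
            for (int i = lo; i <= hi; i++) {
                result.add(i);
            }
        }
        return result;
    }

    /** Accepts "first", "last" or a number between 1 and numAttributes. */
    private static int parse(String token, int numAttributes) {
        String t = token.trim();
        if (t.equals("first")) {
            return 1;
        }
        if (t.equals("last")) {
            return numAttributes;
        }
        int index;
        try {
            index = Integer.parseInt(t);
        } catch (NumberFormatException e) {
            // An attribute name such as "Date" ends up here.
            throw new IllegalArgumentException("Invalid range list at " + t);
        }
        if (index < 1 || index > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + index);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(expand("first-3,5,6-last", 8));
    }
}
```

Under these rules a name like "Date" is neither a keyword nor a number, which matches the kind of error reported above.
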
An example where I used it…

  Remove rm = new Remove();
  rm.setAttributeIndices("last");
  rm.setInputFormat(instances);
  instances = Filter.useFilter(instances, rm);
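
If you would rather start from the attribute name, the name has to be translated into a 1-based position first. The sketch below does this in plain Java on a list of names copied from the ARFF header; `AttributeIndexByName` is a hypothetical helper. In Weka, I believe the equivalent lookup is `trainingSet.attribute("Date").index()`, which is 0-based, so 1 would have to be added before passing it as a string.

```java
import java.util.Arrays;
import java.util.List;

/** Translates an attribute name into the 1-based index string a range list expects. */
final class AttributeIndexByName {

    /** Returns the 1-based position of the named attribute, e.g. "2". */
    static String oneBasedIndex(List<String> attributeNames, String name) {
        int zeroBased = attributeNames.indexOf(name);
        if (zeroBased < 0) {
            throw new IllegalArgumentException("No attribute named " + name);
        }
        return Integer.toString(zeroBased + 1);
    }

    public static void main(String[] args) {
        // Attribute names in the order of the ARFF header from the first post.
        List<String> names = Arrays.asList(
                "ID Patient", "Date", "Temp", "SPO2 Min", "SPO2 Max",
                "BPM Min", "BPM Max", "BPM Avg", "SYS", "DIA",
                "EDA Min", "EDA Max", "EDA Avg", "Disease");
        System.out.println("Date is attribute " + oneBasedIndex(names, "Date"));
    }
}
```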




> And I think this is my last question, sorry, I know that I am asking
> many questions... *If I use a FilteredClassifier with RandomTree as the
> classifier, can I modify the maximum depth?*
> With a plain RandomTree, without FilteredClassifier, I can set it, but
> with FilteredClassifier I can't see the method setMaxDepth.

I think you are looking at one of the examples.
As shown above, I think you can use a filter without getting into FilteredClassifier.
However, you can set the parameters on the RandomTree classifier first and then add that classifier and the filter to the FilteredClassifier. You would need to look closer at the example; I think it shows that.


> Thank you in advance!


Sure.


Re: Use of Classification Algorithms, some doubts.

JC
Thanks a lot again Michael :D

Nice, that's right: first I modify the RandomTree and then I use it with
FilteredClassifier. I am using FilteredClassifier because I am
modifying the weights of the attributes to see which attributes have more
impact on the prediction.

On the other hand, I have to investigate a little more how to remove the
Date attribute.

Last, last, last doubt: right now I am splitting the data into 90% for training
and 10% for test. Then I would like to test other, new data. My list of
attributes is the following:

@attribute 'ID Patient' numeric
@attribute 'Date' {'2019-04-11 11:09:23','2019-04-11 11:08:52','2019-04-02 11:17:15','2019-04-02 10:38:30'}
@attribute Temp numeric
@attribute 'SPO2 Min' numeric
@attribute 'SPO2 Max' numeric
@attribute 'BPM Min' numeric
@attribute 'BPM Max' numeric
@attribute 'BPM Avg' numeric
@attribute SYS numeric
@attribute DIA numeric
@attribute 'EDA Min' numeric
@attribute 'EDA Max' numeric
@attribute 'EDA Avg' numeric
@attribute 'Disease' {'IRA', 'ITU', 'PB'}

*And I want to predict the attribute Disease for the new instances. Should I
put them in the ARFF file like this?*

19,'2020-01-01 15:45:00',34.31,93.59,98.04,62,65,64.47,78,61,0.91,0.99,0.96,?
37,'2018-05-03 09:51:00',35.57,92.51,97.54,71,74,73.4,130,92,2.19,2.39,2.27,?

The *interrogation* is there because I don't know what disease they could
develop... But this way I get the following problem:

Total Number of Instances                0
Ignored Class Unknown Instances                 10

Because I'm using ? instead of IRA, ITU or PB.

*How can I do this?*

And again, thanks a lot Michael, I hope to have your knowledge of WEKA
some day!!



Re: Use of Classification Algorithms, some doubts.

Michael Hall


> On Nov 15, 2019, at 8:05 PM, JC <[hidden email]> wrote:
>
> The *interrogation* is there because I don't know what disease they could
> develop... But this way I get the following problem:
>
> Total Number of Instances                0
> Ignored Class Unknown Instances                 10
>
> Because I'm using ? instead of IRA, ITU or PB.

“Interrogation”?

If you are doing supervised learning, your training data needs to have the disease/class field known and not ?.
It needs this to train, right? That’s the supervision.

For the test data, ? would be fine, as it would be for any missing value in an attribute other than disease/class.

Re: Use of Classification Algorithms, some doubts.

JC
Sorry, my Spanglish... the symbol ? is what I meant.

For example, if I want to predict the disease class for the next two
instances:

19,'2020-01-01 15:45:00',34.31,93.59,98.04,62,65,64.47,78,61,0.91,0.99,0.96,?
37,'2018-05-03 09:51:00',35.57,92.51,97.54,71,74,73.4,130,92,2.19,2.39,2.27,?

is it fine this way? I provide these instances as test instances,
but the model tells me:
 Ignored Class Unknown Instances                 10

*So I don't know whether I have to do it this way or some other way.*

Thank you again Michael!



Re: Use of Classification Algorithms, some doubts.

Michael Hall


> On Nov 16, 2019, at 4:58 AM, JC <[hidden email]> wrote:
>
> Ignored Class Unknown Instances                 10
>
> *So I don't know whether I have to do it this way or some other way.*

Possibly I wasn’t clear before. The character ? has a special meaning in ARFF files: it marks a missing value.
That is usually fine, but not always. In training, an instance whose class value (the disease in your case) is missing is really not usable at all. How can you learn the disease when none is provided?
If I’m understanding correctly, you are trying to train with 10 instances having the class/disease value set to ?, again meaning missing. Weka is telling you it can’t train with those 10 instances.
If I’m not understanding correctly, someone please correct me.
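
As an illustration of what the ? marker means for a data row, here is a small plain-Java sketch that splits one row and checks whether its class is missing. It is not an ARFF parser: it ignores escapes and other details of the format and assumes the class is the last field. `ArffRowSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a single ARFF data row and checks for a missing class value. */
final class ArffRowSketch {

    /** Splits on commas, treating single-quoted text as one value. */
    static List<String> fields(String row) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean quoted = false;
        for (char ch : row.toCharArray()) {
            if (ch == '\'') {
                quoted = !quoted;                 // quotes delimit a value, they are not kept
            } else if (ch == ',' && !quoted) {
                out.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        out.add(current.toString().trim());
        return out;
    }

    /** True when the last field (assumed to be the class) is the missing-value marker. */
    static boolean classIsMissing(String row) {
        List<String> f = fields(row);
        return f.get(f.size() - 1).equals("?");
    }

    public static void main(String[] args) {
        String row = "19,'2020-01-01 15:45:00',34.31,93.59,98.04,62,65,64.47,78,61,0.91,0.99,0.96,?";
        System.out.println(fields(row).size() + " fields, class missing: " + classIsMissing(row));
    }
}
```

Rows like this cannot contribute to training or to accuracy figures, since there is no known disease to learn from or to compare a prediction with.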




Re: Use of Classification Algorithms, some doubts.

JC
That's right, but I want to use these 10 instances only for testing,
not for training. I train the model with earlier data, and then I try to
test the model with these 10 instances, and that is when I get the message
"/Ignored Class Unknown Instances/".

To be more concrete, I have a FilteredClassifier with a
RandomTree, trained on 90% of the instances of a dataset and tested on the
other 10%, chosen randomly.
Then I try to test the model again using these 10 instances, and that is when I
get the message.

If you need to know anything more, just tell me.

Thank you a lot Michael.



Re: Use of Classification Algorithms, some doubts.

Michael Hall


> On Nov 16, 2019, at 6:27 AM, JC <[hidden email]> wrote:
>
> That's right, but I want to use these 10 instances only for testing,
> not for training. I train the model with earlier data, and then I try to
> test the model with these 10 instances, and that is when I get the message
> "/Ignored Class Unknown Instances/".
>
> To be more concrete, I have a FilteredClassifier with a
> RandomTree, trained on 90% of the instances of a dataset and tested on the
> other 10%, chosen randomly.
> Then I try to test the model again using these 10 instances, and that is when I
> get the message.
>
> If you need to know anything more, just tell me.


OK, I have not tried everything, and it can be difficult to know exactly what you are doing.
Prediction should be possible with missing class values. Evaluating the accuracy of the predictions is of course not possible.
I copied the iris dataset provided with Weka, pulled out 5 records of each type and saved them in a separate test dataset with ? as the class.
I might have messed that up a bit somehow, since the remaining iris data then showed only 130 instances instead of the expected 135.
But it should still work.
I switched to J48. I think RandomTree is generally meant to be used with RandomForest?
I then get, as you indicated…

Ignored Class Unknown Instances                 15

for the evaluation, because it can’t evaluate them.

I changed it to output predictions and got…
=== Predictions on test set ===

    inst#     actual  predicted          error  prediction
        1        1:?  3:Iris-virginica          0.972
        2        1:?  3:Iris-virginica          0.972
        3        1:?  3:Iris-virginica          0.972
        4        1:?  3:Iris-virginica          0.972
        5        1:?  3:Iris-virginica          0.972
        6        1:?  2:Iris-versicolor         0.977
        7        1:?  2:Iris-versicolor         0.977
        8        1:?  2:Iris-versicolor         0.977
        9        1:?  2:Iris-versicolor         0.977
       10        1:?  2:Iris-versicolor         0.977
       11        1:?  1:Iris-setosa             1
       12        1:?  1:Iris-setosa             1
       13        1:?  1:Iris-setosa             1
       14        1:?  1:Iris-setosa             1
       15        1:?  1:Iris-setosa             1

So it does predict. It just can’t evaluate prediction accuracy.
I believe the Weka MOOCs are always available online. Working through some of them might be helpful to you.

Re: Use of Classification Algorithms, some doubts.

JC
Hi Michael!

I have tried it in different ways, but the result is still the same... Now I have added the
data without a class (with ?) to the dataset, and I split the dataset into
99% for training and 1% for test (this 1% contains only instances without a class,
whose class has to be predicted because it is ?), but it is still the same... This
is the result:

** Random Tree Evaluation with Datasets **

Total Number of Instances                0    
Ignored Class Unknown Instances                 63    


=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC  ROC Area  PRC Area  Class
               ?        ?        ?          ?       ?          ?    ?         ?         IRA
               ?        ?        ?          ?       ?          ?    ?         ?         ITU
               ?        ?        ?          ?       ?          ?    ?         ?         PB
Weighted Avg.  ?        ?        ?          ?       ?          ?    ?         ?

=== Confusion matrix ===
 a b c   <-- classified as
 0 0 0 | a = IRA
 0 0 0 | b = ITU
 0 0 0 | c = PB

So I don't know how to do it...

Thank you in advance again :) !



Re: Use of Classification Algorithms, some doubts.

Eibe Frank-2
Administrator
As Michael indicated, you need to output the predictions. Evaluation measures such as ROC will not be available in this case because there is no “ground truth” to compare to: the class values are missing. That is also why the confusion matrix is empty.

Cheers,
Eibe
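
The difference between predicting and evaluating can be shown with a short plain-Java sketch. It assumes `null` stands in for a ? class value and that predictions come from some already trained model; `EvalWithMissingLabels` and `Summary` are hypothetical names, not Weka classes.

```java
import java.util.Arrays;

/** Scores predictions while skipping instances whose actual class is unknown. */
final class EvalWithMissingLabels {

    /** Counts produced by score(). */
    static final class Summary {
        final int evaluated;   // instances with a known actual class
        final int ignored;     // instances whose actual class is missing
        final int correct;     // evaluated instances predicted correctly

        Summary(int evaluated, int ignored, int correct) {
            this.evaluated = evaluated;
            this.ignored = ignored;
            this.correct = correct;
        }
    }

    /** actual[i] == null stands for "?"; predicted[i] is always present. */
    static Summary score(String[] actual, String[] predicted) {
        int evaluated = 0;
        int ignored = 0;
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == null) {
                ignored++;                 // a prediction exists, but nothing to compare it with
                continue;
            }
            evaluated++;
            if (actual[i].equals(predicted[i])) {
                correct++;
            }
        }
        return new Summary(evaluated, ignored, correct);
    }

    public static void main(String[] args) {
        String[] actual = new String[10];              // ten instances, all with class "?"
        String[] predicted = new String[10];
        Arrays.fill(predicted, "ITU");                 // stand-in predictions
        Summary s = score(actual, predicted);
        System.out.println("evaluated = " + s.evaluated + ", ignored = " + s.ignored);
    }
}
```

Every instance still has a prediction in `predicted`; the summary is empty only because there is no known class to score against, which mirrors the "Total Number of Instances 0" line in the output.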
