Document classification


Document classification

andria lan
I need to perform document classification. For this purpose, I have two files (a training set and an external test set). I applied StringToWordVector through FilteredClassifier to classify the test set data, which I loaded with the "Supplied test set" option, and I got results.

Now I need to know how the classification process works with StringToWordVector. Specifically, StringToWordVector converts the text in the training set into numeric data, and the learning algorithm then learns from this data. However, the test data are of string type. That means the training and test sets have different types, so how is classification accomplished despite this difference (aren't the training and test sets incompatible)?

Any help would be highly appreciated.

Andria

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Document classification

haytham.salhi
Hi Andria, 

Not sure I understand you exactly. Anyhow, StringToWordVector is needed if you want to convert your text data into a vector space representation. If you apply this filter on the training data and then you train your classifier based on this filtered data, then you have to apply the same filter on the test data to get reasonable results. In other words, both training and test data should be represented in the same way.
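The point that both sets must be represented the same way can be illustrated with a small plain-Java sketch. It is a hypothetical, heavily simplified stand-in for StringToWordVector, not Weka code: the class name, the lower-case/non-letter tokenizer and the presence-only (0/1) features are all assumptions. The dictionary is built from the training texts alone, and every later document is mapped onto that same fixed set of attributes, so words never seen in training are simply ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for StringToWordVector (not Weka code):
// the dictionary is built from the training texts only and then reused, unchanged,
// for every later dataset such as the test set.
final class SameFilterSketch {
    // Word -> attribute index, in order of first appearance in the training texts.
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    // "Initialize" the filter: build the dictionary from the training documents.
    void fit(List<String> trainingTexts) {
        dictionary.clear();
        for (String text : trainingTexts) {
            for (String token : tokenize(text)) {
                dictionary.putIfAbsent(token, dictionary.size());
            }
        }
    }

    // Map any document onto the fixed dictionary; words unseen during fit() are ignored.
    double[] transform(String text) {
        double[] vector = new double[dictionary.size()];
        for (String token : tokenize(text)) {
            Integer index = dictionary.get(token);
            if (index != null) {
                vector[index] = 1.0; // presence (1) / absence (0) of the word
            }
        }
        return vector;
    }

    int numFeatures() {
        return dictionary.size();
    }

    // Assumed tokenizer: lower-case, then split on anything that is not a letter a-z.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>(Arrays.asList(text.toLowerCase().split("[^a-z]+")));
        tokens.removeIf(String::isEmpty); // split() can yield a leading empty token
        return tokens;
    }

    public static void main(String[] args) {
        SameFilterSketch filter = new SameFilterSketch();
        filter.fit(List.of("cheap viagra now", "meeting with your boss"));
        // The test document is still a plain string; it is converted with the SAME dictionary.
        double[] test = filter.transform("Your boss sells cheap watches");
        System.out.println(filter.numFeatures() + " features: " + Arrays.toString(test));
    }
}
```

With these two training messages the dictionary has seven words, so the test string becomes a 7-value vector even though two of its words ("sells", "watches") never occurred in training.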

Best,
Haytham  

In reply to Andria Lan's message of Wed, Dec 28, 2016 at 8:29 PM

Re: Document classification

Baskar Jayaraman
In reply to this post by andria lan
There are a couple of ways to handle this scenario, and which one fits depends on the options you use for the vector space representation of your corpus.

1. Merge the train and test data into one dataset and apply StringToWordVector to all of it. Then split the data back into train and test sets (using a unique row ID, for example), build the model on the train data and test it on the test data. I am not in favor of this approach because, in my opinion, you are "contaminating" the training data with test data, so the model (and its evaluation) will not be unbiased.

2. I think a better approach is to apply StringToWordVector to the training data and build a lookup table of the metrics you need to repeat the same calculations on the test data (for example, if you are using TF-IDF, you need the IDF of each word as computed by StringToWordVector on the training data). Then convert the test data into the same format using the lookup table, and test the model on the converted test data.
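Option 2 can be sketched in plain Java. This is a hypothetical illustration, not Weka's implementation: it assumes raw term counts for TF and idf(w) = ln(N / df(w)) over the N training documents, which may differ from the exact weighting StringToWordVector applies. The essential part is that the IDF lookup table is computed from the training set only and then reused, unchanged, when test documents are converted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: IDF values are computed on the training set only, kept in a
// lookup table, and reused unchanged to convert the test set.
// Assumed weighting: tf = raw count, idf(w) = ln(N / df(w)) over N training documents.
final class TrainIdfLookup {
    private final Map<String, Integer> index = new LinkedHashMap<>(); // word -> attribute index
    private final Map<String, Double> idf = new HashMap<>();          // the lookup table

    void fit(List<String> trainingTexts) {
        index.clear();
        idf.clear();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String text : trainingTexts) {
            Set<String> seenInDoc = new HashSet<>();
            for (String w : tokenize(text)) {
                index.putIfAbsent(w, index.size());
                if (seenInDoc.add(w)) { // count each word at most once per document
                    docFreq.merge(w, 1, Integer::sum);
                }
            }
        }
        double n = trainingTexts.size();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            idf.put(e.getKey(), Math.log(n / e.getValue()));
        }
    }

    // TF-IDF vector for any document, using only the training-set lookup table.
    double[] transform(String text) {
        double[] vector = new double[index.size()];
        for (String w : tokenize(text)) {
            Integer i = index.get(w);
            if (i != null) {
                vector[i] += idf.get(w); // one idf per occurrence, i.e. tf * idf
            }
        }
        return vector;
    }

    double idfOf(String word) {
        return idf.getOrDefault(word, 0.0); // 0.0 for words never seen in training
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>(Arrays.asList(text.toLowerCase().split("[^a-z]+")));
        tokens.removeIf(String::isEmpty);
        return tokens;
    }

    public static void main(String[] args) {
        TrainIdfLookup tfidf = new TrainIdfLookup();
        tfidf.fit(List.of("spam offer", "boss offer", "boss meeting"));
        // "unknown" is not in the training dictionary, so it is simply dropped.
        System.out.println(Arrays.toString(tfidf.transform("offer offer unknown")));
    }
}
```

Here "offer" occurs in two of the three training documents, so each occurrence in a test document contributes ln(3/2), however often the word appears in the test set itself.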

Perhaps there are some Weka filters that can make this easier.

HTH,
Baskar

Re: Document classification

andria lan

Thanks guys, but these are not the answers I'm looking for. Perhaps someone from the WEKA team can help me understand the issue.

Andria

In reply to Baskar Jayaraman's message of 29 Dec 2016, 2:50 am

Re: Document classification

Peter Reutemann
In reply to this post by andria lan
I'm not quite sure I'm following what your data looks like. If you use
the FilteredClassifier in conjunction with your choice of base
classifier and StringToWordVector, then your training and test data
still have to have the same structure: at least a STRING attribute
containing the textual data and a class attribute. The filter gets
initialized using the training data, and the test data gets processed
accordingly. Depending on the setup, StringToWordVector builds a
dictionary of the words occurring in the data (optionally keeping only a
specified maximum number of words, removing stopwords and applying word
stemming), which will be used as features, e.g., absence/presence of a
word, word count, TF/IDF, etc.
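The dictionary-building options listed above can be approximated in a hypothetical sketch. The names and the selection rule are assumptions (StringToWordVector has its own options for this, such as a words-to-keep limit and an output-word-counts switch, and its selection rule may differ); stemming is omitted. Stopwords are dropped during tokenization, only the maxWords most frequent training words are kept, and each attribute holds either a word count or presence/absence. The dictionary is fixed after fit(), so transformed training and test data share the same attributes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical dictionary builder loosely mirroring the options listed above (no stemming).
// The selection rule (global frequency, ties broken alphabetically) is an assumption.
final class DictionaryBuilder {
    private final Set<String> stopwords;
    private final int maxWords;         // keep at most this many words
    private final boolean outputCounts; // word counts instead of presence/absence
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    DictionaryBuilder(Set<String> stopwords, int maxWords, boolean outputCounts) {
        this.stopwords = stopwords;
        this.maxWords = maxWords;
        this.outputCounts = outputCounts;
    }

    // Built from the training texts only; fixed afterwards.
    void fit(List<String> trainingTexts) {
        Map<String, Integer> freq = new HashMap<>();
        for (String text : trainingTexts) {
            for (String w : tokenize(text)) {
                freq.merge(w, 1, Integer::sum);
            }
        }
        List<String> words = new ArrayList<>(freq.keySet());
        words.sort((a, b) -> {
            int byFreq = Integer.compare(freq.get(b), freq.get(a)); // most frequent first
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });
        dictionary.clear();
        for (String w : words.subList(0, Math.min(maxWords, words.size()))) {
            dictionary.put(w, dictionary.size());
        }
    }

    double[] transform(String text) {
        double[] x = new double[dictionary.size()];
        for (String w : tokenize(text)) {
            Integer i = dictionary.get(w);
            if (i != null) {
                x[i] = outputCounts ? x[i] + 1 : 1;
            }
        }
        return x;
    }

    List<String> attributes() {
        return new ArrayList<>(dictionary.keySet());
    }

    private List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>(Arrays.asList(text.toLowerCase().split("[^a-z]+")));
        tokens.removeIf(t -> t.isEmpty() || stopwords.contains(t)); // drop empties and stopwords
        return tokens;
    }

    public static void main(String[] args) {
        DictionaryBuilder builder = new DictionaryBuilder(Set.of("the", "a", "is"), 2, true);
        builder.fit(List.of("the offer is a great offer", "offer ends the day", "great day"));
        System.out.println(builder.attributes()); // [offer, day]
        System.out.println(Arrays.toString(builder.transform("Offer offer on the great day"))); // [2.0, 1.0]
    }
}
```

With maxWords = 2 only "offer" and "day" survive, so every transformed document, training or test, has exactly those two attributes.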

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Document classification

andria lan

Thanks Peter. But it is still not clear to me how the data keeps the same structure after applying StringToWordVector. Again, here is my scenario:

I have two datasets (training and test), both of STRING type. I loaded the training set in the Preprocess panel and the test set via the "Supplied test set" option.

Now, after applying StringToWordVector (through FilteredClassifier), the training set is converted into NUMERIC attributes, and the selected classifier builds the classification model from the numeric training set, while the test set remains of STRING type. In the end, we are left with numeric attributes for the training set but a string attribute for the test set. How do these two datasets have the same structure, and how are they compatible?

Andria

Re: Document classification

Peter Reutemann
The StringToWordVector filter, once it has been initialized and its
dictionary built, converts subsequent datasets (in your case the test
set) into the same structure. You never see this, as it happens
internally within the FilteredClassifier. The model output of the
FilteredClassifier shows what the transformed dataset structure
looks like.

Cheers, Peter

Re: Document classification

andria lan
This makes sense now. Thanks Peter.

Also, does the same thing happen when I save the model produced by the FilteredClassifier (i.e., StringToWordVector plus the selected classifier), reload it, and apply "Re-evaluate model on current test set"?

Andria

In reply to Peter Reutemann's message of Thu, Dec 29, 2016 at 3:53 PM

Re: Document classification

Jose Maria Gomez Hidalgo-2
In reply to this post by Peter Reutemann
Hi all

To keep it simple: the FilteredClassifier that combines StringToWordVector (STWV) with the chosen classifier is a classifier itself. Its input is a dataset (the training set) with at least one String attribute. How the STWV is applied internally by the FilteredClassifier is transparent to the user.

As the training input is a dataset with a String attribute, the test input is also a dataset with a String attribute. You only have to take care of the other attributes (the class and/or any others you may have in your datasets).

For instance, let's say you have a spam filtering problem with this training dataset T.arff:

@relation training-spam-or-not
@attribute message String
@attribute class {spam,ham}
@data
"viagra",spam
"I am your boss",ham

You may build a FilteredClassifier C with an STWV filter and a NaiveBayes classifier. Its input is the previous dataset. After training, you do not access the STWV "model" (words, etc.); instead, you have a classifier C whose input looks like T.arff, that is, strings plus classes.

To evaluate it on a separate test set, you need another dataset with the same structure:

@relation test-spam-or-not
@attribute message String
@attribute class {spam,ham}
@data
...

That is, the training and test sets do match, because the input for C is strings plus classes. WEKA handles this transparently and smoothly.
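The idea that C is itself a classifier whose input is strings plus classes can be sketched as a hypothetical wrapper (not Weka's FilteredClassifier or NaiveBayes): build() fits a word-count filter on the training texts and trains a small multinomial Naive Bayes with Laplace smoothing on the filtered vectors, while classify() accepts a raw string and runs it through the same fitted filter internally. The training data below are the two messages from T.arff; the test strings are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the FilteredClassifier idea (not Weka's FilteredClassifier or
// NaiveBayes). Callers only pass raw strings: build() fits a word-count filter on the
// training texts and trains a multinomial Naive Bayes (Laplace smoothing) on the
// filtered vectors; classify() runs each new string through the same fitted filter.
final class StringInputClassifier {
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();
    private final List<String> classes = new ArrayList<>();
    private double[] logPrior;        // log P(class)
    private double[][] logLikelihood; // log P(word | class), indexed [class][word]

    void build(List<String> texts, List<String> labels) {
        dictionary.clear(); // 1. initialize the filter on the training texts only
        for (String text : texts) {
            for (String w : tokenize(text)) {
                dictionary.putIfAbsent(w, dictionary.size());
            }
        }
        classes.clear(); // 2. class values in order of first appearance
        for (String label : labels) {
            if (!classes.contains(label)) {
                classes.add(label);
            }
        }
        int k = classes.size();
        int v = dictionary.size();
        double[][] counts = new double[k][v];
        double[] docsPerClass = new double[k];
        for (int d = 0; d < texts.size(); d++) { // 3. train on the filtered data
            int c = classes.indexOf(labels.get(d));
            docsPerClass[c]++;
            double[] x = filter(texts.get(d));
            for (int j = 0; j < v; j++) {
                counts[c][j] += x[j];
            }
        }
        logPrior = new double[k];
        logLikelihood = new double[k][v];
        for (int c = 0; c < k; c++) {
            logPrior[c] = Math.log(docsPerClass[c] / texts.size());
            double total = Arrays.stream(counts[c]).sum();
            for (int j = 0; j < v; j++) {
                logLikelihood[c][j] = Math.log((counts[c][j] + 1) / (total + v));
            }
        }
    }

    // Input is a raw string; the fitted filter converts it before scoring.
    String classify(String text) {
        double[] x = filter(text);
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int c = 0; c < classes.size(); c++) {
            double score = logPrior[c];
            for (int j = 0; j < x.length; j++) {
                score += x[j] * logLikelihood[c][j];
            }
            if (score > bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return classes.get(best);
    }

    // Word counts over the fixed dictionary; words unseen in training are ignored.
    private double[] filter(String text) {
        double[] x = new double[dictionary.size()];
        for (String w : tokenize(text)) {
            Integer i = dictionary.get(w);
            if (i != null) {
                x[i]++;
            }
        }
        return x;
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>(Arrays.asList(text.toLowerCase().split("[^a-z]+")));
        tokens.removeIf(String::isEmpty);
        return tokens;
    }

    public static void main(String[] args) {
        StringInputClassifier c = new StringInputClassifier();
        // The two training messages of T.arff: a string plus a class value.
        c.build(List.of("viagra", "I am your boss"), List.of("spam", "ham"));
        System.out.println(c.classify("cheap viagra"));  // spam
        System.out.println(c.classify("ask your boss")); // ham
    }
}
```

The caller never handles a numeric vector; the conversion happens inside the wrapper, which is the transparency described above.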

If you need an extended example (and apologies for the shameless self-promotion), I can suggest some posts from my blog:


I hope this helps.

Best regards and Season's Greetings,

Jose Maria

--
José María Gómez Hidalgo
Twitter: @jmgomez
LinkedIn: http://www.linkedin.com/in/jmgomezh/
Web: http://www.esp.uem.es/jmgomez/

Re: Document classification

Peter Reutemann-3
In reply to this post by andria lan

>Also, does the same thing happen when I save the model produced by the
>FilteredClassifier (i.e., StringToWordVector plus the selected
>classifier), reload it, and apply "Re-evaluate model on current test
>set"?

Yes, it makes no difference; it gets handled by the FilteredClassifier each time.
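This works because the fitted filter is part of the saved model. A hypothetical sketch with plain Java serialization (not necessarily how Weka writes .model files) shows the idea: the dictionary is a field of the saved object, so a reloaded copy converts a test document exactly as the original did, without being re-fitted. The class name and example texts are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;

// Hypothetical sketch using plain Java serialization: the fitted dictionary is a field
// of the saved object, so it is restored together with the object on reload.
final class SavedFilter implements Serializable {
    private static final long serialVersionUID = 1L;
    private final LinkedHashMap<String, Integer> dictionary = new LinkedHashMap<>();

    void fit(List<String> trainingTexts) {
        dictionary.clear();
        for (String text : trainingTexts) {
            for (String w : tokenize(text)) {
                dictionary.putIfAbsent(w, dictionary.size());
            }
        }
    }

    double[] transform(String text) {
        double[] x = new double[dictionary.size()];
        for (String w : tokenize(text)) {
            Integer i = dictionary.get(w);
            if (i != null) {
                x[i] = 1.0; // word presence
            }
        }
        return x;
    }

    // "Save model": serialize the fitted object (to bytes here instead of a file).
    static byte[] save(SavedFilter model) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(model);
        }
        return bytes.toByteArray();
    }

    // "Load model": the dictionary comes back with the object; nothing is re-fitted.
    static SavedFilter load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SavedFilter) in.readObject();
        }
    }

    private static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>(Arrays.asList(text.toLowerCase().split("[^a-z]+")));
        tokens.removeIf(String::isEmpty);
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        SavedFilter filter = new SavedFilter();
        filter.fit(List.of("corn prices rise", "wheat exports fall"));
        SavedFilter reloaded = load(save(filter));
        String testDoc = "corn exports rise";
        System.out.println(Arrays.equals(filter.transform(testDoc), reloaded.transform(testDoc))); // true
    }
}
```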

Cheers, Peter

Re: Document classification

andria lan
Now, I loaded the "ReutersCorn-train" dataset in the Preprocess panel and imported "ReutersCorn-test". After applying the FilteredClassifier in conjunction with both J48 and StringToWordVector, I got Correctly Classified Instances = 98.2955 %. Then, when I saved the resulting model, reloaded it and applied "Re-evaluate model on current test set", I got Correctly Classified Instances = 97.351 %, which differs from the previous case. Why are there two different results in the two cases?

Andria

In reply to Peter Reutemann's message of Thu, Dec 29, 2016 at 5:11 PM

Re: Document classification

Peter Reutemann
I don't know, as I don't know which evaluation method you used when
building the model.
Unless you select the same dataset as "Supplied test set" that you use
for "Re-evaluate model on current test set", your results will most likely differ.

Cheers, Peter

Re: Document classification

andria lan
I used Percentage split to evaluate the model and then saved it, which in turn produces different results after loading it and applying "Re-evaluate model on current test set". It's strange behavior.

Andria

In reply to Peter Reutemann's message of Thu, Dec 29, 2016 at 5:23 PM

Re: Document classification

Peter Reutemann
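How is that strange behavior? You have two different test sets: Percentage split evaluates on a held-out part of the training data, while re-evaluation uses the test set you supplied.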
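The difference can be made concrete with a hypothetical sketch of a percentage split (the 66% figure, the seed and all names are illustrative assumptions, not Weka's code): the loaded training file is shuffled with a fixed seed, the first part is used for training and the remainder is held out for evaluation. Accuracy is then the percentage of held-out instances classified correctly, so it is computed on instances from the training file, not on a separate file such as ReutersCorn-test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a percentage split: shuffle the loaded training file with a
// fixed seed, train on the first part and hold out the rest for evaluation.
final class PercentageSplit {
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    static <T> Split<T> split(List<T> data, double trainPercent, long seed) {
        List<T> shuffled = new ArrayList<>(data);
        Collections.shuffle(shuffled, new Random(seed)); // same seed -> same split
        int cut = (int) Math.round(shuffled.size() * trainPercent / 100.0);
        return new Split<>(new ArrayList<>(shuffled.subList(0, cut)),
                new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }

    // Percentage of instances whose predicted label equals the actual label.
    static double accuracy(List<String> predicted, List<String> actual) {
        int correct = 0;
        for (int i = 0; i < actual.size(); i++) {
            if (predicted.get(i).equals(actual.get(i))) {
                correct++;
            }
        }
        return 100.0 * correct / actual.size();
    }

    public static void main(String[] args) {
        List<String> trainingFile = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            trainingFile.add("train-doc-" + i);
        }
        Split<String> s = split(trainingFile, 66, 1L);
        // Evaluation instances come only from the training file, never from a separate test file.
        System.out.println(s.train.size() + " for training, held out: " + s.test);
    }
}
```

The three held-out documents form a different test set from any external file, so the two accuracy figures need not agree.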

Cheers, Peter

Re: Document classification

andria lan
I see. It's clear now.

Thanks Peter. 

Andria

In reply to Peter Reutemann's message of Thu, Dec 29, 2016 at 5:30 PM