About the optimization of the Naive Bayes classifier.


About the optimization of the Naive Bayes classifier.

Liming Tan
Hello!

A paper I read recently uses the open-source toolkit WEKA.

The paper uses three data sets: a training set, a development set, and a test set. The chosen classifier is a Naive Bayes classifier.

The original paper contains this sentence:
"The parameters of the classifier (using kernel density or normal estimator) are optimised on the development set and applied to the test set."

But I can't find an option to use a development set in WEKA's Explorer.
Does this mean that the development set is merged into the training set?


Re: About the optimization of the Naive Bayes classifier.

Eibe Frank-2
Administrator
There is no very convenient way to do this in WEKA using a single train/validation split. However, you can use MultiScheme (https://weka.sourceforge.io/doc.stable-3-8/weka/classifiers/meta/MultiScheme.html) to implement selection using k-fold cross-validation (on the training set). This will be more robust anyway and is generally preferable unless the dataset is so large that k-fold cross-validation becomes too expensive.
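
For concreteness, here is a minimal sketch in Java of what that could look like (the file name, the choice of 10 folds, and the restriction to the two NaiveBayes estimator settings mentioned in the paper are my own assumptions):

import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.MultiScheme;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MultiSchemeSelection {
  public static void main(String[] args) throws Exception {
    Instances train = new DataSource("train.arff").getDataSet();
    train.setClassIndex(train.numAttributes() - 1);

    NaiveBayes normal = new NaiveBayes();   // default: normal (Gaussian) estimator
    NaiveBayes kernel = new NaiveBayes();
    kernel.setUseKernelEstimator(true);     // kernel density estimator (-K)

    MultiScheme ms = new MultiScheme();
    ms.setClassifiers(new Classifier[] { normal, kernel });
    ms.setNumFolds(10);                     // select via internal 10-fold CV
    ms.buildClassifier(train);

    System.out.println(ms);                 // reports the selected configuration
  }
}

Printing the built MultiScheme shows which of the two configurations won the internal cross-validation; that same model is then used for any test data you supply.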

Cheers,
Eibe

Re: About the optimization of the Naive Bayes classifier.

Liming Tan
Hi Eibe,
Thank you for your reply.

MultiScheme appears to be used to compare the performance of different classifiers. I merged the training and development sets into one training set and then used 10-fold cross-validation, but the result still did not work particularly well on the test set.

What I want to do is train a classifier (e.g. Naive Bayes) on the training set, optimize the parameters of the model on the development set over several iterations, and finally measure the performance of the model on the test set.
I would like to know how to implement this in WEKA.

Cheers,
Liming.

Re: About the optimization of the Naive Bayes classifier.

Eibe Frank-3
As I said, you can use MultiScheme to automatically *choose* between different classifiers or classifier configurations based on k-fold cross-validation; it is not designed to *compare* their performance. The Experimenter is the right tool for such a comparison (or, in a limited manner, the standard Explorer/KnowledgeFlow/CLI evaluation based on k-fold cross-validation or similar). MultiScheme performs *internal* k-fold cross-validation based on the training set only. It does this for all classifiers or classifier configurations specified as its parameters when you run it, and picks the one that yields the best performance, as estimated by the internal k-fold cross-validation, as the final classifier that will be applied to any test data (e.g., test data provided as part of an *outer* k-fold cross-validation).
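
To make the inner/outer distinction concrete, here is a small self-contained sketch (my own illustration, assuming a dataset in data.arff; the MultiScheme configuration mirrors the one from my earlier message). The outer 10-fold cross-validation evaluates the whole selection-plus-training procedure, while MultiScheme reruns its internal selection cross-validation on each outer training fold:

import java.util.Random;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.MultiScheme;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class NestedCVExample {
  public static void main(String[] args) throws Exception {
    Instances data = new DataSource("data.arff").getDataSet();
    data.setClassIndex(data.numAttributes() - 1);

    NaiveBayes normal = new NaiveBayes();   // normal (Gaussian) estimator
    NaiveBayes kernel = new NaiveBayes();
    kernel.setUseKernelEstimator(true);     // kernel density estimator
    MultiScheme ms = new MultiScheme();
    ms.setClassifiers(new Classifier[] { normal, kernel });
    ms.setNumFolds(10);                     // internal selection CV

    // Outer 10-fold CV: MultiScheme repeats its internal selection
    // on each of the ten outer training folds.
    Evaluation eval = new Evaluation(data);
    eval.crossValidateModel(ms, data, 10, new Random(1));
    System.out.println(eval.toSummaryString());
  }
}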

If k-fold cross-validation-based selection via MultiScheme does not provide reasonable selection performance, and cannot successfully choose the best NaiveBayes configuration for you, a single train/validation split will generally not give you reasonable performance either. (Having said that, one reason not to use standard k-fold cross-validation, for selection or for evaluation, is when the data is ordered, because the shuffling performed in k-fold cross-validation will destroy that order.)

There is the option of writing a scriptable classifier in Groovy or Jython that does exactly what you want. Using the WekaPyScript package, you could even write such a classifier in Python, and of course you could use Java too. However, to my knowledge (and it is entirely possible that I'm overlooking something), there is no way to do what you want directly via WEKA's built-in tools.
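
If you go the Java route, a rough sketch could look like the following (the file names and the restriction to the two NaiveBayes estimator settings are my own assumptions; nothing here is a built-in WEKA facility): train each configuration on the training set only, pick the winner on the development set, and touch the test set just once at the end.

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DevSetSelection {
  static Instances load(String file) throws Exception {
    Instances data = new DataSource(file).getDataSet();
    data.setClassIndex(data.numAttributes() - 1);
    return data;
  }

  public static void main(String[] args) throws Exception {
    Instances train = load("train.arff");
    Instances dev = load("dev.arff");
    Instances test = load("test.arff");

    NaiveBayes best = null;
    double bestAcc = -1;
    for (boolean useKernel : new boolean[] { false, true }) {
      NaiveBayes nb = new NaiveBayes();
      nb.setUseKernelEstimator(useKernel);
      nb.buildClassifier(train);            // train on the training set only
      Evaluation evalDev = new Evaluation(train);
      evalDev.evaluateModel(nb, dev);       // score on the development set
      if (evalDev.pctCorrect() > bestAcc) {
        bestAcc = evalDev.pctCorrect();
        best = nb;
      }
    }

    Evaluation evalTest = new Evaluation(train);
    evalTest.evaluateModel(best, test);     // final report on the untouched test set
    System.out.println(evalTest.toSummaryString());
  }
}

The same loop generalizes to any other parameters you want to tune; the important point is that the test set is only used once, after the selection on the development set is finished.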

Cheers,
Eibe
