the use of SMOTE

manal alghamdi

I have an imbalanced class, and the algorithms won't learn anything; they show no good results at all. So I want to apply SMOTE to make the class distribution more balanced. If I apply SMOTE to the whole dataset and then divide it into a training set and a test set, would that be correct?

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: the use of SMOTE

Eibe Frank-3
No, you should leave the test data untouched. Put SMOTE inside the FilteredClassifier and apply that to the unmodified training and test data: the FilteredClassifier applies SMOTE only to the training data when it builds the model, and passes the test instances through unchanged.
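The "split first, oversample only the training data" order can be illustrated outside Weka. The following is a minimal Python sketch, not Weka's SMOTE filter or FilteredClassifier code: it splits the data (stratified by class) and only then oversamples the minority class in the training portion, so every test instance is an original, unmodified row. Applying SMOTE before the split would instead let synthetic training rows be interpolated from instances that end up in the test set, and would put synthetic rows into the test set itself. The function names, the toy data, k = 5 neighbours and oversampling up to a 1:1 class ratio are all assumptions made for illustration.

```python
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """SMOTE-style oversampling: return n_synthetic new minority points.

    Each synthetic point is placed at a random position on the line segment
    between a randomly chosen minority instance and one of its k nearest
    minority-class neighbours (squared Euclidean distance).
    """
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Nearest minority neighbours of `base`, excluding `base` itself.
        others = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # fraction of the way from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def stratified_split(data, test_fraction, rng):
    """Split (features, label) pairs into train/test, keeping class proportions."""
    train, test = [], []
    for label in sorted({y for _, y in data}):
        group = [row for row in data if row[1] == label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def split_then_oversample(data, minority_label, test_fraction=0.3, seed=1):
    """Split first, then apply SMOTE to the training portion only."""
    rng = random.Random(seed)
    train, test = stratified_split(data, test_fraction, rng)
    minority = [x for x, y in train if y == minority_label]
    n_majority = sum(1 for _, y in train if y != minority_label)
    extra = smote(minority, max(0, n_majority - len(minority)), rng=rng)
    # Synthetic rows are added to the training set only; `test` is returned as-is.
    return train + [(x, minority_label) for x in extra], test


# Hypothetical toy data: 90 majority (label 0) and 10 minority (label 1) rows.
gen = random.Random(42)
data = [((gen.gauss(0, 1), gen.gauss(0, 1)), 0) for _ in range(90)]
data += [((gen.gauss(3, 0.5), gen.gauss(3, 0.5)), 1) for _ in range(10)]

train, test = split_then_oversample(data, minority_label=1)
print(len(train), len(test))  # 126 30
```

The training set ends up with 63 rows of each class, while the 30 test rows (27 majority, 3 minority) keep the original imbalance the model will face in practice.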

Cheers,
Eibe

On 29/12/2016 1:34 pm, "manal alghamdi" <[hidden email]> wrote:


Evaluating Numeric Prediction

Bob Matthews
I am reading Chapter 5, pp. 195-197, of Data Mining (4th edition), the Weka book.

When I run M5P on 4,100 instances with 4 attributes, using the percentage change as the class, I get the following results.

Why are the results so poor, and what is causing the two Infinity results?

Bob M

=== Evaluation on training set ===

Time taken to test model on training data: 2.44 seconds

=== Summary ===

Correlation coefficient                  0.105
Mean absolute percentage error           Infinity
Root mean square percentage error        Infinity
Mean absolute error                      0.2211
Root mean squared error                  1.5774
Relative absolute error                110.5534 %
Root relative squared error             99.447  %
Total Number of Instances             4100


Re: Evaluating Numeric Prediction

Mark Hall

The relative metrics suggest that M5P is doing no better than simply predicting the mean class value of the training data. What model did M5P generate?
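The relative measures compare the model's errors with those of a baseline that always predicts the mean: relative absolute error is sum|p - a| / sum|mean - a|, and root relative squared error is sqrt(sum(p - a)^2 / sum(mean - a)^2), both as percentages. The minimal Python sketch below (the function name and toy numbers are illustrative assumptions, and the baseline is taken to be the training-set mean, as described above) shows that predicting the mean scores exactly 100% on both, so values near or above 100%, such as the 110.5534 % and 99.447 % in the output above, mean the model is no better than that baseline.

```python
import math


def relative_errors(actual, predicted, baseline_mean):
    """Return (relative absolute error %, root relative squared error %).

    Both compare the model's errors with those of a baseline that always
    predicts baseline_mean; 100% means "no better than the baseline".
    """
    abs_err = sum(abs(p - a) for p, a in zip(predicted, actual))
    abs_base = sum(abs(baseline_mean - a) for a in actual)
    sq_err = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    sq_base = sum((baseline_mean - a) ** 2 for a in actual)
    rae = 100.0 * abs_err / abs_base
    rrse = 100.0 * math.sqrt(sq_err / sq_base)
    return rae, rrse


actual = [0.0, 2.0, -1.0, 5.0, 4.0]
mean = sum(actual) / len(actual)  # 2.0

print(relative_errors(actual, [mean] * len(actual), mean))  # (100.0, 100.0)
print(relative_errors(actual, [0.0] * len(actual), mean)[0])  # 120.0
```

Always predicting 0 here is worse than always predicting the mean, which is why its relative absolute error exceeds 100%.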

There are probably instances whose class value is zero. MAPE breaks down in that case, because it divides each error by the actual value:

https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
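A minimal Python sketch of the MAPE formula, 100/n * sum(|a - p| / |a|), shows the failure (the function name and data are hypothetical). A single instance whose actual value is 0 and whose prediction is not 0 makes one term unbounded, and the root mean square percentage error divides by the same actual values, so it blows up for the same reason. Python raises ZeroDivisionError on float division by zero, so the sketch returns math.inf explicitly to mirror the Infinity in the output above; treating an exact prediction of 0 for an actual value of 0 as no error is an assumption of this sketch, not necessarily what Weka does.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error: 100/n * sum(|a - p| / |a|)."""
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            if p != 0:
                # A non-zero error divided by an actual value of 0 is unbounded,
                # so the average becomes infinite.
                return math.inf
            continue  # assumption: an exact prediction of 0 contributes no error
        total += abs(a - p) / abs(a)
    return 100.0 * total / len(actual)


print(mape([2.0, 4.0, 5.0], [1.0, 5.0, 5.0]))  # 25.0
print(mape([2.0, 0.0, 5.0], [1.0, 0.5, 5.0]))  # inf
```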

Cheers,

Mark.

 

From: <[hidden email]> on behalf of Bob Matthews <[hidden email]>
Reply-To: "Weka machine learning workbench list." <[hidden email]>
Date: Thursday, 29 December 2016 at 4:08 PM
To: "Weka machine learning workbench list." <[hidden email]>
Subject: [Wekalist] Evaluating Numeric Prediction

 
