ratio of Class values

ratio of Class values

asadbtk
Hi

Suppose I have 1000 instances whose class attribute is No for 960 of them and Yes for 40. I am using SMOTE to oversample the minority class, so what ratio of Yes to No values should I aim for? If I run SMOTE once, the number of Yes instances becomes 80, but there is still a huge difference between the two classes. My question is what ratio I should target and how many times I need to run SMOTE. I am using a software fault prediction dataset, so keep in mind that No (non-faulty classes) will make up a larger percentage than Yes (faulty classes).

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: ratio of Class values

Eibe Frank-2
Administrator

Is the percentage parameter in SMOTE limited to 100%?

 

Cheers,

Eibe

 

From: [hidden email]
Sent: Monday, 8 July 2019 9:13 PM
To: [hidden email]
Subject: [Wekalist] ratio of Class values

 


Re: ratio of Class values

asadbtk
Yes Eibe, it's 100%.



On Tue, Jul 9, 2019 at 9:23 AM Eibe Frank <[hidden email]> wrote:

Is the percentage parameter in SMOTE limited to 100%?

 


Re: ratio of Class values

asadbtk
Hi Eibe, I am still waiting for your response about the ratio we need to consider when using SMOTE. Can we use a 20-80 ratio, especially when classifying software faults (as faulty modules are usually far fewer than non-faulty modules)?
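
To make the 20-80 question concrete, here is a minimal Java sketch of the arithmetic, not part of Weka. It assumes the SMOTE percentage means "percent of synthetic minority instances to add, relative to the current minority count" and that values above 100 are accepted. The class name SmoteTargetShare and the method percentageFor are hypothetical names chosen for this illustration.

```java
// Sketch only: plain arithmetic, no Weka dependency.
final class SmoteTargetShare {

    // Percentage of synthetic minority instances to request so that the
    // minority class makes up `targetShare` of the data after oversampling.
    static double percentageFor(int minority, int majority, double targetShare) {
        if (minority <= 0 || majority < 0 || targetShare <= 0.0 || targetShare >= 1.0) {
            throw new IllegalArgumentException(
                "need minority > 0, majority >= 0 and 0 < targetShare < 1");
        }
        // Solve m' / (m' + majority) = targetShare for the new minority count m'.
        double targetMinority = targetShare * majority / (1.0 - targetShare);
        // Nothing to synthesize if the minority class already meets the target.
        double extra = Math.max(0.0, targetMinority - minority);
        return 100.0 * extra / minority;
    }

    public static void main(String[] args) {
        // 40 Yes and 960 No instances, as in the question above.
        System.out.println("20-80 split: " + percentageFor(40, 960, 0.20));
        System.out.println("50-50 split: " + percentageFor(40, 960, 0.50));
    }
}
```

Under these assumptions, a 20-80 split of the 40/960 data needs 240 Yes instances, i.e. 200 synthetic ones, which corresponds to a percentage of about 500 in a single SMOTE pass. Whether 20-80 is the right target is an empirical question; comparing a few ratios under cross-validation is the usual way to decide.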

On Tue, Jul 9, 2019 at 10:25 AM javed khan <[hidden email]> wrote:
Yes Eibe, its 100%.




Re: ratio of Class values

Eibe Frank-2
Administrator
In reply to this post by asadbtk

You can provide values larger than 100 for the -P parameter of SMOTE. If you use 200, you will triple the amount of data in the chosen class.
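
The effect of different -P values on the class counts can be sketched as follows. This is an illustration of the arithmetic only, not Weka's implementation: the class SmoteCounts and the method minorityAfter are hypothetical names, and truncating fractional instance counts is an assumption.

```java
// Sketch only: how the minority count grows for a given SMOTE percentage.
final class SmoteCounts {

    // Minority count after one pass that adds `percentage` percent
    // synthetic instances on top of the existing minority instances.
    static int minorityAfter(int minority, double percentage) {
        if (minority < 0 || percentage < 0.0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Assumption: fractional instance counts are truncated.
        int synthetic = (int) (minority * percentage / 100.0);
        return minority + synthetic;
    }

    public static void main(String[] args) {
        int yes = 40;   // minority (faulty) instances from the question
        int no = 960;   // majority (non-faulty) instances
        for (double p : new double[] {100.0, 200.0, 500.0}) {
            int newYes = minorityAfter(yes, p);
            double share = 100.0 * newYes / (newYes + no);
            System.out.printf("-P %.0f: Yes=%d, No=%d, Yes share=%.1f%%%n",
                              p, newYes, no, share);
        }
    }
}
```

With 40 Yes instances, 100 gives 80, 200 gives 120 (three times the original), and 500 gives 240, so one pass with a larger percentage replaces repeated runs at 100.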

 

Cheers,

Eibe

 

From: [hidden email]
Sent: Tuesday, 9 July 2019 8:26 PM
To: [hidden email]
Subject: Re: [Wekalist] ratio of Class values

 

Yes Eibe, its 100%.

 

 

 
