

Hi
Suppose I have 1000 instances whose Class attribute is No for 960 of them and Yes for 40. I am using SMOTE for oversampling, so what should the ratio of Yes to No values be? If I run SMOTE once, the number of Yes instances becomes 80, but there is still a huge difference between the two classes. My question is what ratio we should aim for and how many times we need to run SMOTE. I am using a software fault prediction dataset, so keep in mind that No (non-faulty classes) will have a higher percentage than Yes (faulty classes).
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Administrator

Is the percentage parameter in SMOTE limited to 100%?

Cheers,
Eibe


Hi Eibe, I am still waiting for your response about the ratio we should aim for when using SMOTE. Can we use a 20:80 ratio, especially when classifying software faults (where faulty modules are usually far fewer than non-faulty modules)?

Administrator

You can provide values larger than 100 for the P parameter of SMOTE. If you use 200, you will triple the amount of data in the chosen class.

Cheers,
Eibe
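
To make the arithmetic concrete, here is a minimal sketch of how one might work out the P value for a target class ratio in a single SMOTE run. It assumes that a percentage P adds roughly minority * P / 100 synthetic instances, which is consistent with the statement that 200 triples the chosen class. The function names are hypothetical helpers, not part of Weka, and whether 20:80 is the right target for a given dataset is a separate question that this calculation does not answer.

```python
def minority_after_smote(minority: int, percentage: float) -> int:
    """Minority count after one SMOTE run, assuming the filter adds
    minority * percentage / 100 synthetic instances (an assumption)."""
    return minority + round(minority * percentage / 100.0)


def smote_percentage(minority: int, majority: int,
                     ratio_minority: int, ratio_majority: int) -> float:
    """Percentage needed so that minority:majority reaches
    ratio_minority:ratio_majority, leaving the majority class untouched."""
    if min(minority, majority, ratio_minority, ratio_majority) <= 0:
        raise ValueError("all counts and ratio parts must be positive")
    # Minority size that gives the requested ratio against the fixed majority.
    target_minority = majority * ratio_minority / ratio_majority
    # Number of synthetic instances still needed (never negative).
    synthetic = max(0.0, target_minority - minority)
    return 100.0 * synthetic / minority


if __name__ == "__main__":
    # Counts from the question: 40 Yes, 960 No, target ratio 20:80.
    p = smote_percentage(40, 960, 20, 80)
    print("P for 20:80:", p)
    print("Yes instances after SMOTE:", minority_after_smote(40, p))
```

Under that assumption, the 40/960 split from the question needs 200 extra Yes instances to reach 20:80 (240 Yes against 960 No), so a single run with P = 500 would be enough and repeated runs are not required.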

