WEKA and "small" data


Anatoliy
I would like some advice on working with small data sets that have a large
number of features (for example, 15 instances and 30 features). Unfortunately,
with such data the model result (regression, correlation coefficient) depends
strongly, up to a change of sign, on how cross-validation assigns instances to
the training and test folds.

Thanks in advance,
Anatoliy



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: WEKA and "small" data

George Dombi-2
Hi Anatoliy,
Some people use a support vector machine (SVM) when there are too few cases and too many variables.
WEKA includes SVM implementations, for example SMOreg for regression.
Here is a tutorial.
bye for now,
George


Re: WEKA and "small" data

Anatoliy
Hi George!
Great, thanks. Yes, I have read about using SVMs. By the way, thanks for the
link; it is a good resource. What interests me more is how to divide such a
small data set into training and test sets. Is there a principle for choosing
effective cross-validation folds?




Re: WEKA and "small" data

George Dombi-2
Hi Anatoliy,
Thanks for your email.
I usually use 1/10 to 1/4 of the data for the training set.
But with so few cases, I would be tempted to treat the whole data set as the training set.
You can experiment with different training/test split ratios and see whether it makes a significant difference.
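
A minimal sketch of that experiment, in plain Python rather than WEKA and under stated assumptions: the data are synthetic (15 instances, 30 features, only the first feature related to the target), the model is a 1-nearest-neighbour regressor chosen only because it needs no libraries, and the helper names (`pearson`, `predict_1nn`, `cv_correlation`, `make_data`) are made up for illustration. It repeats 5-fold cross-validation with 20 different shuffles and reports the spread of the correlation coefficient.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def predict_1nn(train_X, train_y, x):
    """Return the target of the nearest training instance (Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def cv_correlation(X, y, folds, seed):
    """Correlation between pooled out-of-fold predictions and the true targets."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)      # the seed decides the fold assignment
    preds = [0.0] * len(X)
    for f in range(folds):
        test = idx[f::folds]              # every folds-th shuffled index
        train = [i for i in idx if i not in test]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            preds[i] = predict_1nn(train_X, train_y, X[i])
    return pearson(preds, y)

def make_data(n=15, d=30, seed=0):
    """Synthetic data (an assumption): only feature 0 carries signal."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [row[0] + rng.gauss(0, 1) for row in X]
    return X, y

X, y = make_data()
scores = [cv_correlation(X, y, folds=5, seed=s) for s in range(20)]
print(f"min={min(scores):.2f} max={max(scores):.2f} "
      f"mean={sum(scores) / len(scores):.2f}")
```

On data this small the spread across seeds can be wide, which matches the fold dependence described at the start of the thread. Setting `folds` to the number of instances gives leave-one-out cross-validation, where only one partition is possible, so the seed no longer changes the result; the estimate itself can still be noisy.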
I'm not the best person to ask about how to determine the most effective cross-validation folds. My assumption is that folds require large data sets, and you don't have one, so I hesitate to offer an opinion there.
I do know that if you are looking at a known minority case, you may have to over-represent it in the training set, or it will be ignored because it is too infrequent. This occurs in medical data, where death is an infrequent but important outcome. If that outcome is less than 1/10 of the cases, there is a risk that it will be neglected in the training set. I sometimes put a higher proportion of the rare cases into the training set than occurs in the total data set so that the learning engine will not dismiss them. I also make sure there are still some unseen minority cases in the test set.
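
The split described above can be sketched as follows. This is an illustration in plain Python, not a WEKA feature: the function name `split_with_minority`, its parameters, and the 36/4 example labels are all assumptions. Each class is split separately so that the test set keeps some unseen minority cases, and the minority cases are duplicated only after the split, so no copy of a training case can end up in the test set.

```python
import random

def split_with_minority(labels, minority, test_fraction=0.25, oversample=2, seed=0):
    """Return (train_idx, test_idx) as lists of instance indices.

    Every class contributes at least one instance to the test set; the
    remaining minority indices are repeated `oversample` times in training.
    A class with a single instance ends up in the test set only.
    """
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        keep = idx[n_test:]
        # Duplicate minority cases in the training set only.
        train.extend(keep * oversample if cls == minority else keep)
    return train, test

# Hypothetical example: 40 cases, 4 of them with the rare outcome.
labels = ["alive"] * 36 + ["dead"] * 4
train, test = split_with_minority(labels, minority="dead")

rare_in_train = sum(labels[i] == "dead" for i in train)
rare_in_test = sum(labels[i] == "dead" for i in test)
print(f"train size={len(train)}, rare in train={rare_in_train}, "
      f"test size={len(test)}, rare in test={rare_in_test}")
```

With these example labels the rare outcome makes up 10% of the data but 6 of the 33 training entries, while one rare case stays unseen in the test set.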
Good luck.
Bye for now,
George



Re: WEKA and "small" data

Anatoliy
Hi George!
Thanks for the feedback! I realized I need to increase the amount of data,
since in my case the model correlation depends heavily on the cross-validation
folds. As far as I understand, using the training set itself as the test set
is rather dangerous, but if I don't intend to generalize the model, can I try
it?

regards
Anatoliy




Re: WEKA and "small" data

George Dombi-2
Hi Anatoliy,
Sure, you can try it. Just create a separate model for each test/training split.
Then look at the outcomes of the models to see if they differ in any significant way.
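
One way to make that comparison concrete is to set the training-set score next to the scores from held-out splits. The sketch below is plain Python under loud assumptions: the data are random noise (15 instances, 30 features, targets unrelated to the features), the model is a 1-nearest-neighbour regressor, and the helper names `nearest_target` and `mean_abs_error` are invented for illustration. It builds one model per leave-one-out split and compares the result with testing on the training set itself.

```python
import math
import random

def nearest_target(train_X, train_y, x):
    """Return the target of the nearest training instance (Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def mean_abs_error(preds, targets):
    return sum(abs(p - t) for p, t in zip(preds, targets)) / len(targets)

rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(30)] for _ in range(15)]
y = [rng.gauss(0, 1) for _ in range(15)]   # pure noise: nothing to learn

# Training set used as the test set: every instance finds itself.
resub_preds = [nearest_target(X, y, x) for x in X]

# Leave-one-out: one model per split, each tested on the instance left out.
loo_preds = []
for i in range(len(X)):
    rest_X = X[:i] + X[i + 1:]
    rest_y = y[:i] + y[i + 1:]
    loo_preds.append(nearest_target(rest_X, rest_y, X[i]))

resub_error = mean_abs_error(resub_preds, y)
loo_error = mean_abs_error(loo_preds, y)
print(f"training-set error={resub_error:.3f}, leave-one-out error={loo_error:.3f}")
```

The training-set error is exactly zero here even though the targets are noise, because each instance is its own nearest neighbour. A large gap between the two numbers suggests the training-set score is measuring memorization rather than a real relationship, which matters even when the model is only applied locally.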
Bye for now,
George


Re: WEKA and "small" data

Anatoliy
Hi George!
Yes, if I am not generalizing the model then I can probably do so. Why am I
not generalizing? This is a heuristic conclusion: there are a couple of
attributes that I cannot take into account in the model but which
significantly affect generalization. Therefore, I decided to apply the model
locally, where the effect of these attributes is, on average, a constant.

kind regards

Anatoliy




Re: WEKA and "small" data

George Dombi-2
Hi Anatoliy,
Thanks for your email.
Try it and let me know how it goes.
Bye for now,
George
