Problem of "extrapolation" in machine learning

Problem of "extrapolation" in machine learning

Anatoliy
Hi everybody!
Can you advise what to read about the problem of "extrapolation" in
machine learning? I do not mean predicting the "future". For example,
we train a model (classification or regression) on one data set and
then apply it to data whose attributes have different statistical
characteristics, with minimum and maximum values that lie outside the
range of the training set. How will the model behave in this case? In
classification we could hope for a new class ("undefined"), but what
happens with regression models?

thanks in advance

regards

Anatoliy



Re: Problem of "extrapolation" in machine learning

Eibe Frank
Standard multi-class classification methods (e.g., almost all of the ones in WEKA) do not actually have the ability to abstain from making a prediction (in WEKA, this would be implemented in a classifier by returning a missing value as the predicted class value, and the instance would be counted as an "unclassified" one). Standard multi-class methods such as J48/C4.5 will always assign one of the class values in the training data regardless of whether a test instance is in the range of the training data or not!
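
To make the abstention idea concrete, here is a minimal sketch of such a wrapper. The class RangeAbstainingClassifier and its out-of-range rule are purely illustrative assumptions, not a built-in WEKA facility; only the WEKA API calls themselves (AbstractClassifier, attributeStats, Utils.missingValue) are standard.

import weka.classifiers.AbstractClassifier;
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.Utils;

// Illustrative wrapper (not part of WEKA): abstains whenever a numeric
// attribute of the test instance lies outside the range seen in training.
public class RangeAbstainingClassifier extends AbstractClassifier {

  private final Classifier base;
  private double[] min, max;

  public RangeAbstainingClassifier(Classifier base) {
    this.base = base;
  }

  @Override
  public void buildClassifier(Instances data) throws Exception {
    // Record the training range of every numeric input attribute.
    min = new double[data.numAttributes()];
    max = new double[data.numAttributes()];
    for (int i = 0; i < data.numAttributes(); i++) {
      if (i == data.classIndex() || !data.attribute(i).isNumeric()) continue;
      min[i] = data.attributeStats(i).numericStats.min;
      max[i] = data.attributeStats(i).numericStats.max;
    }
    base.buildClassifier(data);
  }

  @Override
  public double classifyInstance(Instance inst) throws Exception {
    for (int i = 0; i < inst.numAttributes(); i++) {
      if (i == inst.classIndex() || !inst.attribute(i).isNumeric()) continue;
      if (inst.value(i) < min[i] || inst.value(i) > max[i]) {
        // Missing class value: WEKA's evaluation counts this instance
        // as "unclassified" instead of forcing a class label.
        return Utils.missingValue();
      }
    }
    return base.classifyInstance(inst);
  }
}

Wrapping, say, J48 in this and evaluating as usual should then count such test instances as unclassified rather than forcing one of the training class labels on them.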

Regarding extrapolation, there are some regression methods that are clearly unsuitable for this. For example, regression trees built by REPTree in WEKA represent piece-wise constant predictors (each leaf node has a constant numeric value) so they will not be able to extrapolate. On the other hand, model trees, such as those generated with M5P, have linear models at the leaf nodes so they are potentially able to extrapolate. Obviously, they will only be able to do so successfully if there is a linear trend beyond the range of the training data.
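
As a small illustration of the difference, the following sketch trains REPTree and M5P on a synthetic, purely linear relationship (y = 2x for x in [0, 10], generated on the fly for this example and therefore an assumption, not real data) and asks both models for a prediction at x = 20, well outside the training range:

import java.util.ArrayList;
import weka.classifiers.trees.M5P;
import weka.classifiers.trees.REPTree;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;

public class ExtrapolationDemo {
  public static void main(String[] args) throws Exception {
    // Synthetic training data with a purely linear trend: y = 2x, x in [0, 10].
    ArrayList<Attribute> atts = new ArrayList<>();
    atts.add(new Attribute("x"));
    atts.add(new Attribute("y"));
    Instances train = new Instances("linear", atts, 101);
    train.setClassIndex(1);  // predict y from x
    for (int i = 0; i <= 100; i++) {
      double x = i / 10.0;
      train.add(new DenseInstance(1.0, new double[] {x, 2 * x}));
    }

    REPTree repTree = new REPTree();  // piecewise-constant leaves
    repTree.buildClassifier(train);
    M5P m5p = new M5P();              // linear models at the leaves
    m5p.buildClassifier(train);

    // Query a point well outside the training range (x = 20, true y = 40).
    DenseInstance query = new DenseInstance(1.0, new double[] {20, 0});
    query.setDataset(train);
    System.out.println("REPTree: " + repTree.classifyInstance(query));
    System.out.println("M5P:     " + m5p.classifyInstance(query));
  }
}

The REPTree prediction should stay near its largest leaf value, somewhere around 20, while M5P's linear leaf models should continue the trend towards 40. If the true relationship were nonlinear beyond the training range, of course, M5P's extrapolation would be wrong as well.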

Note that all standard machine learning methods and particularly standard evaluation methods such as k-fold cross-validation assume that the data are IID (independent and identically distributed). In practice, the distribution is often not constant, and models are rebuilt periodically to compensate for change in the underlying probability distribution.
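
WEKA has no single built-in switch for this, so the following is only a rough sketch of the kind of check one might run before deciding to rebuild: it flags drift when the mean of any numeric attribute in a new batch of data has moved more than a chosen number of training standard deviations from its training mean. The class name, the threshold rule and the tolerance value are all assumptions made for the example.

import weka.core.Instances;
import weka.experiment.Stats;

// Crude drift check (an illustration only, not a WEKA facility): suggest a
// rebuild if the mean of any numeric attribute in a new batch has moved
// more than `tolerance` training standard deviations from its training mean.
public class DriftCheck {

  public static boolean needsRebuild(Instances train, Instances batch, double tolerance) {
    for (int i = 0; i < train.numAttributes(); i++) {
      if (i == train.classIndex() || !train.attribute(i).isNumeric()) continue;
      Stats trainStats = train.attributeStats(i).numericStats;
      Stats batchStats = batch.attributeStats(i).numericStats;
      double shift = Math.abs(batchStats.mean - trainStats.mean);
      if (shift > tolerance * trainStats.stdDev) {
        return true;  // this attribute's distribution has moved noticeably
      }
    }
    return false;
  }
}

For instance, if needsRebuild(train, newBatch, 3.0) returned true, one might retrain the model on data that includes the new batch.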

Cheers,
Eibe

Re: Problem of "extrapolation" in machine learning

Edward Wiskers
Hi all,

> In practice, the distribution is often not constant, and models are rebuilt periodically to compensate for change in the underlying probability distribution.


Eibe, kindly two questions:

1- Is it compulsory for all the data to be distributed similarly?

2- What is the general action to take to handle an inconsistent distribution, or is no action needed?

Re: Problem of "extrapolation" in machine learning

Anatoliy
In reply to this post by Eibe Frank
Hi, Eibe.
Thank you for answering.
I need some time to "digest" the information.
So a model may not extrapolate successfully if there is no linear trend
beyond the training data? And how would a neural network behave in this
case?

regards
Anatoliy



Re: Problem of "extrapolation" in machine learning

Edward Wiskers
In reply to this post by Eibe Frank

> In practice, the distribution is often not constant, and models are rebuilt periodically to compensate for change in the underlying probability distribution.


Eibe, kindly two questions:

1- Is it compulsory for all the data to be distributed similarly?

2- What is the general action to take to handle an inconsistent distribution, or is no action needed?

Thanks in advance.

Edward 
