

Hi everybody!
Can you recommend something to read about the problem of "extrapolation" in
machine learning? I do not mean predicting the "future." For example, we train
a model (classification or regression) on one data set and then apply it to
data whose attribute values fall outside the minimum and maximum ranges seen
during training, i.e., data that differs from the training data in its
statistical characteristics. How will the model behave in this case?
In classification, we could in principle get a new (undefined) class, but what
happens with regression models?
thanks in advance
regards
Anatoliy

Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list  [hidden email]
Send posts to: [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


Standard multiclass classification methods (e.g., almost all of the ones in WEKA) do not actually have the ability to abstain from making a prediction (in WEKA, this would be implemented in a classifier by returning a missing value as the predicted class value, and the instance would be counted as an "unclassified" one). Standard multiclass methods such as J48/C4.5 will always assign one of the class values in the training data regardless of whether a test instance is in the range of the training data or not!
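The abstention idea can be sketched outside Weka in a few lines of plain Python (the function name and threshold are hypothetical, not a Weka API): wrap any classifier that outputs class probabilities and return None, i.e. an "unclassified" instance, whenever the most probable class is not confident enough.

```python
def predict_with_abstention(class_probs, labels, threshold=0.7):
    """Return the most probable label, or None (an 'unclassified'
    instance) when the classifier is not confident enough.

    class_probs -- list of probabilities, one per label
    labels      -- list of class labels, same order as class_probs
    threshold   -- minimum probability required to commit to a label
    """
    best = max(range(len(class_probs)), key=lambda i: class_probs[i])
    if class_probs[best] < threshold:
        return None  # abstain: count the instance as "unclassified"
    return labels[best]

# A confident distribution yields a prediction...
print(predict_with_abstention([0.1, 0.85, 0.05], ["a", "b", "c"]))  # b
# ...but a near-uniform one leads to abstention.
print(predict_with_abstention([0.4, 0.35, 0.25], ["a", "b", "c"]))  # None
```

The threshold is a design choice: raising it trades more unclassified instances for fewer confident mistakes.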
Regarding extrapolation, there are some regression methods that are clearly unsuitable for this. For example, regression trees built by REPTree in WEKA represent piecewise constant predictors (each leaf node has a constant numeric value) so they will not be able to extrapolate. On the other hand, model trees, such as those generated with M5P, have linear models at the leaf nodes so they are potentially able to extrapolate. Obviously, they will only be able to do so successfully if there is a linear trend beyond the range of the training data.
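The contrast can be illustrated with a toy example in plain Python (not Weka code): a single-leaf "regression tree" can only predict a constant fitted on the training data, while an ordinary least-squares line follows a linear trend beyond the training range. Here the training data follow y = 2x exactly.

```python
# Training data follow y = 2x on x in [0, 10].
xs = list(range(11))
ys = [2 * x for x in xs]

# A piecewise-constant predictor (one regression-tree leaf) returns a
# constant fitted on the training targets, e.g. their mean.
leaf_value = sum(ys) / len(ys)

# A linear model fitted by ordinary least squares.
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Predictions at x = 20, well outside the training range [0, 10]:
print(leaf_value)              # 10.0 -- the constant leaf cannot extrapolate
print(slope * 20 + intercept)  # 40.0 -- the linear model follows the trend
```

A real tree has many leaves, but every prediction outside the training range still comes from whichever leaf the instance falls into, so the same limitation applies.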
Note that all standard machine learning methods, and particularly standard evaluation methods such as k-fold cross-validation, assume that the data are IID (independent and identically distributed). In practice, the distribution is often not constant, and models are rebuilt periodically to compensate for change in the underlying probability distribution.
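One common way to "rebuild periodically" is a sliding window: keep only the most recent N instances and refit on them, so older data from a drifted distribution drops out. A minimal sketch in plain Python (the class and the `fit_fn` learner are hypothetical stand-ins, not a Weka API):

```python
from collections import deque

class SlidingWindowModel:
    """Refit a model on only the most recent `window` instances,
    so data from an older, drifted distribution ages out."""

    def __init__(self, fit_fn, window=100):
        self.fit_fn = fit_fn             # stand-in for any learner
        self.buffer = deque(maxlen=window)
        self.model = None

    def observe(self, x, y):
        self.buffer.append((x, y))
        # Rebuild on the current window; a real system might refit
        # only every k-th instance to save time.
        self.model = self.fit_fn(list(self.buffer))

# Example: the "model" is just the mean target over the window.
mean_fit = lambda data: sum(y for _, y in data) / len(data)
m = SlidingWindowModel(mean_fit, window=3)
for x, y in [(1, 10), (2, 10), (3, 40), (4, 40)]:
    m.observe(x, y)
print(m.model)  # 30.0 -- the window holds the last three targets (10, 40, 40)
```

The window size trades adaptation speed against stability: a short window tracks drift quickly but fits on little data.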
Cheers, Eibe


In practice, the distribution is often not constant, and models are rebuilt periodically to compensate for change in the underlying probability distribution.
Eibe, if I may, two questions:
1. Is it compulsory for all the data to be distributed similarly?
2. What is the general course of action for handling an inconsistent distribution? Or is no action needed?
Thanks in advance.
Edward

