Evaluating results for a regression problem

Ayşe Arslan
Hello,

I have started using Weka for a regression problem. Given some parameters, I want to predict a result value (bandwidth [MB/s]).
Can you suggest a source that would help me understand whether the selected algorithm fits my dataset and whether the results are good?

I know the highlighted values are important, but what are the criteria for accepting a result?
Thank you.



=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 1 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation:     morethantriple-weka.filters.unsupervised.attribute.Remove-R2
Instances:    29856
Attributes:   7
              number_of_processes
              bytes
              romio_cb_read
              romio_cb_write
              striping_factor
              striping_unit
              bandwidth
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 1 nearest neighbour(s) for classification


Time taken to build model: 0.01 seconds

=== Cross-validation ===
=== Summary ===

Correlation coefficient                  0.9578
Mean absolute error                    131.8909
Root mean squared error                330.838

Relative absolute error                 15.9919 %
Root relative squared error             28.9914 %
Total Number of Instances            29856    
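
For readers unfamiliar with these figures, here is a minimal sketch of how the five summary values are usually defined. It is not Weka's code: the function name and toy data are hypothetical, and the `baseline` argument assumes that the relative errors are measured against a model that always predicts the mean of the training targets.

```python
import math

def regression_summary(actual, predicted, baseline):
    """Compute the five summary figures for paired lists of numbers.

    `baseline` is the constant prediction used as the reference for the
    relative errors (assumed here to be the mean of the training targets).
    """
    n = len(actual)
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    base_abs = sum(abs(baseline - a) for a in actual)
    base_sq = sum((baseline - a) ** 2 for a in actual)

    # Pearson correlation between actual and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)

    return {
        "correlation": cov / math.sqrt(var_a * var_p),
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100.0 * abs_err / base_abs,
        "rrse_percent": 100.0 * math.sqrt(sq_err / base_sq),
    }

# Hypothetical toy values, not taken from the bandwidth dataset.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]
summary = regression_summary(actual, predicted, baseline=250.0)
for name, value in summary.items():
    print(f"{name:14s} {value:.4f}")
```

Under these definitions the two relative errors compare the model with the constant predictor, so values well below 100 % mean the model beats simply guessing the mean.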



_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: Evaluating results for a regression problem

Michael Hall

What are the results if you actually use LinearRegression?
If you were using that, I might say that you had a lot of collinearity (high correlation but unusually large-seeming errors), but I am not used to using nearest neighbor for regression.
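
One way to judge whether the errors really are large is to back out the error of the constant-mean baseline from the relative figures. The sketch below uses only the IBk numbers reported above and assumes that relative absolute error is MAE divided by the baseline MAE, and that root relative squared error is RMSE divided by the baseline RMSE.

```python
# Figures copied from the IBk cross-validation summary in this thread.
mae = 131.8909
rmse = 330.838
rae_percent = 15.9919
rrse_percent = 28.9914

# Assumption: RAE = MAE / baseline MAE and RRSE = RMSE / baseline RMSE,
# where the baseline always predicts the mean bandwidth.
baseline_mae = mae / (rae_percent / 100.0)
baseline_rmse = rmse / (rrse_percent / 100.0)

print(f"baseline MAE  ~ {baseline_mae:.1f} MB/s")
print(f"baseline RMSE ~ {baseline_rmse:.1f} MB/s")
```

If those assumptions hold, always guessing the mean would be off by roughly 825 MB/s on average, so an MAE of about 132 MB/s is large in absolute terms only because the bandwidth values themselves vary widely.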


Re: Evaluating results for a regression problem

Ayşe Arslan
The results when I use Linear Regression are as follows:

Linear Regression Model

bandwidth =

      1.2611 * number_of_processes +
      0      * bytes +
    -51.983  * romio_cb_read=automatic,enable +
    342.82   * romio_cb_write=automatic,enable +
     58.442  * striping_factor +
     -7.8023 * striping_unit +
    -31.5218

Time taken to build model: 0.12 seconds

=== Cross-validation ===
=== Summary ===

Correlation coefficient                  0.5877
Mean absolute error                    654.5445
Root mean squared error                923.2566
Relative absolute error                 79.3644 %
Root relative squared error             80.9051 %
Total Number of Instances            29856    
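
To make the printed model concrete, here is a minimal sketch that applies it to one configuration. It rests on several assumptions: a term such as `romio_cb_read=automatic,enable` is read as an indicator that equals 1 when the attribute takes one of the listed values and 0 otherwise; the coefficients are the rounded values printed above; `bytes` is left out because its coefficient is shown as 0; and the input values are hypothetical.

```python
def predict_bandwidth(number_of_processes, romio_cb_read, romio_cb_write,
                      striping_factor, striping_unit):
    """Apply the rounded coefficients from the printed linear model."""
    # Assumed reading of "attr=automatic,enable": 1 if the value is listed.
    read_flag = 1.0 if romio_cb_read in ("automatic", "enable") else 0.0
    write_flag = 1.0 if romio_cb_write in ("automatic", "enable") else 0.0
    return (1.2611 * number_of_processes
            - 51.983 * read_flag
            + 342.82 * write_flag
            + 58.442 * striping_factor
            - 7.8023 * striping_unit
            - 31.5218)

# Hypothetical configuration, not a row from the dataset.
example = predict_bandwidth(64, "enable", "disable", 4, 1)
print(f"predicted bandwidth: {example:.2f}")
```

A purely additive formula like this cannot capture interactions between the parameters, which may be part of why its cross-validated errors are so much higher than those of the nearest-neighbour model.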


Thanks.

On Sun, 26 Jan 2020 at 00:47, Michael Hall <[hidden email]> wrote:

What are the results if you actually use LinearRegression?
If you were using that, I might say that you had a lot of collinearity (high correlation but unusually large-seeming errors), but I am not used to using nearest neighbor for regression.

Re: Evaluating results for a regression problem

Michael Hall
In reply to this post by Ayşe Arslan


On Jan 24, 2020, at 11:12 AM, Ayşe Arslan <[hidden email]> wrote:

Hello,

I have started using Weka for a regression problem. Given some parameters, I want to predict a result value (bandwidth [MB/s]).
Can you suggest a source that would help me understand whether the selected algorithm fits my dataset and whether the results are good?

IBk

Correlation coefficient                  0.9578
Mean absolute error                    131.8909
Root mean squared error                330.838 

Relative absolute error                 15.9919 %
Root relative squared error             28.9914 %

Linear Regression

Correlation coefficient                  0.5877
Mean absolute error                    654.5445
Root mean squared error                923.2566
Relative absolute error                 79.3644 %
Root relative squared error             80.9051 %

Whether the results are good or bad might depend on the data. How well does it predict the bandwidth? Relatively speaking, I would say IBk is giving you much better results than Linear Regression: better correlation and lower errors. Have you tried varying k, the number of neighbors? There are other classifiers that can do regression; SMOreg is like SVM for regression, if I remember right. You could try a few others, which might help you decide which one gives the best results.

Whether the final errors are acceptable might be up to you.
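
The suggestion to vary k can be sketched outside Weka as follows. This is a plain-Python illustration rather than Weka's IBk: the helper names, the fold assignment, and the synthetic data are all hypothetical stand-ins, and in Weka itself the corresponding knob is the `-K` option visible in the Scheme line of the run output.

```python
import math
import random

def knn_predict(train, query, k):
    """Average the targets of the k training rows closest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(target for _, target in nearest) / len(nearest)

def cross_validated_mae(rows, k, folds=10):
    """Mean absolute error of k-NN regression under simple k-fold CV."""
    total = 0.0
    for fold in range(folds):
        # Every `folds`-th row starting at `fold` is held out for testing.
        test = rows[fold::folds]
        train = [r for i, r in enumerate(rows) if i % folds != fold]
        total += sum(abs(knn_predict(train, x, k) - y) for x, y in test)
    return total / len(rows)

# Synthetic stand-in data (hypothetical): a nonlinear target with noise.
rng = random.Random(0)
rows = []
for _ in range(200):
    x = (rng.uniform(0, 10), rng.uniform(0, 10))
    y = 50 * math.sin(x[0]) + 5 * x[1] ** 2 + rng.gauss(0, 10)
    rows.append((x, y))

mae_by_k = {k: cross_validated_mae(rows, k) for k in (1, 3, 5, 10)}
best_k = min(mae_by_k, key=mae_by_k.get)
for k, mae in mae_by_k.items():
    print(f"k={k:2d}  MAE={mae:.2f}")
print(f"lowest cross-validated MAE at k={best_k}")
```

Comparing the same cross-validated error across several k values, and across other regression schemes, gives a like-for-like basis for choosing among them.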

