Polynomial Regression in Weka

Polynomial Regression in Weka

Aftab Akram
Hi,
I was looking for functions in Weka with which I can perform polynomial regression on my dataset. I have already performed linear regression, which resulted in high bias. I now plan to incrementally increase the degree of the polynomial until I reach an acceptable level of bias. So I am looking for a function (or functions) with which I can first test a polynomial of degree 2, and so on.
Thanks.

AFTAB AKRAM
Doctoral Student 
South China Normal University 
Guangzhou, P.R. China 

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Polynomial Regression in Weka

Eibe Frank-2
Administrator
You could use a kernel-based method in conjunction with a polynomial kernel, e.g., GaussianProcesses or SMOreg (or LibSVM, as a potentially faster alternative to SMOreg).

Make sure you enable “lower order terms” when setting up the polynomial kernel.
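For example, on the command line, an SMOreg set-up with a degree-2 polynomial kernel would look something like this (a sketch only; the weka.jar path and data.arff are placeholders, and the option letters should be checked against the PolyKernel documentation):

```shell
# Sketch: run SMOreg with a degree-2 polynomial kernel, lower-order terms on.
java -cp weka.jar weka.classifiers.functions.SMOreg \
  -t data.arff \
  -K "weka.classifiers.functions.supportVector.PolyKernel -E 2.0 -L"
```

Here -E sets the exponent (increase it to try degree 3, 4, and so on) and -L enables the lower-order terms.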

Cheers,
Eibe
 

> On 9/03/2017, at 10:20 PM, Aftab Akram <[hidden email]> wrote:
> [...]


Re: Polynomial Regression in Weka

Jenna
Hi. Is there a way to see the model that is generated in the following form?
<https://weka.8497.n7.nabble.com/file/t341/higher_degree.png>




Re: Polynomial Regression in Weka

Michael Hall


On Jun 10, 2020, at 2:52 AM, Jenna <[hidden email]> wrote:

[...]


Curve fitting?

Not sure how you would accomplish that with Weka, but with Apache Commons Math 3, maybe something like the code below. It converts Weka instances to double arrays and then uses Commons Math.
This was code I didn’t end up using and didn’t finish.
Instead, I did a data transform taking the natural log of both x and y and then ran a linear regression.
This assumes the relationship is a power law like

Y = aX^b

and you can figure out ‘a’ and ‘b’ with a linear regression of the logs.
I’ve been tinkering with that for a while and still am. I might post something to my site or GitHub if I finish.
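In case it is useful, the log-log fit itself is simple enough to do without any library. A minimal, self-contained sketch (the names and data here are illustrative, not my actual code):

```java
// Fit y = a * x^b by ordinary least squares on (ln x, ln y).
// Requires strictly positive x and y values.
public final class PowerLawFit {

    /** Returns {a, b} for y = a * x^b, fitted on the log-log scale. */
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            double lx = Math.log(x[i]), ly = Math.log(y[i]);
            sx += lx; sy += ly; sxx += lx * lx; sxy += lx * ly;
        }
        // The slope of the log-log line is the exponent b; the intercept is ln(a).
        double b = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double lnA = (sy - b * sx) / n;
        return new double[] { Math.exp(lnA), b };
    }

    public static void main(String[] args) {
        // Exact power-law data: y = 3 * x^2.
        double[] x = {1, 2, 3, 4, 5};
        double[] y = {3, 12, 27, 48, 75};
        double[] ab = fit(x, y);
        System.out.printf("a = %.4f, b = %.4f%n", ab[0], ab[1]); // a = 3.0000, b = 2.0000
    }
}
```

For real data, the recovered b is only as good as the power-law assumption; plotting ln y against ln x first is a quick sanity check.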

I’m not that familiar with how to work the polynomial kernels in Weka. I would be interested to hear whether they provide an answer to your question.

Possibly of interest…


import org.apache.commons.math3.fitting.PolynomialCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoints;

// "data" is a weka.core.Instances object; xName and yName are attribute names.
WeightedObservedPoints obs = new WeightedObservedPoints();

System.out.println(yName + " by " + xName);
double[] xA = data.attributeToDoubleArray(data.attribute(xName).index());
double[] yA = data.attributeToDoubleArray(data.attribute(yName).index());
for (int i = 0; i < xA.length; i++) {
    obs.add(xA[i], yA[i]);
}

// Fit polynomials of increasing degree and report the fit of each.
for (int degree = 1; degree < 4; degree++) {
    PolynomialCurveFitter fitter = PolynomialCurveFitter.create(degree);
    double[] coeff = fitter.fit(obs.toList()); // coeff[i] multiplies x^i
    System.out.println("Degree: " + degree + " SSE: " + sse(xA, yA, coeff));
    for (int i = 0; i < coeff.length; i++) {
        System.out.println(coeff[i]);
    }
    System.out.println("_______");
}

// Sum of squared errors of the fitted polynomial over the observations.
private static double sse(double[] xA, double[] yA, double[] coeff) {
    double sum = 0;
    for (int i = 0; i < yA.length; i++) {
        double pred = 0;
        for (int j = coeff.length - 1; j >= 0; j--) { // Horner's method
            pred = pred * xA[i] + coeff[j];
        }
        double err = yA[i] - pred;
        sum += err * err;
    }
    return sum;
}


Re: Polynomial Regression in Weka

Michael Hall


For me, I was mainly interested in how a constantly increasing X affected the Y response variable. If you have more than one attribute to include in the polynomial, I’m not sure how that would work. This might be too simplistic for your needs.




Re: Polynomial Regression in Weka

Bill Bane
In reply to this post by Aftab Akram
I have run into this need at times, and used a filtered classifier that:

a. adds higher-order attribute(s) using AddExpression, then
b. performs LinearRegression (with the EliminateColinearAttributes flag turned off).

For example, using a cubic model approach:

weka.classifiers.meta.FilteredClassifier -F "weka.filters.MultiFilter -F \"weka.filters.unsupervised.attribute.AddExpression -E a1^2 -N X-2\" -F \"weka.filters.unsupervised.attribute.AddExpression -E a1^3 -N X-3\" -F \"weka.filters.AllFilter\"" -W weka.classifiers.functions.LinearRegression -- -S 1 -C -R 1.0E-8 -additional-stats -num-decimal-places 4

In a couple of synthetic examples, this returns very similar accuracy to
SMOReg with a Poly Kernel of the same order (in this example, 3 -- with
UseLowerOrder = True).  The advantage of the Linear Regression approach is
that the outputs are more interpretable, and the coefficients can easily be
used offline for scenario modeling.




Re: Polynomial Regression in Weka

Michael Hall


> On Jun 10, 2020, at 11:22 AM, Bill Bane <[hidden email]> wrote:
> [...]

I’m not quite following how this allows you to do LinearRegression on nonlinear, higher-order data.
What I’m currently doing is copying the Instances and then taking the logs using Weka’s MathExpression filter; this flattens the higher-order relationship to a linear one, and then I do a LinearRegression.
The R^2 indicates decent results for that, so I am assuming I am getting a somewhat valid estimate of the degree of the nonlinear power law/polynomial, which is all I really want. I see using it on a comparison basis, where higher degrees indicate more complexity and less scalability: an ordering metric. If it correctly indicates which cases scale better, then accuracy on the exact degree doesn’t really matter.
I did see somewhere that R^2 should not be used with nonlinear models, but with the log transform this is actually linear, and it is the metric used in the empirical complexity paper that takes the same approach.
Right now I’m refactoring the code to separate out a part that seems reusable; I have another use in mind. I’m also packaging it, since it is getting beyond the simple command-line tool I originally intended, and am even considering a first attempt at making it modular because of that.
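For reference, the log-transform setup can be approximated as a one-off command line with something like the following. This is a sketch: the MathExpression option string is from memory, so verify it against the filter’s documentation, and the weka.jar path and data.arff are placeholders.

```shell
# Sketch: log-transform the attributes, then fit a plain linear regression.
java -cp weka.jar weka.classifiers.meta.FilteredClassifier \
  -t data.arff \
  -F "weka.filters.unsupervised.attribute.MathExpression -E log(A)" \
  -W weka.classifiers.functions.LinearRegression
```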

Re: Polynomial Regression in Weka

Bill Bane
Hi -- this little write-up attached may help describe how linear regression
can be performed on nonlinear data.  Of course we need to be careful of
overfitting or extrapolating models using higher-order terms like this, but
for well-contained data sets, it can work satisfactorily.
Cubic_regression_example.pdf
<https://weka.8497.n7.nabble.com/file/t5855/Cubic_regression_example.pdf>  

For reference, here is the synthetic data:
Polynomial.csv <https://weka.8497.n7.nabble.com/file/t5855/Polynomial.csv>  




Re: Polynomial Regression in Weka

Michael Hall


> On Jun 10, 2020, at 3:26 PM, Bill Bane <[hidden email]> wrote:
> [...]

Nice. I’ve been doing some verifying of this in R and it seems to hold up.


              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -94.215687  33.590877  -2.805  0.00735 **
X           -48.130120   5.647738  -8.522 5.09e-11 ***
X2           -0.066272   0.255958  -0.259  0.79685    
X3            0.021859   0.003301   6.622 3.37e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 54.98 on 46 degrees of freedom
Multiple R-squared:  0.9707, Adjusted R-squared:  0.9688
F-statistic: 507.8 on 3 and 46 DF,  p-value: < 2.2e-16

With the higher orders of the X value included, regression seems able to adjust things to get a good linear model.

This might work well for the OP to determine coefficients.

In my case I was more trying to determine degree, which ended up being around 14 if I remember right.

It might be a small matter of adding some looping to your approach to determine the best fit; it could make a good optional check against mine.

Re: Polynomial Regression in Weka

Michael Hall



Fwiw, I posted my current data. 

This follows up on something earlier where I thought GraalVM improved Weka memory management, but it turned out to just be different GC settings.
I thought you could come up with a tool to tune GC along the lines of what I had already been doing: just keep increasing RandomForest iterations until you run out of memory. The settings that allow more iterations might offer improved memory management, although I don’t really have anything to prove that would generalize to other classifiers and their parameters.
I also had the code record information about memory and garbage collection, either to a CSV or ARFF file.
default.csv holds command-line invocations with no GC parameters; it ran out of memory doing RandomForest at about 6000 iterations.
test.csv is the current run with different GC parameters; it made it to 7000 iterations.
So this code alone somewhat serves the original purpose: it can, to some extent, indicate how well GC is working.
However, I noticed that in either case increasing iterations seemed to go along in a very linear way as long as there was free memory. When free memory ran out, things got nonlinear as GC tried to manage things on its own. The nonlinear part still looked like it might be following a fairly well-formed exponential-type curve, and I wondered if that could be modeled.
To that end, if interested, you could look at either
X = iteration, Y = elapsed
or
X = iteration, Y = old_count
When things go nonlinear, most of the action starts occurring with GC in the old-generation memory pool.
You can see in the current dataset that I did some extra runs to fill in the nonlinear part a little; the code sorts the instances by iteration to allow for this.
The analysis code removes all attributes from the instances except the X and Y.
It also makes sure the attribute of interest (e.g. elapsed or old_count) is strictly increasing. The last run can get an OutOfMemoryError early, so it eliminates instances from the back where that isn’t the case.
The code then tries to determine the nonlinear break. It removes an instance from the back and checks whether that improves linearity. If it does, it adds the instance to a separate nonlinear Instances object, repeating until removing no longer improves linearity.
Then we have our nonlinear instances ready for modeling.
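If it helps to make that concrete, here is a rough, self-contained sketch of the back-stripping idea. The names are hypothetical, and it uses the R^2 of a simple least-squares line as the linearity score, which is only one possible criterion:

```java
// Split sorted (x, y) data into a linear head and a nonlinear tail by
// removing points from the back while doing so improves the R^2 of a
// straight-line fit on what remains.
public final class BreakFinder {

    /** R^2 (squared correlation) of a least-squares line over the first n points. */
    static double rSquared(double[] x, double[] y, int n) {
        double sx = 0, sy = 0, sxx = 0, sxy = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; sxy += x[i] * y[i]; syy += y[i] * y[i];
        }
        double num = n * sxy - sx * sy;
        double den = (n * sxx - sx * sx) * (n * syy - sy * sy);
        return den == 0 ? 1.0 : (num * num) / den; // degenerate data counts as linear
    }

    /** Returns the index where the nonlinear tail begins (x.length if none found). */
    static int findBreak(double[] x, double[] y) {
        int n = x.length;
        double best = rSquared(x, y, n);
        while (n > 3) {
            double candidate = rSquared(x, y, n - 1);
            if (candidate <= best) break; // dropping the last point no longer helps
            best = candidate;
            n--;
        }
        return n;
    }

    public static void main(String[] args) {
        // Linear up to x = 7, then a sharply growing tail.
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        double[] y = {2, 4, 6, 8, 10, 12, 14, 30, 50, 80};
        System.out.println("Tail starts at index " + findBreak(x, y)); // index 7
    }
}
```

Stopping at the first non-improving removal is greedy, so in practice it may be worth tolerating one or two non-improving steps before stopping.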

For a second use, I am considering a version that runs any given classifier on dataset splits of increasing size, then sees whether at some point it goes nonlinear and models the complexity, to get an idea of how well different classifiers scale with increasing data.
If I finish this, I mean at some point to put something together that explains it more clearly and looks a little better.
The visualizations you had were nice; I wasn’t aware you could do some of those with Weka.
