Current pull requests for weka


Thomas Diesler
Dear Folks/Peter,

I’m looking into a possible integration of Camel with Weka for Red Hat Fuse. The idea is to bring data mining capabilities to enterprise integration workflows. 
This would potentially benefit thousands of Camel users and eventually also Red Hat’s enterprise customers. 
Weka would then run on JBoss EAP and the various other platforms that Fuse supports.

While exploring this in more detail, I stumbled across some (possibly minor) issues, which I collected here.

But first of all, I’d like to get the Maven build running without failures, which the following PRs would address:

#2 Improve regression diff and update some test fixtures
#3 Evaluation removes two lines from the info
#4 PrincipalComponents may have significant regression
#5 GausianProcessTest fails in numerous ways

The proposed changes for a given PR can be reviewed/commented on like this.
For some of these issues I’m not quite sure whether my proposed changes are valid.
What normally works well is to create the PRs anyway and then comment on/modify them until the change can actually be accepted upstream. 

GitHub supports issue tracking, code review, continuous integration, etc. quite nicely.
For example, if for some reason you need a diff or patch for a given PR, simply append the respective suffix (.diff or .patch) to the PR’s URL.
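A minimal Java sketch of that suffix trick, assuming a standard GitHub pull-request URL with no query string or fragment: GitHub serves a raw diff at the PR URL plus `.diff` and a patch series at the PR URL plus `.patch`. The class name `PrPatchUrls` and the example URL are hypothetical, for illustration only.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Weka or any GitHub API): derives the
 * raw diff/patch URLs for a pull request by appending ".diff" or ".patch".
 * Assumes the PR URL has no query string or fragment.
 */
public class PrPatchUrls {

    /** Appends the given suffix to a PR URL, dropping one trailing slash if present. */
    static String withSuffix(String prUrl, String suffix) {
        // Syntax check only; URI.create throws IllegalArgumentException if malformed.
        URI.create(prUrl);
        String base = prUrl.endsWith("/") ? prUrl.substring(0, prUrl.length() - 1) : prUrl;
        return base + suffix;
    }

    /** URL of the combined unified diff for the PR. */
    public static String diffUrl(String prUrl) {
        return withSuffix(prUrl, ".diff");
    }

    /** URL of the per-commit patch series for the PR. */
    public static String patchUrl(String prUrl) {
        return withSuffix(prUrl, ".patch");
    }

    public static void main(String[] args) {
        // Hypothetical PR URL, for illustration only.
        String pr = "https://github.com/example/weka/pull/2";
        System.out.println(diffUrl(pr));  // https://github.com/example/weka/pull/2.diff
        System.out.println(patchUrl(pr)); // https://github.com/example/weka/pull/2.patch
    }
}
```

The `.patch` output is in `git format-patch` style, so it can usually be applied with `git am`; the `.diff` output is a plain unified diff for `git apply` or `patch`.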

I currently track the issues above in my own fork of Weka, which is of course less than ideal. 
Perhaps I missed it, but is there an “official” Weka issue tracker somewhere? 
If not, would it perhaps be possible to track Weka issues here?
This mailing list could then receive automated messages for issue/PR activity.

There is also this little improvement, which you may want to consider.

That’s all for now, and thanks for all the good work you’ve done already, it’s massive ;-)

cheers
— thomas

--------------------------------------------------
Principal Software Engineer at Red Hat
Currently working on Enterprise Integration


_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to: To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit
https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Current pull requests for weka

Peter Reutemann

Thanks for the PRs!

I'll leave it to Eibe and Mark (the actual maintainers of Weka) to comment in detail, but here are a few comments of my own:
1. Weka's primary build system is ant. I only added the Maven support to allow deployment to Maven Central, making it easier to use Weka (and its packages) in other Maven projects.
2. Weka's unit tests are run against Oracle's JDK 1.8. Newer versions of Java produce numerical differences in the results, resulting in failures (thanks, Java, like this hasn't happened before!).
3. Weka 3.8 is tied to the Data Mining book (https://www.cs.waikato.ac.nz/ml/weka/book.html) and only receives bug fixes; the actual development happens in the Subversion trunk, not in the stable-3.8 branch.
4. The GitHub repos are downstream mirrors of the Subversion branches and are only provided as a courtesy. Any patches etc. should be made against the official Subversion repository (Weka has been around much longer than git!):
5. In terms of discussing bugs, we usually use the mailing list for that. However, Pentaho runs a bug tracker for their Weka integration (which Mark Hall looks after):
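Point 2 implies that the regression fixtures only match when the tests run on a 1.8 JDK. A minimal Java sketch of a guard a test harness could use to flag a mismatch, assuming the check is based on the standard `java.specification.version` system property ("1.8" on Java 8; "11", "17", etc. on later releases). It checks only the version, not the vendor (Oracle or otherwise), and `ReferenceJdkCheck` is a hypothetical name, not part of Weka's test suite.

```java
/**
 * Hypothetical guard (not part of Weka): flags when regression tests run on
 * a JDK other than the 1.8 reference, where numerical output may differ.
 * Only the specification version is checked, not the JDK vendor.
 */
public class ReferenceJdkCheck {

    /** Specification version the regression fixtures are assumed to target. */
    static final String REFERENCE_SPEC_VERSION = "1.8";

    /** Returns true if the given specification version matches the reference JDK. */
    public static boolean isReferenceJdk(String specVersion) {
        return REFERENCE_SPEC_VERSION.equals(specVersion);
    }

    public static void main(String[] args) {
        // "1.8" on Java 8; "11", "17", ... on Java 9 and later.
        String spec = System.getProperty("java.specification.version");
        if (isReferenceJdk(spec)) {
            System.out.println("Running on reference JDK " + spec);
        } else {
            System.out.println("Warning: JDK " + spec + " differs from reference "
                    + REFERENCE_SPEC_VERSION + "; regression output may differ numerically.");
        }
    }
}
```

Warning rather than failing keeps the tests usable on newer JDKs while making clear why fixture comparisons might not match.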

HTH

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/


Re: Current pull requests for weka

Peter Reutemann
> I'll leave it to Eibe and Mark (the actual maintainers of Weka) to comment on in detail, but here are a few of my own:

Forgot to mention that most people here are still on their summer
holidays. So replies may take a while...

Cheers, Peter