Random Forest in Weka vs in Python: Different Results


Random Forest in Weka vs in Python: Different Results

mcbenly
Hi,
I am getting different results for the same classifier in Weka and in Python.

I am interested in the ROC-AUC score. Weka gives me 90% for Random Forest
and Python gives around 80%.

I understand that Weka's ROC score is weighted, but that is a huge difference
anyway.
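For reference, a class-weighted ROC AUC of the kind Weka reports in its "Weighted Avg." row can be reproduced on the Python side roughly as sketched below. This is only a sketch under assumptions not stated in the thread: that the Python library is scikit-learn, that the weighted figure is the per-class one-vs-rest AUC averaged by class frequency, and that a synthetic dataset from make_classification can stand in for the real data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 3-class data; a stand-in for the real dataset (assumption).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6,
                           n_classes=3, weights=[0.6, 0.3, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
proba = forest.predict_proba(X_te)  # one probability column per class

# One-vs-rest AUC for each class, computed from that class's probability column.
classes = forest.classes_
per_class_auc = np.array([
    roc_auc_score((y_te == c).astype(int), proba[:, i])
    for i, c in enumerate(classes)
])

# Weight each class's AUC by its share of the test instances.
class_weights = np.array([(y_te == c).mean() for c in classes])
weighted_auc = float(np.sum(per_class_auc * class_weights))

# scikit-learn's built-in equivalent of the manual calculation above.
sklearn_weighted = roc_auc_score(y_te, proba, multi_class="ovr", average="weighted")

print("per-class AUC:", per_class_auc)
print("weighted AUC (manual):", weighted_auc)
print("weighted AUC (sklearn):", sklearn_weighted)
```

For a two-class problem the weighting makes little practical difference, so it is unlikely to account for a ten-point gap on its own.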

This is the config for Python:
forest = RandomForestClassifier(n_estimators=300, random_state=1,
                                bootstrap=True, criterion='entropy')

I have also tried with no parameters set:
forest = RandomForestClassifier()

But still no luck...

Any help here?


Thanks,



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Random Forest in Weka vs in Python: Different Results

Peter Reutemann-3
What Python framework are you using?

Cheers, Peter


--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 577-5304
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: Random Forest in Weka vs in Python: Different Results

mcbenly
Thanks for looking, Peter.

I have tried it in Python 2.7, Python 3.8, and Google's Colab.


Thanks,




Re: Random Forest in Weka vs in Python: Different Results

Michael Hall


> On Oct 24, 2020, at 9:59 AM, mcbenly <[hidden email]> wrote:
>
> Thanks Peter for looking;
>
> I have tried in Python 2.7, 3.8, and Google's Colab.
>
>
> Thanks,

I usually look at accuracy rather than ROC, so I am not sure how big a difference this is. Some difference is to be expected: you can get somewhat different results with Weka's RandomForest alone just by changing the random seed. I believe Weka defaults to 100 iterations/trees rather than the 300 in your config. Other hyperparameters could differ too; the number of attributes considered at each split and the depth of the trees are, I believe, a couple of them. The base tree classifier is probably different as well. I’m not sure about the weighting; someone else would have to speak to that.
You found an instance where Weka seems to perform better. Nice?
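The points above (seed, number of trees, attributes per split, tree depth) can be checked on the Python side with something like the sketch below. It rests on assumptions that are not from this thread: that the Python library is scikit-learn, that Weka's defaults are 100 trees, int(log2(#attributes) + 1) attributes per split and unlimited depth, and that criterion="entropy" is the closest match to Weka's base tree. The data is synthetic, and 10-fold stratified cross-validation is used only because it is a common evaluation setup.

```python
import math

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data; a stand-in for the real dataset (assumption).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=0)

# Assumed Weka defaults, mirrored in scikit-learn terms.
n_trees = 100
n_feats = int(math.log2(X.shape[1]) + 1)  # attributes tried at each split


def cv_auc(seed):
    """Mean ROC AUC over 10 stratified folds for one random seed."""
    forest = RandomForestClassifier(n_estimators=n_trees, max_features=n_feats,
                                    max_depth=None, criterion="entropy",
                                    random_state=seed)
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return cross_val_score(forest, X, y, cv=folds, scoring="roc_auc").mean()


# Repeat with a few seeds to see how much the score moves by chance alone.
scores = np.array([cv_auc(seed) for seed in range(1, 4)])
print("AUC per seed:", scores)
print("spread (max - min):", scores.max() - scores.min())
```

If the spread across seeds is small compared with the gap between the two tools, the seed is probably not the explanation and the evaluation procedure itself is worth a closer look.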

Re: Random Forest in Weka vs in Python: Different Results

Peter Reutemann
In reply to this post by mcbenly

I'm presuming that you are using scikit-learn's (sklearn) random forest implementation.

Maybe this github issue is still relevant?
https://github.com/scikit-learn/scikit-learn/issues/1696

Cheers, Peter

Re: Random Forest in Weka vs in Python: Different Results

Michael Hall


> On Oct 24, 2020, at 4:37 PM, Peter Reutemann <[hidden email]> wrote:
>
> I'm presuming that you are using scikit-learn's (sklearn) random forest implementation.
>
> Maybe this github issue is still relevant?
> https://github.com/scikit-learn/scikit-learn/issues/1696
>
It does seem more natural to ask why one implementation is doing poorly than to ask why the other is doing so well.
I think I’ve asked about differing results before myself, probably concerning R, but I don’t remember the thread.
There are a fair number of moving parts, though, and no reference that implementations are supposed to conform to for results on any given dataset.

Re: Random Forest in Weka vs in Python: Different Results

Michael Hall


> On Oct 24, 2020, at 9:41 PM, Michael Hall <[hidden email]> wrote:
>
>> Maybe this github issue is still relevant?
>> https://github.com/scikit-learn/scikit-learn/issues/1696

But looking closer at this, there does seem to be a concrete suggestion there for getting more similar results if the dataset is binary.
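The thread does not spell out what that suggestion is. One well-known pitfall for binary data, which may or may not be what the issue describes, is computing ROC AUC from hard class labels (predict) instead of class probabilities (predict_proba). The sketch below illustrates that difference, assuming scikit-learn and a synthetic dataset; the forest settings are copied from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data; a stand-in for the real dataset (assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5,
                           weights=[0.8, 0.2], flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Same settings as the config in the original post.
forest = RandomForestClassifier(n_estimators=300, random_state=1,
                                bootstrap=True, criterion="entropy")
forest.fit(X_tr, y_tr)

# AUC from hard 0/1 predictions: the ROC curve collapses to a single point.
auc_from_labels = roc_auc_score(y_te, forest.predict(X_te))

# AUC from the predicted probability of the positive class (column 1):
# this uses the full ranking of the test instances.
auc_from_proba = roc_auc_score(y_te, forest.predict_proba(X_te)[:, 1])

print("AUC from predict():      ", auc_from_labels)
print("AUC from predict_proba():", auc_from_proba)
```

If the Python figure was computed from predict(), recomputing it from predict_proba() is a cheap first check before comparing it with Weka's ROC Area again.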





Re: Random Forest in Weka vs in Python: Different Results

mcbenly
Thanks Michael, I had already looked at this and am following the same
approach.

I am glad to see a good result in Weka, but I am curious to know what is
causing this.


Many thanks,




Re: Random Forest in Weka vs in Python: Different Results

Michael Hall


> On Oct 25, 2020, at 5:28 PM, mcbenly <[hidden email]> wrote:
>
> Thanks Michael, already had looked at this and I am following the same
> approach.
>
> I am glad to see good result in Weka. But just curious to know what is
> causing this.

Did it work for you?
The answer there seemed to be on the Python side.

Re: Random Forest in Weka vs in Python: Different Results

mcbenly
Apologies for the late response.
Actually, I had already applied that approach but had no luck.

The Weka result is still much higher than that of Python's classifiers. I also tried the R
extension in Weka; it produces the same result as Weka, much higher than
Python.

So there is still no explanation of why Weka is producing higher results.





