

Hello all,
Given a set of classification results (accuracies in percent), for example:
A: a baseline accuracy of 80%, obtained by running a particular learning method (e.g. SMO) with, e.g., 1000 word unigrams.
We then run SMO (or other ML methods) with other feature sets, for example
2000, 3000, 4000, 5000, ..., 10000 word unigrams,
and we get accuracies such as 82%, 84%, 86%, ..., 90%, ...
We call these results Bi, i.e. B1, B2, ..., Bn.
At a significance level of 5% (or alternatively 1%), we want to know
which of the Bi results are significantly better than A at the chosen level (WEKA marks these with 'v' next to the result),
and which of the Bi results are significantly worse than A at the chosen level (WEKA marks these with '*' next to the result).
My question is: what method and formula does WEKA use to decide between 'v', '*', or no mark
for each of the Bi results relative to the A result?
Thank you in advance for a detailed answer,
Yaakov
_______________________________________________
Wekalist mailing list  [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


Not a detailed answer, but at least some pointers...
Weka's Experimenter uses the PairedCorrectedTTester class for computing these:
https://weka.sourceforge.io/doc.dev/weka/experiment/PairedCorrectedTTester.html
However, for cross-validation, it expects the evaluation results per fold.
Here's a full example of setting up, running and evaluating an
experiment using the Experimenter API:
https://waikato.github.io/wekawiki/experimenter/using_the_experiment_api/
This example code makes use of the PairedCorrectedTTester.
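
To make the idea concrete, here is a minimal sketch of a corrected resampled paired t-test, which is the statistic the PairedCorrectedTTester documentation describes. It is not WEKA's code. The function names, the made-up per-fold accuracies, the dataset size, and the hard-coded two-sided critical value are all assumptions for illustration only.

```python
import math
import statistics


def corrected_resampled_t(acc_a, acc_b, n_train, n_test):
    """Return (t, degrees_of_freedom) for paired per-fold accuracies.

    acc_a and acc_b must come from the same train/test splits, in the
    same order. n_train and n_test are the instance counts per split.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need at least two paired results")
    diffs = [b - a for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    mean_d = statistics.mean(diffs)
    var_d = statistics.variance(diffs)  # sample variance (divides by k - 1)
    if var_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    # The n_test / n_train term is the correction; a plain paired
    # t-test would use 1 / k alone.
    t = mean_d / math.sqrt((1.0 / k + n_test / n_train) * var_d)
    return t, k - 1


def mark(t, t_critical):
    """'v' if B is significantly better than A, '*' if worse, else ' '."""
    if t > t_critical:
        return "v"
    if t < -t_critical:
        return "*"
    return " "


# Made-up accuracies (percent) from one 10-fold cross-validation
# on a hypothetical 1000-instance dataset (900 train, 100 test per fold).
acc_a = [80, 79, 81, 80, 82, 78, 80, 81, 79, 80]
acc_b = [84, 83, 86, 83, 85, 83, 84, 86, 82, 84]

t, df = corrected_resampled_t(acc_a, acc_b, n_train=900, n_test=100)
# 2.262 is the two-sided 5% critical value of Student's t for df = 9.
print(f"t = {t:.2f}, df = {df}, mark = '{mark(t, 2.262)}'")
```

The point of the sketch is that the input is a list of per-fold results for each scheme, not one summary accuracy per scheme; the variance of the per-fold differences is what makes a significance statement possible.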
Cheers, Peter

Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 8585174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.datamining.co.nz/


Dear Peter,
Thank you for your answer.
We found the appropriate functions in WEKA.
The problem is that these functions (if we understand correctly) only work
on sets of values, and not on a single baseline result
versus a single new result.
If we compare a single baseline result with a single new result,
we get a simple comparison: is the baseline greater than the new result? If so,
it gives V, and this is not what we meant.
Are we making a mistake?
We just want to know,
given two values (a single new result and a single baseline result),
whether the new result
differs from the single baseline result
in a statistically significant way at the chosen significance level.
Thanks in advance and best regards,
Yaakov



As far as I know, the t-test always works on groups of values, determining the
difference between them.
Not sure what you could use when just comparing single values.
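
A small sketch of why a single pair of values is not enough for a paired t-test: the test needs the variance of the differences, and a sample variance is undefined for one observation. The helper name below is hypothetical and only illustrates that point using Python's standard library.

```python
import statistics


def can_run_paired_t_test(acc_a, acc_b):
    """True if the paired differences have a defined sample variance."""
    diffs = [b - a for a, b in zip(acc_a, acc_b)]
    try:
        # Raises StatisticsError when fewer than two data points are given.
        statistics.variance(diffs)
    except statistics.StatisticsError:
        return False
    return True


# One baseline accuracy versus one new accuracy: no variance, no t-test.
single = can_run_paired_t_test([80.0], [82.0])

# Several paired results (e.g. one per fold): the variance is defined.
several = can_run_paired_t_test([80.0, 79.0, 81.0], [82.0, 83.0, 84.0])

print(single, several)
```

With only one value per scheme, all that remains is a direct comparison of the two numbers, which matches the behaviour described in the question.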
Cheers, Peter



>As far as I know, the ttest always works on groups, determining the
>difference between them.
Yes,
I understand that this is true for a group of results that measure the same
thing, e.g.
throwing the same die many times in order to find out whether it is loaded.
However, here we have results from different experiments, e.g.
A: a baseline accuracy of 80%, obtained by running a particular
learning method (e.g. SMO) with, e.g., *1000 word unigrams*.
We then run SMO (or other ML methods) with other feature sets, for
example
2000, 3000, 4000, 5000, ..., 10000 word unigrams,
and we get accuracies such as 82%, 84%, 86%, ...,
90%, ...
We call these results Bi, i.e. B1, B2, ..., Bn.
Each result (A, B1, B2, ...) measures not the same "die" but a different "die"
(e.g., with a different number of word unigrams).
Is it correct to treat these different experiments as a group for a
t-test?
Best regards,
Yaakov



I have to admit that I don't feel qualified enough to comment on that.
Cheers, Peter


