How to calculate statistical significance of results in relation to a baseline result?

Yaakov HaCohen-Kerner
Hello all,

Suppose we have a set of classification results (accuracies in percent).

For example:
A is a baseline result: an accuracy of 80% from running a particular learning method (e.g., SMO) with, e.g., 1000 word unigrams.

We then run SMO (or other ML methods) with other feature sets, for example
2000, 3000, 4000, 5000, ..., 10000 word unigrams,
and obtain accuracies such as 82%, 84%, 86%, ..., 90%, ...
We call these results Bi, i.e., B1, B2, ..., Bn.

At a significance level of 5% (or alternatively 1%), we want to know
which of the Bi results are significantly better than A (WEKA marks these with 'V' next to the result),
and which of the Bi results are significantly worse than A (WEKA marks these with '*' next to the result).

My question is: what method and formula does WEKA use to decide between V, * or no mark
for each of the Bi results relative to the A result?

Thank you in advance for investing in a detailed answer,
Yaakov
_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: How to calculate statistical significance of results in relation to a baseline result?

Peter Reutemann
> My question is what method and formula are used by WEKA to get V or * or nothing
> for each of the Bi results relative to the A result?

Not a detailed answer, but at least some pointers...

Weka's Experimenter uses the PairedCorrectedTTester class for computing these:
https://weka.sourceforge.io/doc.dev/weka/experiment/PairedCorrectedTTester.html

However, for cross-validation, it expects the evaluation results per fold.

Here's a full example of setting up, running and evaluating an
experiment using the Experimenter API:
https://waikato.github.io/weka-wiki/experimenter/using_the_experiment_api/

This example code makes use of the PairedCorrectedTTester.
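
As a rough illustration of what such a test computes, here is a minimal Python sketch of the corrected resampled paired t-test, the statistic that a "paired corrected" t-test is generally understood to be based on. It is not WEKA's code: the function names corrected_paired_t and mark, the per-fold accuracies, and the hard-coded critical value (2.262, two-sided 5% with 9 degrees of freedom) are all assumptions made for the example. The test/train ratio of 1/9 assumes a single 10-fold cross-validation.

```python
import math
from statistics import mean, variance


def corrected_paired_t(scores_a, scores_b, test_train_ratio):
    """t statistic for paired per-fold scores, with the variance correction
    (1/k + n_test/n_train) applied instead of the plain 1/k."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    mean_d = mean(diffs)
    var_d = variance(diffs)  # sample variance of the paired differences
    if var_d == 0:
        # All differences identical: no spread to scale by.
        return 0.0 if mean_d == 0 else math.copysign(math.inf, mean_d)
    return mean_d / math.sqrt((1.0 / k + test_train_ratio) * var_d)


def mark(t_stat, critical_value):
    """'v' if B is significantly better than A, '*' if worse, '' otherwise."""
    if t_stat > critical_value:
        return "v"
    if t_stat < -critical_value:
        return "*"
    return ""


# Hypothetical per-fold accuracies (percent) from one 10-fold cross-validation.
baseline = [78, 81, 80, 79, 82, 80, 81, 79, 80, 80]  # A
new = [84, 86, 85, 83, 87, 85, 86, 84, 85, 86]       # one of the Bi

t = corrected_paired_t(baseline, new, test_train_ratio=1 / 9)
print(round(t, 2), repr(mark(t, critical_value=2.262)))
```

The point of the sketch is that the input is a list of per-fold scores for each configuration, paired fold by fold, rather than one overall accuracy per configuration.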

Cheers, Peter
--
Peter Reutemann
Dept. of Computer Science
University of Waikato, NZ
+64 (7) 858-5174
http://www.cms.waikato.ac.nz/~fracpete/
http://www.data-mining.co.nz/

Re: How to calculate statistical significance of results in relation to a baseline result?

Yaakov HaCohen-Kerner
Dear Peter,

Thank you for your answer.

We found the appropriate functions in WEKA.
The problem is that these functions (if we understand correctly) only work
on sets of values, and not on a single baseline result
versus a single new result.
If we run them on a single baseline value versus a single new value,
we get a plain comparison: is the baseline greater than the new result? If
so, it gives V, and this is not what we meant.

Have we made a mistake?

We just want to know, given two values (a single new result and a single
baseline result), whether the new result is significantly better than the
baseline result at the chosen significance level.

Thanks in advance and best regards,
Yaakov




Re: How to calculate statistical significance of results in relation to a baseline result?

Peter Reutemann
> We just want to know, given two values (a single new result and a single
> baseline result), whether the new result is significantly better than the
> baseline result at the chosen significance level.

As far as I know, the t-test always works on groups, determining the
difference between them.

Not sure what you could use when just comparing single values.
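
To illustrate why a group is needed, here is a small Python sketch (the helper name difference_variance and the numbers are made up for the example): with a single pair of values there is only one difference, so the sample variance that the t statistic divides by cannot be estimated.

```python
from statistics import StatisticsError, variance


def difference_variance(scores_a, scores_b):
    """Sample variance of the paired differences, or None if there are
    too few pairs to estimate it."""
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    try:
        return variance(diffs)
    except StatisticsError:
        # statistics.variance needs at least two data points
        return None


# One baseline value versus one new value: nothing to estimate spread from.
single = difference_variance([80.0], [82.0])

# Three hypothetical per-fold pairs: the variance is now defined.
folds = difference_variance([79.0, 81.0, 80.0], [82.0, 82.0, 83.0])

print(single, folds)
```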

Cheers, Peter

Re: How to calculate statistical significance of results in relation to a baseline result?

Yaakov HaCohen-Kerner
>As far as I know, the t-test always works on groups, determining the
>difference between them.

Yes,
I understand that this is true for a group of results that measure the same
thing, e.g.,
throwing the same die many times in order to find out whether it is loaded.
However, here we have results from different experiments, e.g.,
A is a baseline result with an accuracy of 80% from running a particular
learning method (e.g., SMO) with, e.g., *1000 word unigrams.*

We then run SMO (or other ML methods) with other feature sets, for
example
2000, 3000, 4000, 5000, ..., 10000 word unigrams,
and obtain accuracies such as 82%, 84%, 86%, ...,
90%, ...
We call these results Bi, i.e., B1, B2, ..., Bn.

Each result (A, B1, B2, ...) measures not the same "die" but a different "die"
(e.g., with a different number of word unigrams).

Is it correct to treat these different experiments as a group for a
t-test?

Best regards,
Yaakov




Re: How to calculate statistical significance of results in relation to a baseline result?

Peter Reutemann
> Is it correct to treat these different experiments as a group for a
> t-test?

I have to admit that I don't feel qualified to comment on that.

Cheers, Peter