Calculating Odds Ratio on Toy dataset


S_Ali
I have a toy dataset used in someone's Ph.D. thesis research. He calculated
the balanced accuracy measure (ACC2) on that dataset, and following the same
concepts and notation I also calculated ACC2 values. However, I could not
reproduce the odds ratio (OR) values given in his research. I have attached
my Excel sheet with my calculated values, along with his tables. Please have
a look.
<http://weka.8497.n7.nabble.com/file/t6439/table_3.png>
<http://weka.8497.n7.nabble.com/file/t6439/docs_frequencies_table3.png>
ACC2_calculationthesis.xlsx
<http://weka.8497.n7.nabble.com/file/t6439/ACC2_calculationthesis.xlsx>
He plotted different feature selection measures in terms of fpr and tpr.
However, my OR values do not match the values he assigns in the following
image:
<http://weka.8497.n7.nabble.com/file/t6439/scatter_plot.png>
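A minimal Python sketch of how a metric like ACC2 can be written purely in terms of tpr and fpr, which is what makes an (fpr, tpr) scatter plot of the metrics possible. It assumes ACC2 is the Forman-style balanced accuracy |tpr - fpr| computed one-vs-rest for each class, with tpr = p(t|C) and fpr = p(t|not C) estimated from document counts; the thesis may define or aggregate it differently. The counts are the mouse/wolf document frequencies from the toy dataset posted later in this thread, and `tpr_fpr` and `acc2` are hypothetical helper names.

```python
# Sketch (assumption): ACC2 = |tpr - fpr|, computed one-vs-rest per class.
# Counts are the mouse/wolf document frequencies from the toy dataset in this thread.
CLASS_SIZES = {"C1": 3, "C2": 4, "C3": 5}
DOC_FREQ = {
    "mouse": {"C1": 3, "C2": 4, "C3": 4},
    "wolf": {"C1": 3, "C2": 0, "C3": 0},
}


def tpr_fpr(term, cls):
    """Return (tpr, fpr): tpr = p(t | C), fpr = p(t | not C), from document counts."""
    n_total = sum(CLASS_SIZES.values())
    in_class = DOC_FREQ[term][cls]                  # docs of class C containing t
    outside = sum(DOC_FREQ[term].values()) - in_class  # docs of other classes containing t
    tpr = in_class / CLASS_SIZES[cls]
    fpr = outside / (n_total - CLASS_SIZES[cls])
    return tpr, fpr


def acc2(term, cls):
    """Balanced accuracy |tpr - fpr| for one term and one class."""
    tpr, fpr = tpr_fpr(term, cls)
    return abs(tpr - fpr)


if __name__ == "__main__":
    for term in DOC_FREQ:
        for cls in CLASS_SIZES:
            tpr, fpr = tpr_fpr(term, cls)
            print(f"{term:5s} {cls}: tpr={tpr:.3f} fpr={fpr:.3f} ACC2={acc2(term, cls):.3f}")
```

Each (term, class) pair becomes one point (fpr, tpr), so any metric defined from these two rates can be drawn on the same axes.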



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Calculating Odds Ratio on Toy dataset

S_Ali
Sorry, my question was ambiguous. I am doing a thesis on filter-based
feature selection metrics, but there are few experts in this field and they
are hard to find. I am having some difficulty calculating and interpreting
the odds ratio on a toy dataset from a research paper, which also gives DFS
calculations on the same toy dataset. I calculated DFS correctly according
to the formula given in the paper. However, I do not know how to compute the
odds ratio on its own, without involving a classifier on the dataset. Please
excuse any vague interpretation of the results; I am still learning in this
field.

The dataset is as follows:

doc   contents                                    category
------------------------------------------------------------
D1    mouse wolf fish                             C1
D2    mouse wolf tiger horse toad                 C1
D3    mouse wolf fish tiger toad                  C1
D4    elephant tiger zebra deer mouse horse       C2
D5    elephant tiger zebra deer mouse horse       C2
D6    mouse zebra horse deer                      C2
D7    elephant tiger zebra buffalo mouse horse    C2
D8    mouse pelican deer toad                     C3
D9    pelican toad deer bat mouse duck cow        C3
D10   pelican deer bat rat horse cow              C3
D11   mouse pelican rat                           C3
D12   pelican toad hen bat mouse cow              C3
------------------------------------------------------------
Document frequencies of words in each class (only mouse and wolf are shown,
for brevity):

class   mouse   wolf
C1      3       3
C2      4       0
C3      4       0
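None of these numbers require a classifier: OR, DFS and ACC2 as written here are filter metrics built only from how many documents of each class contain a term. The Python sketch below transcribes the toy dataset (assuming whitespace-separated words and binary presence per document) and recomputes the class sizes and the mouse/wolf document frequencies by plain counting. `DOCS` and `document_frequencies` are illustrative names, not from the thesis.

```python
from collections import Counter, defaultdict

# Toy dataset transcribed from the table above: (document id, words, class).
DOCS = [
    ("D1", "mouse wolf fish", "C1"),
    ("D2", "mouse wolf tiger horse toad", "C1"),
    ("D3", "mouse wolf fish tiger toad", "C1"),
    ("D4", "elephant tiger zebra deer mouse horse", "C2"),
    ("D5", "elephant tiger zebra deer mouse horse", "C2"),
    ("D6", "mouse zebra horse deer", "C2"),
    ("D7", "elephant tiger zebra buffalo mouse horse", "C2"),
    ("D8", "mouse pelican deer toad", "C3"),
    ("D9", "pelican toad deer bat mouse duck cow", "C3"),
    ("D10", "pelican deer bat rat horse cow", "C3"),
    ("D11", "mouse pelican rat", "C3"),
    ("D12", "pelican toad hen bat mouse cow", "C3"),
]


def document_frequencies(docs):
    """Count, per class, how many documents contain each word (presence, not term counts)."""
    df = defaultdict(Counter)  # df[word][class] -> number of documents containing word
    class_sizes = Counter()    # class -> number of documents in that class
    for _doc_id, text, cls in docs:
        class_sizes[cls] += 1
        for word in set(text.split()):  # set(): each document counts at most once
            df[word][cls] += 1
    return df, class_sizes


if __name__ == "__main__":
    df, sizes = document_frequencies(DOCS)
    print("class sizes:", dict(sizes))
    for word in ("mouse", "wolf"):
        print(word, {c: df[word][c] for c in sorted(sizes)})
```

Every probability in the OR and DFS formulas can then be estimated from `df` and `class_sizes` alone.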

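Formulas:

Odds ratio (OR) of term t_i for class C_j:

$$\mathrm{OR}(t_i, C_j) = \log_2 \frac{p(t_i \mid C_j)\,[1 - p(t_i \mid \bar{C}_j)]}{[1 - p(t_i \mid C_j)]\,p(t_i \mid \bar{C}_j)}$$

DFS global score, summed over the r classes:

$$\mathrm{DFS}(t_i) = \sum_{j=1}^{r} \frac{p(C_j \mid t_i)}{p(\bar{t}_i \mid C_j) + p(t_i \mid \bar{C}_j) + 1}$$

DFS local score for each class C_j:

$$\mathrm{DFS}(t_i, C_j) = \frac{p(C_j \mid t_i)}{p(\bar{t}_i \mid C_j) + p(t_i \mid \bar{C}_j) + 1}$$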
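A minimal Python sketch of the DFS local and global scores, assuming every probability is estimated from document counts: p(C_j|t_i) = df(t_i, C_j)/df(t_i), p(not t_i|C_j) = 1 - df(t_i, C_j)/|C_j|, and p(t_i|not C_j) = df(t_i, not C_j)/|not C_j|. It hard-codes the mouse/wolf counts and class sizes from this thread so the output can be compared with the attached DFS computations; `dfs_local` and `dfs_global` are hypothetical names.

```python
# DFS from per-class document frequencies (sketch; count-based estimates are an assumption).
CLASS_SIZES = {"C1": 3, "C2": 4, "C3": 5}
DOC_FREQ = {
    "mouse": {"C1": 3, "C2": 4, "C3": 4},
    "wolf": {"C1": 3, "C2": 0, "C3": 0},
}


def dfs_local(term, cls):
    """Local DFS score of a term for one class (term must occur in at least one document)."""
    n_total = sum(CLASS_SIZES.values())
    df_term = sum(DOC_FREQ[term].values())  # documents containing the term
    in_class = DOC_FREQ[term][cls]
    p_class_given_t = in_class / df_term                                   # p(C_j | t_i)
    p_not_t_given_class = 1 - in_class / CLASS_SIZES[cls]                  # p(not t_i | C_j)
    p_t_given_not_class = (df_term - in_class) / (n_total - CLASS_SIZES[cls])  # p(t_i | not C_j)
    return p_class_given_t / (p_not_t_given_class + p_t_given_not_class + 1)


def dfs_global(term):
    """Global DFS score: sum of the local scores over all classes."""
    return sum(dfs_local(term, cls) for cls in CLASS_SIZES)


if __name__ == "__main__":
    for term in DOC_FREQ:
        local = {cls: round(dfs_local(term, cls), 4) for cls in CLASS_SIZES}
        print(f"{term}: local={local} global={dfs_global(term):.4f}")
```

With these counts the sketch gives DFS(wolf) = 1.0 and DFS(mouse) ≈ 0.504.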

For clarity, I have also attached an image of these formulas:
<http://weka.8497.n7.nabble.com/file/t6439/forumulas.png>

My computations according to these formulas are as follows:
<http://weka.8497.n7.nabble.com/file/t6439/dfs_wolf.png>
<http://weka.8497.n7.nabble.com/file/t6439/dfs_mouse.png>

The values I am trying to match are given in this table:
<http://weka.8497.n7.nabble.com/file/t6439/VGFSS.png>

The class sizes are C1 = 3, C2 = 4 and C3 = 5. However, when I calculate
the OR values, the formula presented does not yield the results given in the
table. Any suggestions would be appreciated.
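A Python sketch of the per-class odds ratio in its usual form, log2 of p(t|C)[1 - p(t|not C)] / ([1 - p(t|C)] p(t|not C)), using the same count-based estimates. It illustrates one likely obstacle: on this toy dataset, for both mouse and wolf and for every class, at least one of p(t|C) and p(t|not C) is exactly 0 or 1, so the unsmoothed formula gives +inf or -inf for all six (term, class) pairs. A table of finite OR values therefore suggests the thesis used some smoothing or different probability estimates. The add-alpha (Laplace, alpha = 1) smoothing below is purely an assumption for illustration and is not claimed to reproduce the thesis values; `cond_probs`, `odds_ratio` and `alpha` are hypothetical names.

```python
import math

CLASS_SIZES = {"C1": 3, "C2": 4, "C3": 5}
DOC_FREQ = {
    "mouse": {"C1": 3, "C2": 4, "C3": 4},
    "wolf": {"C1": 3, "C2": 0, "C3": 0},
}


def cond_probs(term, cls, alpha=0.0):
    """p(t | C) and p(t | not C) from document counts.

    alpha > 0 applies add-alpha smoothing (an assumption, not from the thesis).
    """
    n_total = sum(CLASS_SIZES.values())
    in_class = DOC_FREQ[term][cls]
    outside = sum(DOC_FREQ[term].values()) - in_class
    p_t_c = (in_class + alpha) / (CLASS_SIZES[cls] + 2 * alpha)
    p_t_notc = (outside + alpha) / (n_total - CLASS_SIZES[cls] + 2 * alpha)
    return p_t_c, p_t_notc


def odds_ratio(term, cls, alpha=0.0):
    """log2 odds ratio; returns +inf / -inf / nan when a probability is exactly 0 or 1."""
    p, q = cond_probs(term, cls, alpha)
    num = p * (1 - q)
    den = (1 - p) * q
    if num == 0 and den == 0:
        return math.nan  # 0/0: undefined
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log2(num / den)


if __name__ == "__main__":
    for term in DOC_FREQ:
        for cls in CLASS_SIZES:
            raw = odds_ratio(term, cls)
            smoothed = odds_ratio(term, cls, alpha=1.0)
            print(f"{term:5s} {cls}: raw OR = {raw}, add-one OR = {smoothed:.4f}")
```

Unsmoothed, mouse gets +inf for C1 and C2 and -inf for C3, while wolf gets +inf for C1 and -inf for C2 and C3; with alpha = 1 all six values become finite, so the result depends heavily on which estimate is used.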



       




