> On 11/01/2015, at 6:26 pm, Fernando Bugni <

[hidden email]> wrote:

>

> I know that: InfoGain(Class,Attribute) = H(Class) - H(Class | Attribute)

> and there's a property for entropy: 0 <= H <= log_a(n) ... where (I think) a = 2 and n = number of samples.

> But I don't know how to use this property in order to calculate the range of InformationGain. If I have a classification into two groups, how could I use this property to calculate H(Class) and H(Class | Attribute)?

The minimum information gain is zero, when H(Class) = H(Class | Attribute).

The maximum is achieved when H(Class | Attribute) = 0.

Entropy is maximal when all classes are equally likely, in which case it is log_b(c), where b = 2 (if entropy is calculated in bits) and c is the *NUMBER OF CLASS VALUES*.
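As a quick sanity check, here is a minimal Python sketch of Shannon entropy in bits (this is just the standard formula, not Weka's code), showing that it peaks at log_2(c) when all c class values are equally likely:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# With c equally likely class values, entropy reaches its maximum log2(c):
print(entropy([1/3, 1/3, 1/3]))  # log2(3) ~ 1.585 bits (three classes)
print(entropy([0.5, 0.5]))       # log2(2) = 1.0 bit  (two classes)
```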

In the two-class case, the maximum info gain is 1 bit (and occurs when both classes are equally likely a priori, before the attribute is considered).

However, in most datasets, not all classes are equally likely a priori, so H(Class) will be smaller than log_b(c).
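To make the definition InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute) concrete, here is a small illustrative sketch (function names are my own, not Weka's API) on a toy two-class dataset where the attribute separates the classes perfectly, giving the maximum gain of 1 bit:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """InfoGain(Class, Attribute) = H(Class) - H(Class | Attribute)."""
    n = len(labels)
    # H(Class | Attribute): entropy of each attribute-value subset,
    # weighted by the subset's relative frequency.
    cond = 0.0
    for v in set(attr_values):
        subset = [c for a, c in zip(attr_values, labels) if a == v]
        cond += len(subset) / n * entropy(subset)
    return entropy(labels) - cond

# Two equally likely classes, perfectly separated by the attribute:
labels = ['yes', 'yes', 'no', 'no']
attr   = ['a',   'a',   'b',  'b']
print(info_gain(attr, labels))  # 1.0 bit, the maximum in the two-class case
```

If the attribute tells us nothing (e.g. attr = ['a', 'b', 'a', 'b'] with the same labels), H(Class | Attribute) = H(Class) and the gain is the minimum, zero.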

You can calculate H(Class) in WEKA, in the Classify panel, by running any classifier (e.g., ZeroR), setting “Use training set” for evaluation, and “Output entropy evaluation measures” under “More options…”.

For example, running ZeroR on the iris data gives:

=== Summary ===

Correctly Classified Instances          50               33.3333 %
Incorrectly Classified Instances       100               66.6667 %
Kappa statistic                          0
K&B Relative Info Score                  0      %
K&B Information Score                    0      bits      0      bits/instance
Class complexity | order 0             237.7444 bits      1.585  bits/instance
Class complexity | scheme              237.7444 bits      1.585  bits/instance
Complexity improvement (Sf)              0      bits      0      bits/instance
Mean absolute error                      0.4444
Root mean squared error                  0.4714
Relative absolute error                100      %
Root relative squared error            100      %
Coverage of cases (0.95 level)         100      %
Mean rel. region size (0.95 level)     100      %
Total Number of Instances              150

"Class complexity | order 0" gives you H(Class). It is 1.585 bits/instance for the iris data (rounded) because, for this data, all three classes are equally likely, so H(Class)=log_2(3).

Cheers,

Eibe

_______________________________________________

Wekalist mailing list

Send posts to:

[hidden email]
List info and subscription status:

http://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette:

http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html