The probabilities of a logistic regression are given by:

P(1 | a,b,c) = 1 / (1 + Exp(w0 + w1*a + w2*b)) (Eq1)

where the ws are the weights, and a, b are the attribute values.

I have a database with 3 attributes and a class. Attribute a = CATEG (nominal with 3 values: multipara/primipara/novilha) and attribute b = RACATO (nominal with 2 values: nelore/angus). The class DG is binary (0/1).

I ran logistic regression in Weka and obtained the following weights:

                       Class
    Variable               1
    ========================
    CATEG=MULTIPARA  -0.1952
    CATEG=PRIMIPARA   0.1182
    CATEG=NOVILHA     0.1953
    RACATO=NELORE    -0.0599
    Intercept         0.4637

I do understand that if I have the following instance

CATEG= MULTIPARA **RACATO = NELORE**

then the probability of this instance having class 1 is

P(DG=1 | instance) = 1 / (1 + Exp(0.4637 + -0.1952*1 + -0.0599*1))

However, if I change **RACATO to ANGUS**, the expression loses the weight -0.0599. It is as if NELORE had the value 1 and ANGUS the value 0, which makes sense since RACATO is a binary nominal attribute.

P(DG=1 | instance) = 1 / (1 + Exp(0.4637 + -0.1952*1 + -0.0599*0))

Therefore, as in Eq1, weight **w2 = -0.0599, and attribute b is binary and corresponds to RACATO: 1 for NELORE and 0 for ANGUS.**
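The arithmetic for the two RACATO cases can be checked with a short Python sketch. It uses the formula exactly as written in Eq1 and only the weights listed in the Weka output above; the function name `prob` and the constant names are my own choices for illustration, not anything Weka provides.

```python
import math

# Weights copied from the Weka output above.
INTERCEPT = 0.4637
W_MULTIPARA = -0.1952
W_NELORE = -0.0599


def prob(linear_term):
    """Probability as written in Eq1: 1 / (1 + Exp(linear term))."""
    return 1.0 / (1.0 + math.exp(linear_term))


# CATEG = MULTIPARA, RACATO = NELORE: the RACATO indicator is 1.
p_nelore = prob(INTERCEPT + W_MULTIPARA * 1 + W_NELORE * 1)

# CATEG = MULTIPARA, RACATO = ANGUS: the RACATO indicator is 0.
p_angus = prob(INTERCEPT + W_MULTIPARA * 1 + W_NELORE * 0)

print(f"NELORE: {p_nelore:.4f}")
print(f"ANGUS:  {p_angus:.4f}")
```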

However, if I have an instance where CATEG is changed from MULTIPARA to PRIMIPARA

**CATEG= PRIMIPARA** RACATO = NELORE

Weka seems to use a different weight. **Instead of -0.1952 for CATEG = MULTIPARA, it shows weight 0.1182 for CATEG = PRIMIPARA**, and the probability is given by

P(DG=1 | instance) = 1 / (1 + Exp(0.4637 + 0.1182*1 + -0.0599*1))

Therefore, weight **w1 in Eq1 is not a fixed value.** I expected w1 to be a constant, with attribute a taking the value 1 for MULTIPARA, 2 for PRIMIPARA, and 3 for NOVILHA. Instead, attribute a in Eq1 does not take the values 1, 2, and 3 for MULTIPARA, PRIMIPARA, and NOVILHA respectively.
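One way to see this numerically: under the integer coding I expected (a = 1, 2, 3 with a single w1), the contribution of CATEG would have to change by the same amount, w1, from one category to the next. The quick Python check below is only a sketch, with variable names of my own choosing, applied to the three CATEG weights from the Weka output.

```python
# CATEG weights copied from the Weka output, in the order assumed for
# the integer coding (MULTIPARA=1, PRIMIPARA=2, NOVILHA=3).
categ_weights = [-0.1952, 0.1182, 0.1953]

# With a single w1 and a in {1, 2, 3}, consecutive contributions would
# differ by exactly w1, so both steps below would be equal.
steps = [b - a for a, b in zip(categ_weights, categ_weights[1:])]
evenly_spaced = abs(steps[0] - steps[1]) < 1e-6

print("steps between categories:", [round(s, 4) for s in steps])
print("consistent with one w1 and a = 1, 2, 3:", evenly_spaced)
```

The two steps differ, so no single constant w1 multiplied by 1, 2, and 3 can reproduce the three weights.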

That is, the true equation is not Eq1. The true equation is given by Eq2:

P(1 | a,b,c) = 1 / (1 + Exp(w0 + w11*a1 + w12*a2 + w13*a3 + w2*b + w3*c)) (Eq2)

where **a1, a2 and a3 are binary variables for CATEG = MULTIPARA, PRIMIPARA, and NOVILHA**, respectively.
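Eq2 can be sketched in Python as follows. The dictionary layout and the names `WEIGHTS`, `linear_term` and `prob` are assumptions of mine for illustration, not Weka's API, and the third attribute c is left out because its weights do not appear in the output I pasted. Each nominal value that has a weight acts as an indicator that is 1 when the instance has that value and 0 otherwise, so a value without a listed weight (ANGUS) simply contributes 0.

```python
import math

INTERCEPT = 0.4637

# One weight per (attribute, value) indicator, copied from the Weka output.
WEIGHTS = {
    ("CATEG", "MULTIPARA"): -0.1952,
    ("CATEG", "PRIMIPARA"): 0.1182,
    ("CATEG", "NOVILHA"): 0.1953,
    ("RACATO", "NELORE"): -0.0599,
}


def linear_term(instance):
    """Sum w0 plus the weight of every indicator that equals 1."""
    total = INTERCEPT
    for attribute, value in instance.items():
        # Indicator is 1 for the instance's own value, 0 for all others.
        total += WEIGHTS.get((attribute, value), 0.0)
    return total


def prob(instance):
    """Probability in the form used by Eq2: 1 / (1 + Exp(linear term))."""
    return 1.0 / (1.0 + math.exp(linear_term(instance)))


for categ in ("MULTIPARA", "PRIMIPARA", "NOVILHA"):
    for racato in ("NELORE", "ANGUS"):
        inst = {"CATEG": categ, "RACATO": racato}
        print(categ, racato, round(prob(inst), 4))
```

Written this way, an attribute with 87 values only adds 87 entries to the dictionary; the formula itself does not change.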

In R, we can do logistic regression, and the results appear in the form of Eq1 rather than Weka's Eq2, where nominal attributes are binarized.

Is there a way to make Weka's regression behave like R's? The binarized form is easy to interpret when a nominal attribute has 3 possible values, but when an attribute has 87 possible values things start to get messy. In my case, I have a couple more attributes, with 87 and 20 possible values, and the output gets ugly.

Cheers,

Luisa