# Logistic Regression equation - binarization of nominal attributes

The probabilities of a logistic regression are given by:

    P(1 | a, b, c) = 1 / (1 + Exp(w0 + w1*a + w2*b))   (Eq1)

where the ws are the weights, and a, b are the attribute values.

I have a database with 3 attributes and a class: attribute a = CATEG (nominal with 3 different values: multipara/primipara/novilha) and attribute b = RACATO (nominal with 2 different values: nelore/angus). The class (DG) is binary: 0/1.

I ran logistic regression in Weka and obtained the following weights:

                         Class
    Variable                 1
    ==========================
    CATEG=MULTIPARA    -0.1952
    CATEG=PRIMIPARA     0.1182
    CATEG=NOVILHA       0.1953
    RACATO=NELORE      -0.0599
    Intercept           0.4637

I understand that if I have the following instance

    CATEG = MULTIPARA   RACATO = NELORE

then the probability of this instance having class 1 is

    P(DG=1 | instance) = 1 / (1 + Exp(0.4637 + -0.1952*1 + -0.0599*1))

However, if I change RACATO to ANGUS, my expression loses the weight -0.0599. This is as if NELORE has a value of 1 and ANGUS a value of 0, which makes sense since it is a binary nominal attribute.

    P(DG=1 | instance) = 1 / (1 + Exp(0.4637 + -0.1952*1 + -0.1403*1 + -0.0599*0))

Therefore, as in Eq1, weight w2 = -0.0599 and attribute b is binary and corresponds to RACATO, being 1 for NELORE and 0 for ANGUS.

However, if I have the instance where CATEG is changed from MULTIPARA to PRIMIPARA

    CATEG = PRIMIPARA   RACATO = NELORE

Weka seems to give me another weight. Instead of -0.1952 for CATEG = MULTIPARA, it shows weight 0.1182 for CATEG = PRIMIPARA, and the probability is given by

    P(DG=1 | instance) = 1 / (1 + Exp(0.1182 + -0.1952*1 + -0.1403*1 + -0.0599*1))

Therefore, weight w1 in Eq1 is not a fixed value. I expected that w1 was a constant and that attribute a took the value 1 for MULTIPARA, 2 for PRIMIPARA, and 3 for NOVILHA.
Instead, attribute a in Eq1 for CATEG does not take the values 1, 2, and 3 for MULTIPARA, PRIMIPARA, and NOVILHA respectively. That is, the true equation is not Eq1. The true equation is given by Eq2:

    P(1 | a, b, c) = 1 / (1 + Exp(w0 + w11*a1 + w12*a2 + w13*a3 + w2*b + w3*c))   (Eq2)

where a1, a2, and a3 are binary variables for CATEG = MULTIPARA, PRIMIPARA, and NOVILHA.

In R, we can do logistic regression, and the results will appear as in Eq1 and not as in Weka's Eq2, where nominal attributes are binarized.

Is there a way to make the regression similar to R? This is easy to interpret when an attribute is nominal with 3 possible values; however, when the attribute has 87 possible values, things start to get messy. In my case, I have a couple more attributes, with 87 and 20 possible values, and things get ugly.

Cheers,
Luisa

_______________________________________________
Wekalist mailing list -- [hidden email]
Send posts to [hidden email]
To unsubscribe send an email to [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/postorius/lists/wekalist.list.waikato.ac.nz
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
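To make Eq2 from the question concrete, here is a minimal Python sketch that evaluates it with 0/1 indicator variables. It is only an illustration, not Weka's implementation: the weights are copied from the output in the post, the function names are made up for this example, the third attribute c is left out because its weight is not listed, and the sign inside Exp follows Eq1/Eq2 exactly as the post writes them.

```python
import math

# Weights copied from the Weka output quoted in the post.
WEIGHTS = {
    "CATEG=MULTIPARA": -0.1952,
    "CATEG=PRIMIPARA": 0.1182,
    "CATEG=NOVILHA": 0.1953,
    "RACATO=NELORE": -0.0599,
}
INTERCEPT = 0.4637


def linear_score(categ, racato):
    """w0 + w11*a1 + w12*a2 + w13*a3 + w2*b with 0/1 indicators (hypothetical helper)."""
    indicators = {
        "CATEG=MULTIPARA": 1 if categ == "MULTIPARA" else 0,
        "CATEG=PRIMIPARA": 1 if categ == "PRIMIPARA" else 0,
        "CATEG=NOVILHA": 1 if categ == "NOVILHA" else 0,
        "RACATO=NELORE": 1 if racato == "NELORE" else 0,  # ANGUS maps to 0
    }
    return INTERCEPT + sum(WEIGHTS[name] * value for name, value in indicators.items())


def probability(categ, racato):
    # Sign convention taken from Eq1/Eq2 as written in the post: 1 / (1 + Exp(score)).
    return 1.0 / (1.0 + math.exp(linear_score(categ, racato)))


if __name__ == "__main__":
    for categ in ("MULTIPARA", "PRIMIPARA", "NOVILHA"):
        for racato in ("NELORE", "ANGUS"):
            print(categ, racato, round(probability(categ, racato), 4))
```

Because exactly one CATEG indicator is 1 for any instance, only one of the three CATEG weights enters the score at a time, which is why the weight appears to "change" with the category.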
Yes, you are right, Logistic in WEKA binarizes nominal attributes. You can see exactly what happens by running the unsupervised NominalToBinary filter on your data, e.g., in the Preprocess panel.

Coding the nominal values as integers 1, 2, and 3 seems problematic when using logistic regression because it assumes interval scale. However, there is an "OrdinalToNumeric" filter that you can apply to achieve this.

It might be better to merge nominal values instead. There is a supervised MergeNominalValues filter that applies a greedy method based on the chi-squared test to merge categories of nominal attributes. Just make sure you apply this *supervised* filter as part of the FilteredClassifier to avoid optimistic performance estimates.

Cheers,
Eibe

On 5/06/2020, at 1:34 AM, Luisa <[hidden email]> wrote:

> [...] Is there a way to make the regression similar to R? [...]
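The interval-scale concern raised in the reply can be illustrated with a small Python sketch. It compares the contribution of CATEG to the linear score under the binarized coding (weights copied from the Weka output in the question) with an integer coding 1, 2, 3 that uses a single weight. The value 0.2 for that single weight is an arbitrary assumption chosen for illustration, and the helper names are hypothetical.

```python
# Per-category weights copied from the Weka output quoted in the thread.
CATEG_WEIGHTS = {"MULTIPARA": -0.1952, "PRIMIPARA": 0.1182, "NOVILHA": 0.1953}
ORDER = ["MULTIPARA", "PRIMIPARA", "NOVILHA"]


def steps(contributions):
    """Differences between consecutive contributions to the linear score."""
    return [b - a for a, b in zip(contributions, contributions[1:])]


def indicator_contributions():
    # Binarized coding: each category contributes its own weight.
    return [CATEG_WEIGHTS[name] for name in ORDER]


def integer_contributions(w1):
    # Integer coding 1, 2, 3 with one weight w1 (hypothetical value supplied by the caller).
    return [w1 * code for code in range(1, len(ORDER) + 1)]


if __name__ == "__main__":
    print("indicator steps:", steps(indicator_contributions()))
    print("integer steps:  ", steps(integer_contributions(0.2)))
```

With the binarized weights, the step from MULTIPARA to PRIMIPARA differs from the step from PRIMIPARA to NOVILHA. Under integer coding both steps are forced to equal w1 whatever its value, which is the interval-scale assumption the reply refers to.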