J48 Decision Tree Interpretation


Ewy Mathe
Hello all,
I am using a cost matrix with a J48 decision tree classifier
with the default options. Here is a sample of the output tree:

|   |   |   |   |   |   |   |   |   |   |   |   AA180 <= 0.0172
|   |   |   |   |   |   |   |   |   |   |   |   |   AA200 <= -0.30765: Active (2.34/0.73)
|   |   |   |   |   |   |   |   |   |   |   |   |   AA200 > -0.30765: Inactive (20.48)
|   |   |   |   |   |   |   |   |   |   |   |   AA180 > 0.0172: Active (2.34/0.73)
|   |   |   |   |   |   |   |   |   |   AA160 > 0.02918: Active (3.22)
|   |   |   |   |   AA134 > 0.00162
|   |   |   |   |   |   AA287 <= 0.14532: Inactive (24.13)
|   |   |   |   |   |   AA287 > 0.14532: Active (5.56/0.73)

What are the numbers in parentheses after "Active" or "Inactive"?
From what I understand, when using a default matrix, the numbers in
parentheses represent the number of correctly classified/incorrectly
classified instances.  However, here the numbers are not whole.  Any help
would be greatly appreciated. Thanks in advance,
EM

_______________________________________________
Wekalist mailing list
[hidden email]
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: J48 Decision Tree Interpretation

Mary.Felkin
Even without a cost matrix, the numbers might not be integers if you
have missing attribute values. That comes from the fact that C4.5
distributes examples according to their probable location when it
deals with missing attributes. So half an example can go down one
branch and the other half down the other branch.

This is implemented through the use of weights. When no attribute
value is missing and no cost matrix is used, all examples have
weight = 1. In the example above the weights would become 1/2 and
1/2. When C4.5 "counts" the examples, it is in fact summing up
their weights.
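
The weight splitting described above can be sketched roughly as follows.
This is only an illustration of the idea, not the C4.5 or J48 source: the
function names (distribute, weighted_count), the dict-based instances and
the attribute values are all made up for the example, and it assumes at
least one instance has a known value for the attribute.

```python
def distribute(instances, attribute, threshold):
    """Split weighted instances on a numeric attribute, C4.5-style.

    Each instance is a dict with a "weight" entry plus attribute values.
    None marks a missing attribute value (a convention of this sketch).
    """
    known = [x for x in instances if x[attribute] is not None]
    missing = [x for x in instances if x[attribute] is None]

    branches = {"<=": [], ">": []}
    for x in known:
        side = "<=" if x[attribute] <= threshold else ">"
        branches[side].append(dict(x))

    # Share of the known weight that went down each branch.
    total_known = sum(x["weight"] for x in known)
    fractions = {
        side: sum(x["weight"] for x in members) / total_known
        for side, members in branches.items()
    }

    # An instance with a missing value goes down every branch,
    # carrying only that branch's share of its weight.
    for x in missing:
        for side, fraction in fractions.items():
            if fraction > 0:
                branches[side].append({**x, "weight": x["weight"] * fraction})
    return branches


def weighted_count(instances):
    """The "count" shown in the tree is really a sum of weights."""
    return sum(x["weight"] for x in instances)


# Hypothetical data: two known values, one missing value, all weight 1.
instances = [
    {"AA200": -0.5, "weight": 1.0},
    {"AA200": 0.1, "weight": 1.0},
    {"AA200": None, "weight": 1.0},
]
branches = distribute(instances, "AA200", -0.30765)
for side, members in branches.items():
    print(side, weighted_count(members))
```

With one known instance on each side, the instance with the missing value
contributes half of its weight to each branch, so neither count is a whole
number.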

Mary


Re: J48 Decision Tree Interpretation

Ewy Mathe
Hello Mary,
Thanks for your response.  However, I have no missing attribute values in
my data.  Furthermore, my cost matrix is:
0  5
1  0
Shouldn't the "counts" then be whole numbers?  I am still unclear.
Thanks,
EM

Re: J48 Decision Tree Interpretation

Eibe Frank
Are you using the CostSensitiveClassifier with default options? Then the
data is re-weighted based on the costs. The costs are actually
normalized to sum to one and the class distribution is also taken into
account when the instance weights are set. This is based on:

Ting, K.M. Inducing Cost-Sensitive Trees via Instance Weighting.
Proceedings of The Second European Symposium on Principles of Data
Mining and Knowledge Discovery. LNAI-1510, pp. 139-147, 1998.
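
As a rough sketch of this kind of re-weighting (my reading of the
instance-weighting idea, not the Weka code), each class gets a weight
proportional to its misclassification cost, scaled so that the total weight
of the data stays the same. The function name class_weights, the assumption
that row i of the matrix holds the costs for true class i, and the class
counts (20 and 80) are all assumptions made for the example; they will not
reproduce the numbers in the tree above.

```python
def class_weights(cost_matrix, class_counts):
    """Per-class instance weights derived from misclassification costs.

    cost_matrix[i][j] is taken to be the cost of predicting class j when
    the true class is i (an assumption about the matrix orientation).
    """
    # Cost of getting class i wrong: sum of the off-diagonal row entries.
    costs = [
        sum(c for j, c in enumerate(row) if j != i)
        for i, row in enumerate(cost_matrix)
    ]
    total = sum(class_counts)
    scale = sum(c * n for c, n in zip(costs, class_counts))
    # Scale so that the sum of all instance weights still equals `total`.
    return [c * total / scale for c in costs]


cost_matrix = [[0, 5],
               [1, 0]]
class_counts = [20, 80]  # hypothetical number of instances per class

weights = class_weights(cost_matrix, class_counts)
total_weight = sum(w * n for w, n in zip(weights, class_counts))
print(weights, total_weight)
```

Because each instance now carries a fractional weight that depends on its
class, the sums of weights reported at the leaves are no longer whole
numbers, even with no missing values.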

Cheers,
Eibe

Re: J48 Decision Tree Interpretation

Ewy Mathe
Hello,
Yes, I did use the default options for the CostSensitiveClassifier,
so that makes sense now. Thank you!
Cheers,
EM

Re: J48 Decision Tree Interpretation

Joe Burpee
On Wed, May 11, 2005 at 15:56:42 +0200, Ewy Mathe wrote:
> Shouldn't the "counts" then be whole numbers?  I am still unclear.
 
FWIW, as part of some work I was doing to get J48 to work with large
sample-survey weights, I have a version that displays the actual integer
counts as well as the weighted counts, when they differ.  This seemed
useful for my purposes, but it does use up memory to maintain a second
set of counts.  I would be interested to know whether you would consider
this of any value in the analysis you are doing.
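
To give an idea of what keeping a second set of counts could look like,
here is a minimal sketch. It is not the modified J48 itself: the LeafTally
class, the bracketed "[raw]" display format and the weights used are all
hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeafTally:
    """Keeps weighted and raw (integer) counts side by side for one leaf."""
    weighted_total: float = 0.0
    weighted_errors: float = 0.0
    raw_total: int = 0
    raw_errors: int = 0

    def add(self, weight, correct):
        self.weighted_total += weight
        self.raw_total += 1
        if not correct:
            self.weighted_errors += weight
            self.raw_errors += 1

    def label(self):
        def fmt(total, errors):
            # Same "(total/errors)" convention, errors omitted when zero.
            return f"{total:g}/{errors:g}" if errors else f"{total:g}"

        weighted = fmt(round(self.weighted_total, 2),
                       round(self.weighted_errors, 2))
        raw = fmt(self.raw_total, self.raw_errors)
        # Show the raw counts only when they differ from the weighted ones.
        if weighted == raw:
            return f"({weighted})"
        return f"({weighted}) [{raw}]"


leaf = LeafTally()
leaf.add(1.25, correct=True)
leaf.add(1.25, correct=True)
leaf.add(0.5, correct=False)
print(leaf.label())
```

The extra memory is the two integer fields per leaf, plus whatever the
tree-building code needs to carry the raw counts through each split.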

Joe

