Hello all,
I am using a cost-sensitive matrix with a J48 decision tree classifier, with the default options. Here is a sample of the output tree:

| | | | | | | | | | | | AA180 <= 0.0172
| | | | | | | | | | | | | AA200 <= -0.30765: Active (2.34/0.73)
| | | | | | | | | | | | | AA200 > -0.30765: Inactive (20.48)
| | | | | | | | | | | | AA180 > 0.0172: Active (2.34/0.73)
| | | | | | | | | | AA160 > 0.02918: Active (3.22)
| | | | | AA134 > 0.00162
| | | | | | AA287 <= 0.14532: Inactive (24.13)
| | | | | | AA287 > 0.14532: Active (5.56/0.73)

What are the numbers in parentheses after "Active" or "Inactive"? From what I understand, when using a default matrix, the numbers in parentheses represent the number of correctly classified/incorrectly classified instances. However, here the numbers are not whole. Any help would be greatly appreciated.

Thanks in advance,
EM
Even without a cost matrix, the numbers might not be integers if you have missing attribute values. This is because C4.5 distributes examples according to their probable location when it deals with missing attributes, so half an example can go down one branch and the other half down the other branch.

This is implemented through the use of weights. When no attribute value is missing and no cost matrix is used, all examples have weight = 1. In the half-and-half case above, the weights would become 1/2 and 1/2. When C4.5 "counts" the examples, it is in fact summing up their weights.

Mary
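To make the weighting idea above concrete, here is a minimal Python sketch, not Weka's or C4.5's actual code. It assumes that an instance with a missing value is split across the branches in proportion to the weight of the instances whose value is known; the function name `split_missing` and all the numbers are hypothetical.

```python
def split_missing(instance_weight, known_branch_weights):
    """Split one instance's weight across branches in proportion to the
    weight of known-value instances that went down each branch.
    (Illustrative assumption, not the Weka implementation.)"""
    total = sum(known_branch_weights)
    return [instance_weight * w / total for w in known_branch_weights]


# Hypothetical split: both branches received equal weight from the
# instances whose attribute value is known, so the instance is halved.
parts = split_missing(1.0, [10.0, 10.0])

# A leaf "count" is the sum of the weights that reach it: two whole
# instances plus half of the instance with the missing value.
left_leaf_weights = [1.0, 1.0, parts[0]]
leaf_count = sum(left_leaf_weights)
print(parts, leaf_count)
```

With equal branches the instance contributes 1/2 to each side, so a leaf that also holds two ordinary instances reports a non-integer count.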
Hello Mary,
Thanks for your response. However, I have no missing attributes in my data. Furthermore, my cost matrix is:

0 5
1 0

Shouldn't the "counts" then be whole numbers? I am still unclear.

Thanks,
EM
Are you using the CostSensitiveClassifier with default options? Then the data is re-weighted based on the costs. The costs are actually normalized to sum to one, and the class distribution is also taken into account when the instance weights are set. This is based on:

Ting, K.M. Inducing Cost-Sensitive Trees via Instance Weighting. Proceedings of The Second European Symposium on Principles of Data Mining and Knowledge Discovery. LNAI-1510, pp. 139-147, 1998.

Cheers,
Eibe
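As a rough illustration of the re-weighting described above, the following Python sketch applies the instance-weighting formula from Ting's paper, w(j) = C(j) * N / sum_i C(i) * N_i. It is a sketch under stated assumptions, not the Weka source: the per-class cost C(j) is taken to be the row sum of the cost matrix, the class counts are made up, and the function name `class_weights` is hypothetical.

```python
def class_weights(cost_matrix, class_counts):
    """Per-class instance weights, w(j) = C(j) * N / sum_i C(i) * N_i.

    Assumption: C(j) is the sum of row j of the cost matrix, i.e. the
    total cost of misclassifying an instance of class j.
    """
    costs = [sum(row) for row in cost_matrix]
    n_total = sum(class_counts)
    norm = sum(c * n for c, n in zip(costs, class_counts))
    return [c * n_total / norm for c in costs]


cost_matrix = [[0, 5],
               [1, 0]]        # the matrix quoted in this thread
class_counts = [20, 80]       # hypothetical class distribution

weights = class_weights(cost_matrix, class_counts)

# The total weight of the training set is unchanged by the re-weighting.
total_weight = sum(w * n for w, n in zip(weights, class_counts))
print(weights, total_weight)
```

Under these assumptions every instance of the expensive class carries five times the weight of an instance of the other class, and neither weight is a whole number. Leaf "counts" are sums of such weights, so they are generally fractional even when no attribute values are missing.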
Hello,
Yes, I did use the default options for the CostSensitiveClassifier. That makes sense now. Thank you!

Cheers,
EM
In reply to this post by Ewy Mathe
On Wed, May 11, 2005 at 15:56:42 +0200, Ewy Mathe wrote:
> Shouldn't the "counts" then be whole numbers? I am still unclear.

FWIW, as part of some work I was doing to get J48 to work with large sample-survey weights, I have a version that displays the actual integer counts as well as the weighted counts, when they differ. This seemed useful for my purposes, but it does use up memory to maintain a second set of counts. I would be interested to know whether you would consider this of any value in the analysis you are doing.

Joe