C45 - pruning method - raising subtree or replacement subtree

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

C45 - pruning method - raising subtree or replacement subtree

Driver007
Hello

I am reading the book and i got confused with the method C4.5 is using to
prune. The book says its using sub-tree raising, but reading the book
further it says that c4.5 is using the pessimistic pruning and it looks like
a subtree replacement operation.
Which one of them is C4.5 using?

Thanks in advance



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

Eibe Frank-3
By default, it uses both. In J48, there is an option to turn subtree raising off.

Cheers,
Eibe

On Fri, 1 Sep 2017 at 8:27 AM, Driver007 <[hidden email]> wrote:
Hello

I am reading the book and i got confused with the method C4.5 is using to
prune. The book says its using sub-tree raising, but reading the book
further it says that c4.5 is using the pessimistic pruning and it looks like
a subtree replacement operation.
Which one of them is C4.5 using?

Thanks in advance



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

Driver007
Dear Eibe,


Could you provide me some links what mathematical formula is used for the
sub-tree raising?


Thanks in advance
John.



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

Eibe Frank-2
Administrator
It’s the same formula as the one used for subtree replacement: the upper limit of the confidence interval is used as the error estimate.

Cheers,
Eibe

> On 1/09/2017, at 11:28 AM, Driver007 <[hidden email]> wrote:
>
> Dear Eibe,
>
>
> Could you provide me some links what mathematical formula is used for the
> sub-tree raising?
>
>
> Thanks in advance
> John.
>
>
>
> --
> Sent from: http://weka.8497.n7.nabble.com/
> _______________________________________________
> Wekalist mailing list
> Send posts to: [hidden email]
> List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
> List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

Driver007
Dear Eibe,


The pessimistic pruning is considered as a subtree replacement operation, If
i understood correctly.

2 ) In case both techniques (Subtree raising and reduce error pruning) are
true (in j48 options) how the pruning process works?



Thanks in advance,
John



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

Eibe Frank-3
Essentially, for each internal node of the tree, error estimates for three scenarios are compared : a) leaving the tree as is, b) performing subtree replacement, and c) performing subtree raising (by raising the subtree attached to the largest branch). The best option is chosen. 

The error estimate is computed in the same way in each case, by taking the upper limit of a confidence interval computed from the error on the training data (per leaf node).

Note that, if subtree raising occurs, the whole procedure, including all three options, is performed again on the resulting pruned tree.

Cheers,
Eibe

On Fri, Sep 1, 2017 at 2:38 PM, Driver007 <[hidden email]> wrote:
Dear Eibe,


The pessimistic pruning is considered as a subtree replacement operation, If
i understood correctly.

2 ) In case both techniques (Subtree raising and reduce error pruning) are
true (in j48 options) how the pruning process works?



Thanks in advance,
John



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

Driver007
Dear Eibe,


Thanks a lot for the answers they were very enlightening. I have still a few
question, but i do not know if i should create a new topic.

In page 214 of the book  or 247 for pdf it writes for sub tree raising:  Of
course, if we perform this raising operation, it is necessary to reclassify
the examples at the nodes marked 4 and 5 into the new subtree headed by C.
This is why the daughters of that node are marked with primes: 1',2', and
3'—to indicate that they are not the same as the original daughters 1, 2,
and 3 but differ by the inclusion of the examples originally covered by 4
and 5. "  

1 ) The  two examples of the internal node "B" will be reclassified to the
internal node 3 the way they are portrayed in the image i uploaded?

<http://weka.8497.n7.nabble.com/file/t6270/example.png>

2) As far as the training set is considered In
http://weka.8497.n7.nabble.com/Use-training-set-td39248.html you answered
that: "Use training set" computes what is also known as the resubstitution
estimate. This estimate is often hopelessly optimistic but it can be useful
for diagnostic purposes."

Does that mean that "training set"  makes its own model and then it test it
on the same data? ( I got a bit confused)

3) In "percentage  Split" data is first randomized by default . Does the
same happen with K - fold cross validation?






--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacementsubtree

Eibe Frank-2
Administrator
  1. I think you understand the basic mechanism correctly, but the redistributed instances may not necessarily end up in the 2nd and 3rd leaf node respectively. It depends on their values for attribute C.

 

  1. The Explorer’s Classify panel will always build a tree for the full dataset as loaded into the Preprocess panel. This is the tree that is shown at the start of the output, before performance statistics are shown. If you perform evaluation on the training data, no other tree will be built and the tree that is shown will simply be evaluated on the whole dataset as loaded into the Preprocess panel. The performance statistics are then based on this evaluation.

 

  1. Yes, the data is randomized when you perform k-fold cross-validation in WEKA.

 

Cheers,

Eibe

 

From: [hidden email]
Sent: Saturday, 2 September 2017 6:31 AM
To: [hidden email]
Subject: Re: [Wekalist] C45 - pruning method - raising subtree or replacementsubtree

 

Dear Eibe,

 

 

Thanks a lot for the answers they were very enlightening. I have still a few

question, but i do not know if i should create a new topic.

 

In page 214 of the book  or 247 for pdf it writes for sub tree raising:  Of

course, if we perform this raising operation, it is necessary to reclassify

the examples at the nodes marked 4 and 5 into the new subtree headed by C.

This is why the daughters of that node are marked with primes: 1',2', and

3'—to indicate that they are not the same as the original daughters 1, 2,

and 3 but differ by the inclusion of the examples originally covered by 4

and 5. " 

 

1 ) The  two examples of the internal node "B" will be reclassified to the

internal node 3 the way they are portrayed in the image i uploaded?

 

<http://weka.8497.n7.nabble.com/file/t6270/example.png>

 

2) As far as the training set is considered In

http://weka.8497.n7.nabble.com/Use-training-set-td39248.html you answered

that: "Use training set" computes what is also known as the resubstitution

estimate. This estimate is often hopelessly optimistic but it can be useful

for diagnostic purposes."

 

Does that mean that "training set"  makes its own model and then it test it

on the same data? ( I got a bit confused)

 

3) In "percentage  Split" data is first randomized by default . Does the

same happen with K - fold cross validation?

 

 

 

 

 

 

--

Sent from: http://weka.8497.n7.nabble.com/

_______________________________________________

Wekalist mailing list

Send posts to: [hidden email]

List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist

List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

 


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacementsubtree

Driver007
Dear Eibe,


I am still confused with what is the purpose of the "Use training set" in
test options. ( I understood how the "supplied test set", "K - fold Cross
validation" and "Percentage Split" works.

If i understood correctly the "Use training set" is not suggested because

1) Does the " Use training set" splits the instances?
2) How exactly works the "Use training set"?
3) Why the "Use training set" is not suggested for learning purposes?

As far as the third question is concerned from my research everyone answers
because of the "resubstitution error".
4) With "resubstitution error" we include all errors? ( Mean absolute error,
Root mean square error, Relative absolute error etc..)

Thanks in advance,
John



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree orreplacementsubtree

Eibe Frank-2
Administrator
  1. No.

 

  1. The instances that are used for training the model are also used for testing it.

 

  1. You mean for “testing” purposes? Because the model has been fitted to this data.

 

  1. Yes. It applies whenever the training data is resubstituted into the model for evaluating the model.

 

It’s trivial to build a classifier that achieves high accuracy on the training data (i.e., low resubstitution error). For example, a completely unpruned decision tree will achieve this, or a one-nearest-neighour classifier. However, this doesn’t mean the classifier will be any good on new data drawn from the same domain: it may just be overfitting the training data. Thus, performance estimates on the training data will not give you a reliable indication of performance on new data.

 

Cheers,

Eibe

 

From: [hidden email]
Sent: Saturday, 2 September 2017 9:31 PM
To: [hidden email]
Subject: Re: [Wekalist] C45 - pruning method - raising subtree orreplacementsubtree

 

Dear Eibe,

 

 

I am still confused with what is the purpose of the "Use training set" in

test options. ( I understood how the "supplied test set", "K - fold Cross

validation" and "Percentage Split" works.

 

If i understood correctly the "Use training set" is not suggested because

 

1) Does the " Use training set" splits the instances?

2) How exactly works the "Use training set"?

3) Why the "Use training set" is not suggested for learning purposes?

 

As far as the third question is concerned from my research everyone answers

because of the "resubstitution error".

4) With "resubstitution error" we include all errors? ( Mean absolute error,

Root mean square error, Relative absolute error etc..)

 

Thanks in advance,

John

 

 

 

--

Sent from: http://weka.8497.n7.nabble.com/

_______________________________________________

Wekalist mailing list

Send posts to: [hidden email]

List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist

List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

 


_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree orreplacementsubtree

Driver007
Dear Eibe,


Thanks a lot your explanation was very clear.


1) As far as J48 algorithm is considered could you explain me what is " φ
"??
( I could not find any pseudo-code similar to this in the book.

Algorithm [1] J48:
INPUT:
        D //Training data
OUTPUT
        T //Decision tree
DTBUILD (*D)
{
       T=φ;
       T= Create root node and label with splitting attribute;
       T= Add arc to root node for each split predicate and
       label;
For each arc do
        D= Database created by applying splitting
        predicate to D;
        If stopping point reached for this path, then
                T’= create leaf node and label with
                appropriate class;
       Else
                T’= DTBUILD(D);
       T= add T’ to arc;
}
 

2)Could you explain me the quadratic loss function (about the error measures
of nominal attributes)?
I have read
http://weka.8497.n7.nabble.com/error-measures-of-nominal-attributes-td503.html#a509
but i am struggling to understand the concept.




--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree orreplacementsubtree

Eibe Frank-2
Administrator

> On 4/09/2017, at 1:33 AM, Driver007 <[hidden email]> wrote:
>
> 1) As far as J48 algorithm is considered could you explain me what is " φ
> "??
> ( I could not find any pseudo-code similar to this in the book.
>
> Algorithm [1] J48:
> INPUT:
>        D //Training data
> OUTPUT
>        T //Decision tree
> DTBUILD (*D)
> {
>       T=φ;
>       T= Create root node and label with splitting attribute;
>       T= Add arc to root node for each split predicate and
>       label;
> For each arc do
>        D= Database created by applying splitting
>        predicate to D;
>        If stopping point reached for this path, then
>                T’= create leaf node and label with
>                appropriate class;
>       Else
>                T’= DTBUILD(D);
>       T= add T’ to arc;
> }
>

It probably refers to an empty tree?

> 2)Could you explain me the quadratic loss function (about the error measures
> of nominal attributes)?
> I have read
> http://weka.8497.n7.nabble.com/error-measures-of-nominal-attributes-td503.html#a509
> but i am struggling to understand the concept.

It is the multi-class version of the Brier score, i.e., it’s the score as originally defined by Brier. See

  https://en.wikipedia.org/wiki/Brier_score

for more information.

WEKA’s evaluation module takes the average of this score across all test instances, divides it by the number of class values, and then takes the square root to compute the root mean squared error for nominal class prediction problems.

Cheers,
Eibe
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree orreplacementsubtree

Driver007
This post was updated on .
Dear Eibe,


Thank you for the answers.

If i understood correctly from the link you sent me Brier's score calculates
the mean squared error. The mean squared error is not the same with root mean squared error.

1) How is calculated the mean absolute error for nominal values in j48?

2) How is calculated the relative absolute error for nominal values in j48?

3) How is calculated the root relative squared error for nominal values in
j48?


Thanks in advance,
John



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@list.waikato.ac.nz
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree orreplacementsubtree

Eibe Frank-3
1) Whenever the square of a difference is used in the squared error, the absolute value of the difference is used instead in the absolute error.

2) The absolute error of the learning scheme (e.g., J48) is divided by the absolute error of ZeroR, which takes the proportions of the classes in the training set as the class probability estimates.

3) The same as 2) but using squared error values rather than absolute error values.

Cheers,
Eibe

On Wed, 6 Sep 2017 at 9:35 PM, Driver007 <[hidden email]> wrote:
Dear Eibe,


Thank you for the answers.

If i understood correctly from the link you sent me Brier's score calculates
the root mean squarer error.

1) How is calculated the mean absolute error for nominal values in j48?

2) How is calculated the relative absolute error for nominal values in j48?

3) How is calculated the root relative squared error for nominal values in
j48?


Thanks in advance,
John



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

awp233
In reply to this post by Driver007
HI there - i was just wondering - what book are you referring to? I am
working on implimenting c4.5 and the book that you are referring to sounds
very useful.

Kind regards

Alex



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Reply | Threaded
Open this post in threaded view
|

Re: C45 - pruning method - raising subtree or replacement subtree

Eibe Frank-3
I'm not entirely sure which book is being referred to in this thread. There is our data mining book (the book by Ian Witten et al.), which discusses the C4.5 algorithm, but the original and authoritative reference for C4.5 is obviously Ross Quinlan's book "C4.5: Programs for Machine Learning".

Cheers,
Eibe

On Sat, Aug 10, 2019 at 2:00 AM awp233 <[hidden email]> wrote:
HI there - i was just wondering - what book are you referring to? I am
working on implimenting c4.5 and the book that you are referring to sounds
very useful.

Kind regards

Alex



--
Sent from: https://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html