J48 vs. Random Forest in attribute selection


Jussi Salmi
Hello!

I'm doing attribute selection with WEKA from my own code. I use a
cost-sensitive classifier, set the weights, and then run attribute
selection with BestFirst search and 10-fold cross-validation. My
problem is that I can use J48 but not the Random Forest classifier with
this setup.

When I set the classifier to J48, everything works fine. But if I make
just one change to the code, using Random Forest instead of J48, I get
an exception saying that RandomTree requires at least one attribute
besides the class attribute.

The data has 17 attributes and several hundred instances when I start
the attribute selection from my code. Any ideas what could be wrong?
Random Forest produces better classification results for my data, so I
would prefer to use it here as well.

Jussi

--
Jussi Salmi
http://staff.cs.utu.fi/~jussalmi/


_______________________________________________
Wekalist mailing list
[hidden email]
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: J48 vs. Random Forest in attribute selection

Jussi Salmi
To be more specific, I have the following code, which I copied from
somewhere else in the Weka source code:

for (int fold = 0; fold < numFolds; fold++) {
   // training portion of the data for this fold
   Instances train = data.trainCV(numFolds, fold, random);
   // run attribute selection on this training split
   eval.selectAttributesCVSplit(train);
}

At the point between the "Instances..." and "eval..." lines, there are
17 attributes (including a 0/1-valued class attribute). The Instances
variable data holds the data (I've checked this), and eval is of type
Evaluation. With J48 this works; with RandomForest, the exception is
thrown when "eval.selectAttributesCVSplit(train);" is executed.
numFolds is 10, and random is an instance of Random seeded from the
system clock.
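For readers following the loop above, here is a Weka-free Java sketch of what it does: for each of the numFolds folds, the training split is every instance outside the held-out fold, a selection step runs on that split, and the results are tallied per attribute. Assumptions: the fold boundaries use the common scheme in which the first (n % k) folds get one extra instance (I believe Weka's trainCV follows it, but treat that as unverified), `selectOnSplit` is a hypothetical stand-in for the real attribute-selection call, and the per-attribute fold counts only mimic the kind of summary that cross-validated attribute selection reports. Note also that, as far as I know, `selectAttributesCVSplit` is a method of `weka.attributeSelection.AttributeSelection` rather than of `Evaluation`.

```java
import java.util.Arrays;
import java.util.function.Function;

class CvSelectionSketch {

    // Size of held-out fold `fold` when n instances are split into k folds;
    // the first (n % k) folds get one extra instance.
    static int foldSize(int n, int k, int fold) {
        return n / k + (fold < n % k ? 1 : 0);
    }

    // Index of the first instance in held-out fold `fold`.
    static int foldStart(int n, int k, int fold) {
        return fold * (n / k) + Math.min(fold, n % k);
    }

    // Training split for `fold`: every instance index outside the held-out fold.
    static int[] trainIndices(int n, int k, int fold) {
        int start = foldStart(n, k, fold);
        int end = start + foldSize(n, k, fold);
        int[] train = new int[n - (end - start)];
        int j = 0;
        for (int i = 0; i < n; i++) {
            if (i < start || i >= end) {
                train[j++] = i;
            }
        }
        return train;
    }

    // Runs a selection step on each training split and counts, per attribute,
    // in how many folds it was selected. selectOnSplit is hypothetical.
    static int[] crossValidatedCounts(int n, int k, int numAttributes,
                                      Function<int[], boolean[]> selectOnSplit) {
        int[] counts = new int[numAttributes];
        for (int fold = 0; fold < k; fold++) {
            boolean[] selected = selectOnSplit.apply(trainIndices(n, k, fold));
            for (int a = 0; a < numAttributes; a++) {
                if (selected[a]) {
                    counts[a]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 23, k = 10;
        // Check that every instance is held out exactly once across the k folds.
        int[] heldOut = new int[n];
        for (int fold = 0; fold < k; fold++) {
            int start = foldStart(n, k, fold);
            for (int i = start; i < start + foldSize(n, k, fold); i++) {
                heldOut[i]++;
            }
        }
        System.out.println("each instance held out once: "
                + Arrays.stream(heldOut).allMatch(c -> c == 1)); // true

        // Toy selector: attribute 0 always chosen, attribute 1 only when
        // instance 0 is part of the training split, attribute 2 never.
        Function<int[], boolean[]> toySelector =
                train -> new boolean[] {true, train[0] == 0, false};
        System.out.println(Arrays.toString(crossValidatedCounts(n, k, 3, toySelector))); // [10, 9, 0]
    }
}
```

Because each fold's training split differs, the subset search may pick different attributes in different folds; the counts summarise how consistently each attribute is chosen.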


Re: Re: J48 vs. Random Forest in attribute selection

Eibe Frank
As the exception says, RandomForest can't deal with a dataset that has
no attributes other than the class. It looks like the BestFirst search
is evaluating the empty attribute subset while searching for the best
subset of attributes, and this causes the exception.

I can't think of any way of getting around this problem without
modifying RandomForest. You could make it use ZeroR for classification
if the set of attributes is empty. We should probably make this change
in the Weka distribution at some stage.

Cheers,
Eibe
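Eibe's suggested workaround, falling back to ZeroR when only the class attribute is left, could look roughly like this. It is a self-contained sketch, not Weka code: `SimpleClassifier`, `ZeroRLike`, `OneNearestNeighbour` and `EmptySubsetFallback` are hypothetical names, a 1-nearest-neighbour learner stands in for RandomTree's refusal to train on attribute-free data, and the dataset is reduced to plain arrays. In Weka itself the equivalent check would presumably go into RandomForest (or a wrapper around it) and use Weka's own ZeroR.

```java
class FallbackDemo {
    public static void main(String[] args) {
        int[] y = {1, 0, 1, 1};
        EmptySubsetFallback model = new EmptySubsetFallback(new OneNearestNeighbour());

        // Empty attribute subset: four instances with no non-class attributes.
        model.train(new double[4][0], y, 2);
        System.out.println(model.usedFallback() + " " + model.predict(new double[0])); // true 1

        // One attribute available: the wrapped learner is used instead.
        double[][] oneAttribute = {{0.1}, {5.0}, {0.2}, {0.3}};
        model.train(oneAttribute, y, 2);
        System.out.println(model.usedFallback() + " " + model.predict(new double[] {4.8})); // false 0
    }
}

// Hypothetical minimal classifier interface (not Weka's API).
interface SimpleClassifier {
    void train(double[][] x, int[] y, int numClasses);
    int predict(double[] row);
}

// ZeroR-style model: always predicts the most frequent class seen in training.
class ZeroRLike implements SimpleClassifier {
    private int majority;

    public void train(double[][] x, int[] y, int numClasses) {
        int[] counts = new int[numClasses];
        for (int label : y) counts[label]++;
        majority = 0;
        for (int c = 1; c < numClasses; c++) {
            if (counts[c] > counts[majority]) majority = c;
        }
    }

    public int predict(double[] row) { return majority; }
}

// Stand-in for a learner that, like RandomTree in this thread, refuses data
// with no attributes besides the class.
class OneNearestNeighbour implements SimpleClassifier {
    private double[][] xs;
    private int[] ys;

    public void train(double[][] x, int[] y, int numClasses) {
        if (x.length == 0 || x[0].length == 0) {
            throw new IllegalArgumentException("needs at least one attribute besides the class");
        }
        xs = x;
        ys = y;
    }

    public int predict(double[] row) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < xs.length; i++) {
            double d = 0;
            for (int a = 0; a < row.length; a++) {
                double diff = xs[i][a] - row[a];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return ys[best];
    }
}

// Delegates to `inner` when at least one non-class attribute exists,
// otherwise falls back to the majority-class model.
class EmptySubsetFallback implements SimpleClassifier {
    private final SimpleClassifier inner;
    private SimpleClassifier active;

    EmptySubsetFallback(SimpleClassifier inner) { this.inner = inner; }

    public void train(double[][] x, int[] y, int numClasses) {
        boolean noAttributes = x.length == 0 || x[0].length == 0;
        active = noAttributes ? new ZeroRLike() : inner;
        active.train(x, y, numClasses);
    }

    public int predict(double[] row) { return active.predict(row); }

    boolean usedFallback() { return active instanceof ZeroRLike; }
}
```

Wrapping the learner rather than editing it keeps the fallback in one place; the empty subset is then scored as a plain majority-class predictor.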


On Jun 28, 2005, at 2:49 AM, Jussi Salmi wrote:

> [...]

Re: Re: J48 vs. Random Forest in attribute selection

Jussi Salmi
Eibe Frank wrote:
> As the exception says, RandomForest can't deal with a dataset that
> doesn't have any attributes (excluding the class). It looks like the
> BestFirst search is trying to evaluate the empty set while searching for
> the best subset of attributes. This causes the exception.


Yes, but the dataset has 17 attributes, including the class. J48 works
perfectly with this dataset and this code. The dataset also contains
real data; I checked that by printing data.toString() before the call
to Evaluation.selectAttributesCVSplit().


Re: Re: J48 vs. Random Forest in attribute selection

Mark Hall
Eibe is correct: BestFirst internally creates new datasets that
correspond to the subsets of features being evaluated. One of the
subsets evaluated is the empty set of features (i.e. just the class
attribute). This is to check whether simply predicting the majority
class outperforms every model built using any of the features.

Cheers,
Mark.
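Mark's point can be illustrated with a minimal best-first forward search over attribute subsets. This is a sketch, not Weka's BestFirst implementation: subsets are bitmasks over the non-class attributes, `merit` is a hypothetical evaluator standing in for the wrapper's estimate of classifier performance, and the stopping rule (give up after `maxStale` expansions without improvement) is only loosely modelled on BestFirst's search-termination setting. Because a forward search starts from the empty subset, the first dataset handed to the evaluator contains nothing but the class attribute, which is exactly the case RandomTree rejects.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.IntToDoubleFunction;

class BestFirstSketch {

    // An attribute subset (bitmask over non-class attributes) and its merit.
    static final class Node {
        final int subset;
        final double merit;
        Node(int subset, double merit) { this.subset = subset; this.merit = merit; }
    }

    // Forward best-first search; every subset passed to `merit` is appended to
    // `evaluated` in evaluation order. Returns the best subset found.
    static int search(int numAttributes, IntToDoubleFunction merit, int maxStale,
                      List<Integer> evaluated) {
        PriorityQueue<Node> open =
                new PriorityQueue<>(Comparator.comparingDouble((Node nd) -> nd.merit).reversed());
        Set<Integer> seen = new HashSet<>();

        // The search starts from the empty subset: only the class attribute remains.
        double startMerit = merit.applyAsDouble(0);
        evaluated.add(0);
        seen.add(0);
        open.add(new Node(0, startMerit));
        int bestSubset = 0;
        double bestMerit = startMerit;
        int stale = 0;

        while (!open.isEmpty() && stale < maxStale) {
            int current = open.poll().subset;
            boolean improved = false;
            // Expand by adding each attribute not yet in the current subset.
            for (int a = 0; a < numAttributes; a++) {
                int child = current | (1 << a);
                if (child == current || !seen.add(child)) continue;
                double m = merit.applyAsDouble(child);
                evaluated.add(child);
                open.add(new Node(child, m));
                if (m > bestMerit) {
                    bestMerit = m;
                    bestSubset = child;
                    improved = true;
                }
            }
            stale = improved ? 0 : stale + 1;
        }
        return bestSubset;
    }

    public static void main(String[] args) {
        // Hypothetical merit: attributes 0 and 2 help, attribute 1 costs a little.
        IntToDoubleFunction merit = s ->
                0.5 + 0.2 * (s & 1) + 0.1 * ((s >> 2) & 1) - 0.05 * ((s >> 1) & 1);
        List<Integer> evaluated = new ArrayList<>();
        int best = search(3, merit, 5, evaluated);
        System.out.println("first subset evaluated: " + evaluated.get(0)); // 0 (empty set)
        System.out.println("best subset mask: " + best);                    // 5 (attributes 0 and 2)
    }
}
```

With this toy merit function the search evaluates all eight subsets of three attributes, starting with the empty one, and settles on attributes 0 and 2 (mask 5).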

On 6/28/05, Jussi Salmi <[hidden email]> wrote:

> Eibe Frank wrote:
> > As the exception says, RandomForest can't deal with a dataset that
> > doesn't have any attributes (excluding the class). It looks like the
> > BestFirst search is trying to evaluate the empty set while searching for
> > the best subset of attributes. This causes the exception.
>
>
> Yes, but the dataset has 17 attributes, including the class. J48 works
> perfectly with this dataset with this code. And the dataset has real
> data in it, I've checked that by printing data.toString() before that
> call to Evaluation.selectAttributesCVSplit()
>
> [...]