RE: Wekalist Digest, Vol 27, Issue 11

ashish godbole

Hey Satanjeev,

That sounds OK to me, but it brings me to my second question.

Can't I just skip the 'buildClassifier' line (and the corresponding object output stream technique) and simply use the Evaluation class?

That way I could use the Evaluation class options to save the model, and use it for test instances too:

Evaluation eval = new Evaluation(trainfile);

String options[] = {"-d",str4[0],"-t",str2};

System.out.println(eval.evaluateModel(dtree, options));    // or classify test instances if needed
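
The object output stream technique referred to above is ordinary Java serialization. A minimal sketch, assuming a hypothetical `ToyModel` class in place of a trained Weka classifier (Weka itself is not used here, and `SerializationSketch`, `save` and `load` are illustrative names, not Weka API):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a trained classifier; not a Weka class.
class ToyModel implements Serializable {
    private static final long serialVersionUID = 1L;
    final double threshold;

    ToyModel(double threshold) {
        this.threshold = threshold;
    }

    // Predicts class 1 when the value exceeds the stored threshold, else 0.
    int classify(double value) {
        return value > threshold ? 1 : 0;
    }
}

class SerializationSketch {
    // Write the model object to a file with Java serialization.
    static void save(ToyModel model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    // Read the model back; the cast fails if the file holds another type.
    static ToyModel load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (ToyModel) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("toy", ".model"); // placeholder file
        file.deleteOnExit();
        save(new ToyModel(0.5), file);
        ToyModel restored = load(file);
        System.out.println(restored.classify(0.9)); // prints 1
    }
}
```

The same pattern should carry over to any object that implements `Serializable`; whether it is preferable to the Evaluation class's `-d` option is exactly the question asked here.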

Ashish.





>From: [hidden email]
>Reply-To: [hidden email]
>To: [hidden email]
>Subject: Wekalist Digest, Vol 27, Issue 11
>Date: 10 May 2005 12:59:40 -0700
>
>Send Wekalist mailing list submissions to
> [hidden email]
>
>To subscribe or unsubscribe via the World Wide Web, visit
> https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>or, via email, send a message with subject or body 'help' to
> [hidden email]
>
>You can reach the person managing the list at
> [hidden email]
>
>When replying, please edit your Subject line so it is more specific
>than "Re: Contents of Wekalist digest..."
>
>
>Today's Topics:
>
>    1. scalability with PrincipleComponents.. can we find more
>       optimal attribute selector (praveen K)
>    2. unsolvable memory size problems (michel.plantie)
>    3. ROC Plot comparision (Praveen Boinee)
>    4. discretizing the class variable (Friedman, Craig)
>    5. ROC Curves Comparision (Praveen Boinee)
>    6. Complement Naive Bayes (Jenny Wang)
>    7. Re: unsolvable memory size problems (christian schulz)
>    8. evaluation and buildclassifier confusion  (ashish godbole)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Tue, 10 May 2005 00:07:11 -0700 (PDT)
>From: praveen K <[hidden email]>
>Subject: [Wekalist] scalability with PrincipleComponents.. can we find
> more optimal attribute selector
>To: [hidden email]
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset="us-ascii"
>
>Hi all,
>
>I have a strange (maybe trivial for some people) problem. I need to select attributes from a very large dataset, say 3000 x 3000 (3000 attributes for 3000 instances), with no class variable. I need an unsupervised attribute selector, so I started working with PrincipalComponents.
>
>Unfortunately, it demands a lot of memory and time (it got stuck on a 2.5 GHz processor with the maximum heap size set to 1 GB, on Linux). Can anyone suggest other methods or attribute selectors, or tuning parameters for PrincipalComponents, for my application? Are there any other unsupervised attribute selectors in WEKA that are fast and don't demand many resources?
>
>Your suggestions are greatly appreciated. Thanks in advance,
>
>Praveen
>
>
>------------------------------
>
>Message: 2
>Date: Tue, 10 May 2005 11:04:44 +0200
>From: "michel.plantie" <[hidden email]>
>Subject: [Wekalist] unsolvable memory size problems
>To: [hidden email]
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hello,
>
>I have been using Weka for some time now.
>
>I wanted to use the RandomForest classifier
>with 2000 instances, each consisting of 12100 numeric values.
>
>Unfortunately, the heap size of the Java virtual machine is not enough to
>run the algorithm.
>
>I used the -Xmx2000m option on a Windows platform
>and even the -Xmx3750m option on a Solaris platform,
>
>but even with this much memory the algorithm crashes with the
>following message:
>
>"Not enough memory.... load a smaller data set or use larger heap size"
>
>I think the RandomForest algorithm consumes a lot of memory, probably because it is recursive.
>Is there any way to improve the algorithm?
>
>
>kind regards
>
>michel
>
>
>--
>
>
>=========================================================
>Michel Plantié
>Laboratoire LGI2P
>Site EERIE, Ecole des Mines d'Ales
>Parc Scientifique Georges Besse
>30035 Nîmes Cedex 1 - France
>téléphone : 33 466387035, fax : 33 466387099
>email : [hidden email]
>=========================================================
>
>
>
>
>------------------------------
>
>Message: 3
>Date: Tue, 10 May 2005 14:44:00 +0200
>From: Praveen Boinee <[hidden email]>
>Subject: [Wekalist] ROC Plot comparision
>To: [hidden email]
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Dear WEKA People
>
>1) How can I plot multiple ROC curves on a single plot? I see nice ROC
>plots in WEKA for individual classifiers.
>Is there any way to put all the ROC curves in one plot?
>
>2) I want to compare MLP with SVM and random forests.
>My test set has 6400 rows, and I am using the Explorer in WEKA.
>I can see the false positive rate (FPR) and true positive rate (TPR) for all
>6400 rows only with MLP. With SVM there are only 2 rows of FPR and TPR, and
>with random forests only 30 rows of FPR and TPR.
>
>Is it common to get results like this?
>
>
>
>
>
>------------------------------
>
>Message: 4
>Date: Tue, 10 May 2005 11:00:14 -0400
>From: "Friedman, Craig" <[hidden email]>
>Subject: [Wekalist] discretizing the class variable
>To: <[hidden email]>
>Message-ID:
> <[hidden email]>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hi,
>
>I'm a new weka user and I'd like to run Tertius on the cpu.arff.
>
>I got the error message "can't handle numeric values!"
>
>I was not able to discretize the class variable (though I was able to discretize the other attributes), even using the unsupervised filters.
>
>Any suggestions would be greatly appreciated.
>
>Craig
>
>
>------------------------------
>
>Message: 5
>Date: Tue, 10 May 2005 17:10:35 +0200
>From: Praveen Boinee <[hidden email]>
>Subject: [Wekalist] ROC Curves Comparision
>To: [hidden email]
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>Dear WEKA People
>
>1) How can I plot multiple ROC curves on a single plot? I see nice ROC
>plots in WEKA for individual classifiers.
>Is there any way to put all the ROC curves in one plot?
>
>2) I want to compare MLP with SVM and random forests.
>My test set has 6400 rows, and I am using the Explorer in WEKA.
>I can see the false positive rate (FPR) and true positive rate (TPR) for all
>6400 rows only with MLP. With SVM there are only 2 rows of FPR and TPR, and
>with random forests only 30 rows of FPR and TPR.
>
>Is it common to get results like this?
>
>
>
>------------------------------
>
>Message: 6
>Date: Wed, 11 May 2005 00:48:19 +0800
>From: "Jenny Wang" <[hidden email]>
>Subject: [Wekalist] Complement Naive Bayes
>To: [hidden email]
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset=big5
>
>Dear all,
>I used the command line to run Complement Naive Bayes.
>The command is as follows:
>C:\Program Files\Weka-3-4>java weka.classifiers.bayes.ComplementNaiveBayes -t d:\mine.arff -c 1 -x 10 -S -N -o -i
>
>However, I found that the result is not the same as when I use the WEKA
>Explorer to run CNB.
>
>I would like to know whether CNB makes use of other programs in WEKA,
>and how I can get the probability of each class when classifying each
>document.
>
>THANKS!!
>
>Jenny
>
>
>
>------------------------------
>
>Message: 7
>Date: Tue, 10 May 2005 19:03:59 +0200
>From: christian schulz <[hidden email]>
>Subject: Re: [Wekalist] unsolvable memory size problems
>To: "michel.plantie" <[hidden email]>
>Cc: [hidden email]
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hi,
>
>I have had the same problems with large datasets in Weka recently, so here is
>a small framework for R (r-project), which seems to me less memory-hungry
>for randomForest
>with many trees.
>
>(1) Take the read.arff function from:
>
>  Craig A. Struble ( http://www.cs.waikato.ac.nz/ml/weka/example_code/readarff.r )
>
>(2) Install R and randomForest from r-project.org
>
>library(randomForest)
>
># load the ARFF file with the read.arff function from step (1)
>P_GESAMT <- read.arff("c:/yourdata.arff")
>
># random 70/30 split: 1 = training rows, 2 = test rows
>splitP_GESAMT <- sample(2,nrow(P_GESAMT),replace=T,prob=c(0.7,0.3))
># CLASS is assumed to be the class attribute of the data
>rfP_GESAMT <- randomForest(CLASS ~ .,
>  data=P_GESAMT[splitP_GESAMT==1,],na.action=na.omit,importance=T,ntree=1000)
>P_GESAMTpred <- predict(rfP_GESAMT,P_GESAMT[splitP_GESAMT==2,])
>
>regards, christian
>
>
>
>
>michel.plantie wrote:
>
> > Hello,
> >
> > I have been using Weka for some time now.
> >
> > I wanted to use the RandomForest classifier
> > with 2000 instances, each consisting of 12100 numeric values.
> >
> > Unfortunately, the heap size of the Java virtual machine is not enough to
> > run the algorithm.
> >
> > I used the -Xmx2000m option on a Windows platform
> > and even the -Xmx3750m option on a Solaris platform,
> >
> > but even with this much memory the algorithm crashes with the
> > following message:
> >
> >"Not enough memory.... load a smaller data set or use larger heap size"
> >
> >I think the RandomForest algorithm consumes a lot of memory, probably because it is recursive.
> >Is there any way to improve the algorithm?
> >
> >
> >kind regards
> >
> >michel
> >
> >
> >
> >--
> >
> >
> >=========================================================
> >Michel Plantié
> >Laboratoire LGI2P
> >Site EERIE, Ecole des Mines d'Ales
> >Parc Scientifique Georges Besse
> >30035 Nîmes Cedex 1 - France
> >téléphone : 33 466387035, fax : 33 466387099
> >email : [hidden email]
> >=========================================================
> >
> >
> >
> >------------------------------------------------------------------------
> >
> >_______________________________________________
> >Wekalist mailing list
> >[hidden email]
> >https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
> >
> >
>
>
>------------------------------
>
>Message: 8
>Date: Tue, 10 May 2005 15:54:38 -0400
>From: "ashish godbole" <[hidden email]>
>Subject: [Wekalist] evaluation and buildclassifier confusion
>To: [hidden email]
>Message-ID: <[hidden email]>
>Content-Type: text/plain; charset="us-ascii"
>
>An HTML attachment was scrubbed...
>URL: https://list.scms.waikato.ac.nz/pipermail/wekalist/attachments/20050510/9ed1769a/attachment.htm
>
>------------------------------
>
>_______________________________________________
>Wekalist mailing list
>[hidden email]
>https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
>
>
>End of Wekalist Digest, Vol 27, Issue 11
>****************************************
