Re: CorrelationAttributeEval


Betha Nurina Sari
Dear all, I need an explanation of CorrelationAttributeEval. The documentation says:

CorrelationAttributeEval:
Evaluates the worth of an attribute by measuring the correlation (Pearson's) between it and the class.

Nominal attributes are considered on a value by value basis by treating each value as an indicator. An overall correlation for a nominal attribute is arrived at via a weighted average.

So, is the class here a nominal attribute?
But I am still confused by this sentence: "An overall correlation for a nominal attribute is arrived at via a weighted average." What does "weighted average" mean, and how can we calculate it for a nominal attribute (target class)?

Thanks.

On 2 May 2017 at 16:02, <[hidden email]> wrote:

Today's Topics:

   1. Re: Java errors - Retrieving data from instances (Bob Matthews)
   2. Re: Predictive markers (Santosh Bhosale)
   3. Re: Predictive markers (Santosh Bhosale)


----------------------------------------------------------------------

Message: 1
Date: Tue, 2 May 2017 21:00:35 +1200
From: Bob Matthews <[hidden email]>
To: [hidden email]
Subject: Re: [Wekalist] Java errors - Retrieving data from instances
Message-ID: <[hidden email]>
Content-Type: text/plain; charset=utf-8; format=flowed

Hi Mark

I now have the following code:

// get index, OHLC prices and the take profit from (l)th instance
sim_trading_date = lthInstance.value(0);
sim_trading_time = lthInstance.value(1);
sim_open_price = lthInstance.value(2);
sim_high_price = lthInstance.value(3);
sim_low_price = lthInstance.value(4);
sim_close_price = lthInstance.value(5);
sim_take_profit = lthInstance.value(15);

This seems to be OK.

So my only remaining problem is why it does not recognize fcs_all.

Bob M

On 5/2/17 8:13 PM, Mark Hall wrote:
> My mail client had a fit; the rest of my message was going to be:
>
> See:
> http://weka.sourceforge.net/doc.stable-3-8/weka/core/Instance.html
>
> Cheers,
> Mark.
>
> On 2/05/17, 7:27 PM, "Bob Matthews" <[hidden email] on behalf of [hidden email]> wrote:
>
>          Hello Eibe
>
>          I have the following code:
>
>          ObjectInputStream ois =
>              new ObjectInputStream(new FileInputStream("C:/............/KStar1.model"));
>          FilteredClassifier fcs_all = (FilteredClassifier) ois.readObject();
>          ois.close();
>
>          // retrieve the lth instance in the testing set
>          Instance lthInstance = testing.instance(l);
>
>          // get index, OHLC prices and the take profit from (l)th instance
>          sim_trading_date = lthInstance.attribute(Trading_Date, 0);
>          sim_trading_time = lthInstance.attribute(Trading_Time, 1);
>          In the above 2 lines, it cannot find the variables.
>
>          sim_open_price = lthInstance.attribute(open_price, 2);
>          sim_high_price = lthInstance.attribute(high_price, 3);
>          sim_low_price = lthInstance.attribute(low_price, 4);
>          sim_close_price = lthInstance.attribute(close_price, 5);
>          sim_take_profit = lthInstance.attribute(take_profit, 15);
>          In the above 5 lines, method attribute cannot be applied.
>
>          // classifyInstance() just returns the index of the predicted label
>          // (the one with the highest probability) as a double
>          pred = fcs_all.classifyInstance(lthInstance);
>          Cannot find variable fcs_all.
>
>          Is there something obvious that I am doing wrong here?
>
>          Bob M
>
>
>
>      _______________________________________________
>      Wekalist mailing list
>      Send posts to: [hidden email]
>      List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
>      List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
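
On the unresolved fcs_all in the quoted code: the thread does not show where the variable is declared, so the cause stays open. One possibility, offered purely as an assumption, is that the declaration sits inside a block (for example a try block) and is out of scope where classifyInstance() is called. The sketch below uses only the Java standard library, with an ArrayList standing in for the serialized classifier; ModelLoadSketch, readObject and writeObject are hypothetical names, not WEKA API.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ModelLoadSketch {

    /** Reads one serialized object from the file; the stream is closed automatically. */
    public static Object readObject(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            return ois.readObject();
        }
    }

    /** Writes one serializable object to the file (only needed to make the sketch runnable). */
    public static void writeObject(File file, Object obj) throws IOException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
            oos.writeObject(obj);
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the model file; a real model would be a classifier object.
        File file = File.createTempFile("model", ".ser");
        file.deleteOnExit();
        writeObject(file, new ArrayList<>(Arrays.asList("stand-in", "for", "a", "model")));

        // Declared at method level (not inside a nested block),
        // so the variable is still in scope for the code further down.
        @SuppressWarnings("unchecked")
        List<String> loaded = (List<String>) readObject(file);

        System.out.println(loaded.size() + " elements loaded");
    }
}
```

With a real model, the cast would be to the classifier type instead of List<String>, and the variable would need to be declared in a scope that also encloses the later classifyInstance() call.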



------------------------------

Message: 2
Date: Tue, 2 May 2017 12:00:59 +0300
From: Santosh Bhosale <[hidden email]>
To: "Weka machine learning workbench list."
        <[hidden email]>
Subject: Re: [Wekalist] Predictive markers
Message-ID:
        <CA+=[hidden email]>
Content-Type: text/plain; charset="utf-8"

Hi Eibe,

I tried what you suggested, quoted below:

"Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing
RandomForest as the evaluator in WrapperSubsetEval and also as the base
classifier in AttributeSelectedClassifier. The default BestFirstSearch
method will probably work fine on your data because it is quite a small
dataset"

I used SPSS only for plotting the ROC curve, not for classification. So my
only concern is why there is such a discrepancy between the AUCs. Did I use
WEKA correctly? Was my CSV input to WEKA correct?

Thanks
Santosh



On Tue, May 2, 2017 at 11:52 AM, Eibe Frank <[hidden email]> wrote:

> Which classifier did you use in WEKA and SPSS and which evaluation method?
> 10-fold cross-validation?
>
> Cheers,
> Eibe
>
> On 2/05/2017, at 6:58 PM, Santosh Bhosale <[hidden email]> wrote:
>
> Hi Eibe,
>
> I tried the workflow you mentioned. As a result, WEKA showed a panel of 13
> attributes (protein biomarkers) classifying cases from controls. On the
> same data, I drew a ROC curve using WEKA, which gave an AUC of 0.937. But
> when I took the same combination and drew the ROC curve using SPSS, it
> gave an AUC of 0.73. I do not understand this discrepancy.
>
> Please see attached example of CSV file used as input for WEKA.
>
> Thanks in advance
> Santosh
>
> On Wed, Apr 26, 2017 at 12:18 PM, Eibe Frank <[hidden email]> wrote:
>
>> Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing
>> RandomForest as the evaluator in WrapperSubsetEval and also as the base
>> classifier in AttributeSelectedClassifier. The default BestFirstSearch
>> method will probably work fine on your data because it is quite a small
>> dataset.
>>
>> Cheers,
>> Eibe
>>
>> > On 26 Apr 2017, at 00:39, Santosh Bhosale <[hidden email]> wrote:
>> >
>> > Dear All,
>> >
>> > I did the following steps in WEKA.
>> >
>> > - Uploaded the data in CSV file format
>> > - Ran classifiers using J48 and RandomForest
>> > - J48 gave about 77% correctly classified instances
>> > - RandomForest gave about 84% correctly classified instances
>> >
>> > I had 264 instances and 24 attributes. However, I was not able to
>> > pinpoint which combination of attributes had given the best classification
>> > of cases from controls.
>> >
>> > Any help would be highly appreciated.
>> >
>> > Thanks
>> > Santosh
>> >
>> > On Tue, Apr 25, 2017 at 9:52 AM, Santosh Bhosale <[hidden email]> wrote:
>> > Hi Eibe.
>> >
>> > Thanks
>> >
>> > On Tue, Apr 25, 2017 at 6:29 AM, Eibe Frank <[hidden email]> wrote:
>> > You should probably learn about some basics of machine learning first.
>> > There are some free on-line courses based on WEKA here:
>> >
>> >   https://weka.waikato.ac.nz/explorer
>> >
>> > Cheers,
>> > Eibe
>> >
>> > > On 25 Apr 2017, at 00:32, Santosh Bhosale <[hidden email]> wrote:
>> > >
>> > > Hi All,
>> > >
>> > > I am a proteomics expert and new to machine learning. I have protein
>> > > expression data between cases and controls where I have already found
>> > > significant markers. Now I want to predict a panel of markers which will
>> > > best classify cases from controls. I am not sure how to do that.
>> > >
>> > > It would be really good if someone could urgently help me with how the
>> > > data should be structured and what sort of pipeline to follow, so that
>> > > I can use this information to plot the ROC curve to best classify cases
>> > > from controls.
>> > >
>> > > Thanks in advance
>> > >
>> > > -Santosh
>
> <exampleOfWEKAInput.csv>

------------------------------

Message: 3
Date: Tue, 2 May 2017 12:02:00 +0300
From: Santosh Bhosale <[hidden email]>
To: "Weka machine learning workbench list."
        <[hidden email]>
Subject: Re: [Wekalist] Predictive markers
Message-ID:
        <CA+=4V7US7+ZMCO8CVXX6kuU13gkT1=[hidden email]>
Content-Type: text/plain; charset="utf-8"

Hi,

Sorry for spamming. Yes, I used 10-fold cross-validation.

Santosh


------------------------------

End of Wekalist Digest, Vol 171, Issue 17
*****************************************

Re: Re: CorrelationAttributeEval

Mark Hall
A nominal attribute with k values is essentially converted into k separate binary indicator attributes, each of which takes the value 1 when its corresponding nominal value appears in a given instance and 0 otherwise. These indicators can then be treated as numeric, and Pearson's correlation can be computed between each one and the target. The overall correlation is the weighted average of the per-indicator Pearson correlations, where the weights are proportional to the frequency of each nominal value.

Cheers,
Mark.
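
The procedure described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not WEKA's actual source: the class and method names (NominalCorrelationSketch, pearson, weightedCorrelation) are hypothetical, the target is assumed to be numeric already, and the sketch averages the magnitudes |r| so that indicators with opposite signs do not cancel, which the reply above does not specify.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NominalCorrelationSketch {

    /** Pearson correlation of two equal-length arrays; returns 0 if either one is constant. */
    public static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0.0 || syy == 0.0) {
            return 0.0; // correlation is undefined for a constant column; treat as 0 here
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Frequency-weighted average of |Pearson r| over the k indicator columns. */
    public static double weightedCorrelation(String[] values, double[] target) {
        int n = values.length;
        // Count how often each nominal value occurs.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        double overall = 0.0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            // Build the 0/1 indicator column for this nominal value.
            double[] indicator = new double[n];
            for (int i = 0; i < n; i++) {
                indicator[i] = values[i].equals(e.getKey()) ? 1.0 : 0.0;
            }
            double weight = e.getValue() / (double) n; // proportional to frequency
            // Assumption: use the magnitude of r (see the note above the code).
            overall += weight * Math.abs(pearson(indicator, target));
        }
        return overall;
    }

    public static void main(String[] args) {
        String[] colour = {"red", "red", "blue", "blue", "green", "green"};
        double[] target = {1, 1, 0, 0, 0, 0};
        System.out.println(weightedCorrelation(colour, target));
    }
}
```

In this toy data the "red" indicator matches the target exactly (r = 1), while "blue" and "green" each give r = -0.5; every value has weight 1/3, so the result is about 0.667.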

On 3/05/17, 3:16 AM, "Betha Nurina Sari" <[hidden email] on behalf of [hidden email]> wrote:

    Dear all,i need explaiation about CorrelationAttributeEval. In documentation,we can read this :
    CorrelationAttributeEval:
    Evaluates the worth of an attribute by measuring the correlation (Pearson's) between it and the class.
   
    Nominal attributes are considered on a value by value basis by treating each value as an indicator. An overall correlation for a nominal attribute is arrived at via a weighted average.
    So,the class here is nominal attribute?
    But,i am still confuse with this sentence :An overall correlation for a nominal attribute is arrived at via a weighted average. What is the meaning of weighted average, how we can calculate it from nominal attribute (target class)?
   
    Thanks.
   
   
   
    Pada tanggal 2 Mei 2017 16.02,  <[hidden email]> menulis:
   
    Send Wekalist mailing list submissions to
            [hidden email]
   
    To subscribe or unsubscribe via the World Wide Web, visit
            https://list.waikato.ac.nz/mailman/listinfo/wekalist
    or, via email, send a message with subject or body 'help' to
            [hidden email]
   
    You can reach the person managing the list at
            [hidden email]
   
    When replying, please edit your Subject line so it is more specific
    than "Re: Contents of Wekalist digest..."
   
   
    Today's Topics:
   
       1. Re: Java errors - Retrieving data from instances (Bob Matthews)
       2. Re: Predictive markers (Santosh Bhosale)
       3. Re: Predictive markers (Santosh Bhosale)
   
   
    ----------------------------------------------------------------------
   
    Message: 1
    Date: Tue, 2 May 2017 21:00:35 +1200
    From: Bob Matthews <[hidden email]>
    To: [hidden email]
    Subject: Re: [Wekalist] Java errors - Retrieving data from instances
    Message-ID: <[hidden email]>
    Content-Type: text/plain; charset=utf-8; format=flowed
   
    Hi Mark
   
    now have the following code.............
   
    // get index, OHLC prices and the take profit from (l)th instance
    sim_trading_date = lthInstance.value(0);
    sim_trading_time = lthInstance.value(1);
    sim_open_price = lthInstance.value(2);
    sim_high_price = lthInstance.value(3);
    sim_low_price = lthInstance.value(4);
    sim_close_price = lthInstance.value(5);
    sim_take_profit = lthInstance.value(15);
   
    seems to be OK
   
    so my only problem is why it is not recognizing fcs_all
   
    Bob M
   
    On 5/2/17 8:13 PM, Mark Hall wrote:
    > My mail client had a fit ? the rest of my message was going to be:
    >
    > See:
    > http://weka.sourceforge.net/doc.stable-3-8/weka/core/Instance.html
    >
    > Cheers,
    > Mark.
    >
    > On 2/05/17, 7:27 PM, "Bob Matthews" <[hidden email] on behalf of [hidden email]> wrote:
    >
    >
    >
    >
    >
    >
    >          Hello Eibe
    >
    >          I have the following code.......................
    >
    >          ObjectInputStream ois =
    >              new ObjectInputStream(
    >                                         new
    >              FileInputStream("C:/............/KStar1.model"));
    >              FilteredClassifier fcs_all = (FilteredClassifier)
    >              ois.readObject();
    >              ois.close();
    >
    >          //retrieve the lth instance in the testing set
    >          Instance lthInstance = testing.instance(l);
    >
    >          // get index, OHLC prices and the take profit from (l)th instance
    >          sim_trading_date = lthInstance.attribute(Trading_Date, 0);
    >          sim_trading_time = lthInstance.attribute(Trading_Time, 1);
    >          In the above 2 lines, it cannot find the
    >            variables
    >
    >          sim_open_price = lthInstance.attribute(open_price, 2);
    >          sim_high_price = lthInstance.attribute(high_price, 3);
    >          sim_low_price = lthInstance.attribute(low_price, 4);
    >          sim_close_price = lthInstance.attribute(close_price, 5);
    >          sim_take_profit = lthInstance.attribute(take_profit, 15);
    >          In the above 5 lines, method attribute cannot
    >            be applied
    >
    >          // classifyInstance() just returns the index of the predicted label
    >          (the one with the highest probability) as a double
    >          pred = fcs_all.classifyInstance(lthInstance);
    >            Cannot find variable fcs_all
    >
    >            Is there something obvious that I am doing
    >              wrong here?
    >
    >            Bob M
    >
    >
    >
    >      _______________________________________________
    >      Wekalist mailing list
    >      Send posts to: [hidden email]
    >      List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
    >      List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
    >
    >
    >
    > _______________________________________________
    > Wekalist mailing list
    > Send posts to: [hidden email]
    > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
    > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
   
   
   
    ------------------------------
   
    Message: 2
    Date: Tue, 2 May 2017 12:00:59 +0300
    From: Santosh Bhosale <[hidden email]>
    To: "Weka machine learning workbench list."
            <[hidden email]>
    Subject: Re: [Wekalist] Predictive markers
    Message-ID:
            <CA+=[hidden email]>
    Content-Type: text/plain; charset="utf-8"
   
    Hi Eibe,
   
    I tried the one you said below
   
    "Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing
    RandomForest as the evaluator in WrapperSubsetEval and also as the base
    classifier in AttributeSelectedClassifier. The default BestFirstSearch
    method will probably work fine on your data because it is quite a small
    dataset"
   
    The SPSS, I just used for ROC curve plotting not for classification. So the
    only concern is why such discrepancy in the AUCs. Did I make the correct
    use of WEKA? Did my CSV input ot WEKA was correct?
   
    Thanks
    Santosh
   
   
   
    On Tue, May 2, 2017 at 11:52 AM, Eibe Frank <[hidden email]> wrote:
   
    > Which classifier did you use in WEKA and SPSS and which evaluation method?
    > 10-fold cross-validation?
    >
    > Cheers,
    > Eibe
    >
    > On 2/05/2017, at 6:58 PM, Santosh Bhosale <[hidden email]>
    > wrote:
    >
    > Hi Eibe,
    >
    > I tried your mentioned workflow. In result, WEKA showed a panel of 13
    > attributes (Protein biomarkers) classifying cases from controls. On the
    > same data, I drew ROC curve using WEKA, which gave AUC value of 0.937. But
    > when I took the same combination and drew the ROC curve using SPSS, it was
    > giving me AUC of 0.73. I am not understanding this discrepancy.
    >
    > Please see attached example of CSV file used as input for WEKA.
    >
    > Thanks in advance
    > Santosh
    >
    > On Wed, Apr 26, 2017 at 12:18 PM, Eibe Frank <[hidden email]> wrote:
    >
    >> Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing
    >> RandomForest as the evaluator in WrapperSubsetEval and also as the base
    >> classifier in AttributeSelectedClassifier. The default BestFirstSearch
    >> method will probably work fine on your data because it is quite a small
    >> dataset.
    >>
    >> Cheers,
    >> Eibe
    >>
    >> > On 26 Apr 2017, at 00:39, Santosh Bhosale <[hidden email]>
    >> wrote:
    >> >
    >> > Dear All,
    >> >
    >> > I did following steps in WEKA.
    >> >
    >> > - Uploaded the data in CSV file format
    >> > - Ran classifier using J48 and RandomForest
    >> > - J48 gave about 77% correctly classified instances
    >> > - RandomForest gave about 84% correctly classified instances
    >> >
    >> > I had 264 instances and 24 attributes. However, I was not able to
    >> pinpoint which combination of attributes had given the best classification
    >> of cases from controls.
    >> >
    >> > Any help would be highly appreciated.
    >> >
    >> > Thanks
    >> > Santosh
    >> >
    >> > On Tue, Apr 25, 2017 at 9:52 AM, Santosh Bhosale <
    >> [hidden email]> wrote:
    >> > Hi Eibe.
    >> >
    >> > Thanks
    >> >
    >> > On Tue, Apr 25, 2017 at 6:29 AM, Eibe Frank <[hidden email]> wrote:
    >> > You should probably learn about some basics of machine learning first.
    >> There are some free on-line courses based on WEKA here:
    >> >
    >> >   https://weka.waikato.ac.nz/explorer
    >> >
    >> > Cheers,
    >> > Eibe
    >> >
    >> > > On 25 Apr 2017, at 00:32, Santosh Bhosale <[hidden email]>
    >> wrote:
    >> > >
    >> > > Hi All,
    >> > >
    >> > > I am proteomics expert and new to machine learning. I have protein
    >> expression data between cases and controls where I have already found
    >> significant markers. Now I want to predict a panel of markers which will
    >> best classify cases from controls. I am not sure how to do that.
    >> > >
    >> > > It will be really good if someone urgently helps me in the context of
    >> how the data-structure to be and what sort of pipeline to follow. So using
    >> this information I can plot the ROC curve to best classify cases from
    >> controls.
    >> > >
    >> > > Thanks in advance
    >> > >
    >> > > -Santosh
    >> > > _______________________________________________
    >> > > Wekalist mailing list
    >> > > Send posts to: [hidden email]
    >> > > List info and subscription status: https://list.waikato.ac.nz/mai
    >> lman/listinfo/wekalist
    >> > > List etiquette: http://www.cs.waikato.ac.nz/~m
    >> l/weka/mailinglist_etiquette.html
    >> >
    >> > _______________________________________________
    >> > Wekalist mailing list
    >> > Send posts to: [hidden email]
    >> > List info and subscription status: https://list.waikato.ac.nz/mai
    >> lman/listinfo/wekalist
    >> > List etiquette: http://www.cs.waikato.ac.nz/~m
    >> l/weka/mailinglist_etiquette.html
    >> >
    >> >
    >> > _______________________________________________
    >> > Wekalist mailing list
    >> > Send posts to: [hidden email]
    >> > List info and subscription status: https://list.waikato.ac.nz/mai
    >> lman/listinfo/wekalist
    >> > List etiquette: http://www.cs.waikato.ac.nz/~m
    >> l/weka/mailinglist_etiquette.html
    >>
    >> _______________________________________________
    >> Wekalist mailing list
    >> Send posts to: [hidden email]
    >> List info and subscription status: https://list.waikato.ac.nz/mai
    >> lman/listinfo/wekalist
    >> List etiquette: http://www.cs.waikato.ac.nz/~m
    >> l/weka/mailinglist_etiquette.html
    >>
    >
    > <exampleOfWEKAInput.csv>
    >
    > _______________________________________________
    > Wekalist mailing list
    > Send posts to: [hidden email]
    > List info and subscription status: https://list.waikato.ac.nz/
    > mailman/listinfo/wekalist
    > List etiquette: http://www.cs.waikato.ac.nz/~
    > ml/weka/mailinglist_etiquette.html
    >
    >
    > _______________________________________________
    > Wekalist mailing list
    > Send posts to: [hidden email]
    > List info and subscription status: https://list.waikato.ac.nz/
    > mailman/listinfo/wekalist
    > List etiquette: http://www.cs.waikato.ac.nz/~
    > ml/weka/mailinglist_etiquette.html
    >
    >
    -------------- next part --------------
    An HTML attachment was scrubbed...
    URL: <http://list.waikato.ac.nz/pipermail/wekalist/attachments/20170502/d187d27b/attachment-0001.html>
   
    ------------------------------
   
    Message: 3
    Date: Tue, 2 May 2017 12:02:00 +0300
    From: Santosh Bhosale <[hidden email]>
    To: "Weka machine learning workbench list."
            <[hidden email]>
    Subject: Re: [Wekalist] Predictive markers
    Message-ID:
            <CA+=4V7US7+ZMCO8CVXX6kuU13gkT1=[hidden email]>
    Content-Type: text/plain; charset="utf-8"
   
    Hi,
   
    Sorry for spamming. Yes, I used 10-fold-cross validation.
   
    Santosh
   
    On Tue, May 2, 2017 at 12:00 PM, Santosh Bhosale <[hidden email]>
    wrote:
   
    > Hi Eibe,
    >
    > I tried the one you said below
    >
    > "Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing
    > RandomForest as the evaluator in WrapperSubsetEval and also as the base
    > classifier in AttributeSelectedClassifier. The default BestFirstSearch
    > method will probably work fine on your data because it is quite a small
    > dataset"
    >
    > The SPSS, I just used for ROC curve plotting not for classification. So
    > the only concern is why such discrepancy in the AUCs. Did I make the
    > correct use of WEKA? Did my CSV input ot WEKA was correct?
    >
    > Thanks
    > Santosh
    >
    >
    >
    > On Tue, May 2, 2017 at 11:52 AM, Eibe Frank <[hidden email]> wrote:
    >
    >> Which classifier did you use in WEKA and SPSS and which evaluation
    >> method? 10-fold cross-validation?
    >>
    >> Cheers,
    >> Eibe
    >>
    >> On 2/05/2017, at 6:58 PM, Santosh Bhosale <[hidden email]>
    >> wrote:
    >>
    >> Hi Eibe,
    >>
    >> I tried your mentioned workflow. In result, WEKA showed a panel of 13
    >> attributes (Protein biomarkers) classifying cases from controls. On the
    >> same data, I drew ROC curve using WEKA, which gave AUC value of 0.937. But
    >> when I took the same combination and drew the ROC curve using SPSS, it was
    >> giving me AUC of 0.73. I am not understanding this discrepancy.
    >>
    >> Please see attached example of CSV file used as input for WEKA.
    >>
    >> Thanks in advance
    >> Santosh
    >>
    >> On Wed, Apr 26, 2017 at 12:18 PM, Eibe Frank <[hidden email]> wrote:
    >>
    >>> Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing
    >>> RandomForest as the evaluator in WrapperSubsetEval and also as the base
    >>> classifier in AttributeSelectedClassifier. The default BestFirstSearch
    >>> method will probably work fine on your data because it is quite a small
    >>> dataset.
    >>>
    >>> Cheers,
    >>> Eibe
    >>>
    >>> > On 26 Apr 2017, at 00:39, Santosh Bhosale <[hidden email]> wrote:
    >>> >
    >>> > Dear All,
    >>> >
    >>> > I did the following steps in WEKA.
    >>> >
    >>> > - Uploaded the data in CSV file format
    >>> > - Ran classifiers using J48 and RandomForest
    >>> > - J48 gave about 77% correctly classified instances
    >>> > - RandomForest gave about 84% correctly classified instances
    >>> >
    >>> > I had 264 instances and 24 attributes. However, I was not able to
    >>> > pinpoint which combination of attributes had given the best classification
    >>> > of cases from controls.
    >>> >
    >>> > Any help would be highly appreciated.
    >>> >
    >>> > Thanks
    >>> > Santosh
    >>> >
    >>> > On Tue, Apr 25, 2017 at 9:52 AM, Santosh Bhosale <[hidden email]> wrote:
    >>> > Hi Eibe.
    >>> >
    >>> > Thanks
    >>> >
    >>> > On Tue, Apr 25, 2017 at 6:29 AM, Eibe Frank <[hidden email]> wrote:
    >>> > You should probably learn about some basics of machine learning first.
    >>> > There are some free on-line courses based on WEKA here:
    >>> >
    >>> >   https://weka.waikato.ac.nz/explorer
    >>> >
    >>> > Cheers,
    >>> > Eibe
    >>> >
    >>> > > On 25 Apr 2017, at 00:32, Santosh Bhosale <[hidden email]> wrote:
    >>> > >
    >>> > > Hi All,
    >>> > >
    >>> > > I am a proteomics expert and new to machine learning. I have protein
    >>> > > expression data for cases and controls, in which I have already found
    >>> > > significant markers. Now I want to predict a panel of markers which will
    >>> > > best classify cases from controls. I am not sure how to do that.
    >>> > >
    >>> > > It would be really good if someone could urgently help me with how the
    >>> > > data should be structured and what sort of pipeline to follow, so that
    >>> > > I can use this information to plot the ROC curve that best classifies
    >>> > > cases from controls.
    >>> > >
    >>> > > Thanks in advance
    >>> > >
    >>> > > -Santosh
    >>> > > _______________________________________________
    >>> > > Wekalist mailing list
    >>> > > Send posts to: [hidden email]
    >>> > > List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
    >>> > > List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
    >>>
    >>
    >> <exampleOfWEKAInput.csv>
    >>
    >>
    >>
    >
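
The wrapper approach suggested in the thread (WrapperSubsetEval with a search method) scores candidate attribute subsets by how well a classifier performs on them and searches for the best-scoring subset. The sketch below illustrates that idea with plain greedy forward selection, which is a simpler stand-in for WEKA's BestFirst search (BestFirst can also backtrack; this sketch cannot). It does not use the WEKA API. The class name, method names and the toy scoring function are hypothetical; in a real wrapper the score would be something like the cross-validated performance of the base classifier on the chosen attributes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

class ForwardSelectionSketch {

    /**
     * Greedy forward selection: starting from the empty subset, repeatedly add
     * the single attribute that improves the subset score the most, and stop
     * when no addition improves it.
     */
    static List<Integer> select(int numAttributes,
                                ToDoubleFunction<List<Integer>> subsetScore) {
        List<Integer> chosen = new ArrayList<>();
        double best = subsetScore.applyAsDouble(chosen);
        while (true) {
            int bestAttr = -1;
            double bestCandidate = best;
            for (int a = 0; a < numAttributes; a++) {
                if (chosen.contains(a)) continue;
                List<Integer> trial = new ArrayList<>(chosen);
                trial.add(a);
                double s = subsetScore.applyAsDouble(trial);
                if (s > bestCandidate) {
                    bestCandidate = s;
                    bestAttr = a;
                }
            }
            if (bestAttr < 0) {
                return chosen;                 // no single addition helps
            }
            chosen.add(bestAttr);
            best = bestCandidate;
        }
    }

    /**
     * Toy stand-in for a wrapper score (an assumption for illustration only):
     * made-up per-attribute gains minus a small penalty per attribute used.
     */
    static double toyScore(List<Integer> subset) {
        double[] gain = {0.05, 0.30, 0.00, 0.20};
        double score = 0.5;
        for (int a : subset) {
            score += gain[a];
        }
        return score - 0.06 * subset.size();
    }

    public static void main(String[] args) {
        // Prints the indices of the attributes kept by the greedy search.
        System.out.println(select(4, ForwardSelectionSketch::toyScore));
    }
}
```

Because the search only ever adds attributes, it can miss subsets that a backtracking search would find; that trade-off is why it is presented here only as an illustration of the wrapper idea.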
   
    ------------------------------
   
    _______________________________________________
    Wekalist mailing list
    [hidden email]
    https://list.waikato.ac.nz/mailman/listinfo/wekalist
   
   
    End of Wekalist Digest, Vol 171, Issue 17
    *****************************************
   
   
   
   
    _______________________________________________
    Wekalist mailing list
    Send posts to: [hidden email]
    List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
    List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
   

