Predictive markers

Santosh Bhosale
Hi All,

I am a proteomics expert and new to machine learning. I have protein expression data for cases and controls, in which I have already found significant markers. Now I want to find a panel of markers that best classifies cases from controls, but I am not sure how to do that.

It would be really helpful if someone could urgently advise me on how the data should be structured and what sort of pipeline to follow, so that I can then plot the ROC curve for the best classification of cases from controls.

Thanks in advance

-Santosh

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Predictive markers

Eibe Frank-2
Administrator
You should probably learn some of the basics of machine learning first. There are some free online courses based on WEKA here:

  https://weka.waikato.ac.nz/explorer

Cheers,
Eibe

> On 25 Apr 2017, at 00:32, Santosh Bhosale <[hidden email]> wrote:

Re: Predictive markers

Santosh Bhosale
Hi Eibe. 

Thanks

On Tue, Apr 25, 2017 at 6:29 AM, Eibe Frank <[hidden email]> wrote:

Re: Predictive markers

Santosh Bhosale
Dear All,

I did the following steps in WEKA:

- Loaded the data from a CSV file
- Ran the J48 and RandomForest classifiers
- J48 gave about 77% correctly classified instances
- RandomForest gave about 84% correctly classified instances

I had 264 instances and 24 attributes. However, I was not able to pinpoint which combination of attributes gives the best classification of cases from controls.

Any help would be highly appreciated.

Thanks
Santosh
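
For readers following along, here is a minimal sketch of the input layout this kind of analysis usually assumes: one row per sample, one numeric column per protein, and the class label in the last column (the column WEKA's Explorer treats as the class by default). The column names, the values, and the `load_table` helper are invented for illustration; they are not taken from the poster's file.

```python
import csv
import io

# Hypothetical example data: marker names and values are made up.
EXAMPLE_CSV = """P1,P2,P3,class
1.2,0.4,3.1,case
0.9,0.5,2.8,control
1.5,0.3,3.3,case
0.8,0.6,2.5,control
"""


def load_table(text):
    """Parse CSV text into (marker_names, rows, labels).

    Assumes a header row, numeric marker columns, and the class
    label in the last column.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    markers = header[:-1]
    rows, labels = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        rows.append([float(value) for value in record[:-1]])
        labels.append(record[-1])
    return markers, rows, labels


markers, rows, labels = load_table(EXAMPLE_CSV)
print(len(rows), "instances,", len(markers), "marker attributes")
print("class values:", sorted(set(labels)))
```

Writing the class as text labels such as case/control, rather than 0/1, should let WEKA read it as a nominal attribute, which classifiers like J48 need.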

On Tue, Apr 25, 2017 at 9:52 AM, Santosh Bhosale <[hidden email]> wrote:

Re: Predictive markers

Eibe Frank-2
Administrator
Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing RandomForest as the classifier that WrapperSubsetEval uses to evaluate attribute subsets and also as the base classifier in AttributeSelectedClassifier. The default BestFirst search method will probably work fine on your data because it is quite a small dataset.

Cheers,
Eibe
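
To make the idea behind wrapper-based selection concrete, here is a minimal sketch in plain Python. It is not WEKA's implementation: it swaps in a nearest-centroid classifier for RandomForest and a greedy forward search for best-first search, and every function name and data value is hypothetical. The common principle is that each candidate attribute subset is scored by the cross-validated accuracy of a classifier trained on only those attributes.

```python
def centroid(rows, cols):
    """Mean value of each selected column over the given rows."""
    return [sum(row[c] for row in rows) / len(rows) for c in cols]


def predict(train_x, train_y, test_x, cols):
    """Nearest-centroid classifier restricted to the columns in `cols`."""
    classes = sorted(set(train_y))
    cents = {
        k: centroid([x for x, y in zip(train_x, train_y) if y == k], cols)
        for k in classes
    }
    preds = []
    for x in test_x:
        # Squared Euclidean distance from x to each class centroid.
        dist = {
            k: sum((x[c] - m) ** 2 for c, m in zip(cols, cents[k]))
            for k in classes
        }
        preds.append(min(classes, key=dist.get))
    return preds


def cv_accuracy(xs, ys, cols, folds=5):
    """Cross-validated accuracy using only the columns in `cols`."""
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(xs)) if i % folds == f]
        train_idx = [i for i in range(len(xs)) if i % folds != f]
        preds = predict(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx],
            [xs[i] for i in test_idx], cols)
        correct += sum(p == ys[i] for p, i in zip(preds, test_idx))
    return correct / len(xs)


def forward_select(xs, ys, folds=5):
    """Greedily add the attribute that most improves CV accuracy."""
    selected, best = [], 0.0
    remaining = list(range(len(xs[0])))
    while remaining:
        scored = [(cv_accuracy(xs, ys, selected + [c], folds), c)
                  for c in remaining]
        score, col = max(scored, key=lambda t: t[0])
        if score <= best:
            break  # no remaining attribute improves the score
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Toy data: column 0 is noise, column 1 separates the two classes.
xs = [[5.0, 1.0], [2.0, 1.1], [4.0, 0.9], [1.0, 1.2], [3.0, 1.0],
      [5.1, 3.0], [2.1, 3.1], [3.9, 2.9], [1.1, 3.2], [3.1, 3.0]]
ys = ["control"] * 5 + ["case"] * 5

selected, score = forward_select(xs, ys)
print("selected columns:", selected, "cv accuracy:", score)
```

On this toy data the search should keep only the informative column. Note that the score reported here comes from the same data used to choose the subset, so it is optimistic. As the WEKA documentation describes it, AttributeSelectedClassifier runs the selection on the training data only, so evaluating it with cross-validation repeats the selection inside each training fold and avoids that bias.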

> On 26 Apr 2017, at 00:39, Santosh Bhosale <[hidden email]> wrote:

Re: Predictive markers

Santosh Bhosale
Hi Eibe,

Thank you very much. It worked.

-Santosh

On Wed, Apr 26, 2017 at 12:18 PM, Eibe Frank <[hidden email]> wrote:

Re: Predictive markers

Santosh Bhosale
In reply to this post by Eibe Frank-2
Hi Eibe,

I tried the workflow you suggested. As a result, WEKA selected a panel of 13 attributes (protein biomarkers) for classifying cases from controls. On the same data, I drew the ROC curve in WEKA, which gave an AUC of 0.937. But when I took the same combination and drew the ROC curve in SPSS, it gave me an AUC of 0.73. I do not understand this discrepancy.

Please see attached example of CSV file used as input for WEKA.

Thanks in advance
Santosh

Attachment: exampleOfWEKAInput.csv (2K)

Re: Predictive markers

Eibe Frank-2
Administrator
Which classifier did you use in WEKA and SPSS and which evaluation method? 10-fold cross-validation?

Cheers,
Eibe

On 2/05/2017, at 6:58 PM, Santosh Bhosale <[hidden email]> wrote:

Re: Predictive markers

Santosh Bhosale
Hi Eibe,

I tried what you suggested earlier:

"Try the AttributeSelectedClassifier with WrapperSubsetEval, choosing RandomForest as the evaluator in WrapperSubsetEval and also as the base classifier in AttributeSelectedClassifier. The default BestFirstSearch method will probably work fine on your data because it is quite a small dataset"

I used SPSS only for plotting the ROC curve, not for classification. So my only concern is why there is such a discrepancy between the AUCs. Did I use WEKA correctly? Was my CSV input to WEKA correct?

Thanks
Santosh



On Tue, May 2, 2017 at 11:52 AM, Eibe Frank <[hidden email]> wrote:

Re: Predictive markers

Santosh Bhosale
Hi,

Sorry for spamming. Yes, I used 10-fold cross-validation.

Santosh

On Tue, May 2, 2017 at 12:00 PM, Santosh Bhosale <[hidden email]> wrote:

Re: Predictive markers

Eibe Frank-2
Administrator
In reply to this post by Santosh Bhosale

> On 2/05/2017, at 9:00 PM, Santosh Bhosale <[hidden email]> wrote:
>
> I used SPSS only for plotting the ROC curve, not for classification. So my only concern is why there is such a discrepancy in the AUCs. Did I use WEKA correctly? Was my CSV input to WEKA correct?

Does that mean you exported the classifier's class probability estimates for the instances from WEKA to plot the curve in SPSS? Or what is the data that the SPSS curve is based on?

Cheers,
Eibe

Re: Predictive markers

Santosh Bhosale
Hi Eibe,

I didn't export anything from WEKA. The same CSV file that was used as input for WEKA was used in SPSS to calculate the ROC curve.

Santosh


Re: Predictive markers

Eibe Frank-2
Administrator
It would seem that you must have applied some form of learning algorithm in SPSS as well then.

The ROC curve must have come from somewhere.
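
To make the point concrete: an ROC curve is always computed from one score per instance, so whichever column supplies that score determines the AUC. The following is a minimal Python sketch, not WEKA or SPSS code. The function name `auc_from_scores` and all values are made up for illustration; it computes the AUC as the fraction of case/control pairs in which the case has the higher score, with ties counted as half.

```python
def auc_from_scores(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values purely for illustration (1 = case, 0 = control).
labels = [1, 1, 1, 0, 0, 0]
# One raw attribute used directly as the score.
marker = [0.9, 0.4, 0.8, 0.5, 0.3, 0.2]
# Hypothetical class probability estimates from some classifier.
model_prob = [0.95, 0.70, 0.85, 0.40, 0.30, 0.10]

print(auc_from_scores(marker, labels))
print(auc_from_scores(model_prob, labels))
```

The two calls give different AUCs for the same instances because the score column differs. One possible explanation for a discrepancy between two tools is therefore that each was given a different score: for example, a raw attribute in one and cross-validated classifier probabilities in the other.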

Cheers,
Eibe


Re: Predictive markers

Santosh Bhosale
Hi Eibe,

Last time I had not selected 10-fold cross-validation. This time I did, and it gave me a list of attributes with the number of folds (%). I have now sorted the attributes by that percentage. What is the usual cutoff criterion for selecting attributes? I am thinking of including attributes selected in 20% or more of the folds. However, I do not understand what this percentage means.

Thanks
Santosh


Re: Predictive markers

Santosh Bhosale
Sorry, one more thing: which of these attributes should I consider statistically significant, given that this time I included all my attributes?

Thanks
S


Re: Predictive markers

Eibe Frank-2
Administrator
In reply to this post by Santosh Bhosale
It looks like you performed k-fold cross-validation-based evaluation in the attribute selection panel of the Explorer. 20% means that the corresponding attribute was selected by the chosen attribute selector in 20% of the k folds of the cross-validation.

It’s normally best to perform attribute selection for regression or classification using the AttributeSelectedClassifier instead of the attribute selection panel. This is to ensure that the attributes are selected based on the training set only and that the test data is not used for this.
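
The idea of selecting attributes on the training part of each fold only, and of reading the "% of folds" figure, can be sketched in plain Python. This is an illustration under stated assumptions, not WEKA's implementation: the selector `top_m_by_mean_difference` is a hypothetical stand-in for whatever evaluator and search method is configured in WEKA, and the data are synthetic.

```python
import random


def top_m_by_mean_difference(rows, labels, m):
    """Toy attribute selector: rank attributes by |mean(cases) - mean(controls)|."""
    n_attrs = len(rows[0])
    scores = []
    for j in range(n_attrs):
        cases = [r[j] for r, y in zip(rows, labels) if y == 1]
        controls = [r[j] for r, y in zip(rows, labels) if y == 0]
        diff = abs(sum(cases) / len(cases) - sum(controls) / len(controls))
        scores.append((diff, j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:m]]


def selection_percentages(rows, labels, k, m):
    """Percentage of the k folds in which each attribute is selected.

    The selector only ever sees the training part of each fold,
    so the held-out instances never influence the choice.
    """
    counts = [0] * len(rows[0])
    for fold in range(k):
        train_idx = [i for i in range(len(rows)) if i % k != fold]
        train_rows = [rows[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        for j in top_m_by_mean_difference(train_rows, train_labels, m):
            counts[j] += 1
    return [100.0 * c / k for c in counts]


# Synthetic data: attribute 0 separates the classes, attributes 1-4 are noise.
rng = random.Random(0)
labels = [i % 2 for i in range(40)]
rows = [[10 * y + rng.random()] + [rng.random() for _ in range(4)] for y in labels]

pct = selection_percentages(rows, labels, k=10, m=2)
print(pct)
```

In this sketch the informative attribute is chosen in every fold, while the second slot is shared among noise attributes depending on the fold. A low percentage therefore suggests that an attribute's selection depends on which instances happen to be in the training data.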

Cheers,
Eibe


Re: Predictive markers

Eibe Frank-2
Administrator
In reply to this post by Santosh Bhosale
Generally, in machine learning, the goal is to select the set of attributes that yields the greatest predictive performance. Statistical significance of individual attributes is not a common concern and cannot be computed directly in WEKA.

Having said that, you could use the Experimenter to perform a significance test that compares the performance of a certain base classifier trained both with and without a chosen attribute of interest. This can be implemented by using the FilteredClassifier in conjunction with the Remove filter and the base classifier of your choice (e.g., RandomForest).

Cheers,
Eibe


Re: Predictive markers

Santosh Bhosale
Thank you Eibe. 

Re: Predictive markers

Santosh Bhosale
Hi Eibe,

With your input, I was able to identify a set of 15 attributes that gives better predictive performance. ROC curve analysis for those 15 attributes gave an AUC of 0.80. My next question: of the 15 attributes, one is a known risk factor that on its own gives an AUC of 0.71, so the remaining 14 attributes add the other 0.09. I want to test whether the contribution of these 14 attributes to the ROC curve is significant. I think I can do that in the WEKA Experimenter, but I am not sure how.

An additional question: how can I check the AUC value in WEKA's KnowledgeFlow? It does not appear to be displayed.

Thanks a lot!!
Santosh


Re: Predictive markers

Eibe Frank-2
Administrator
One way to do this is to run WEKA's FilteredClassifier in conjunction with the Remove filter, and your chosen base classifier, to create classifier configurations corresponding to different subsets of attributes. You can then compare those classifier configurations in the Results panel of the Experimenter and perform a (corrected) paired t-test to check for significance of differences.
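
For readers who want to see what a corrected paired t-test does with per-fold results, here is a minimal Python sketch. It assumes the correction is the corrected resampled t-test of Nadeau and Bengio, which inflates the variance by the test/train size ratio because the training sets of cross-validation folds overlap. The function name `corrected_paired_t` and the AUC differences are made up for illustration; this is not the Experimenter's code.

```python
import math
from statistics import mean, variance


def corrected_paired_t(diffs, n_train, n_test):
    """Corrected resampled paired t statistic for per-fold differences.

    diffs   : per-fold performance differences between two configurations
    n_train : number of training instances per fold
    n_test  : number of test instances per fold
    Returns (t, degrees_of_freedom).
    """
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two paired differences")
    var = variance(diffs)  # sample variance (divides by k - 1)
    if var == 0:
        raise ValueError("differences have zero variance")
    # The uncorrected test would use var / k; the correction adds n_test / n_train.
    t = mean(diffs) / math.sqrt((1.0 / k + n_test / n_train) * var)
    return t, k - 1


# Made-up per-fold AUC differences (full panel minus risk factor alone),
# as if from one run of 10-fold cross-validation on 100 instances.
diffs = [0.10, 0.08, 0.12, 0.06, 0.14, 0.09, 0.11, 0.07, 0.13, 0.10]
t, df = corrected_paired_t(diffs, n_train=90, n_test=10)
print(t, df)
```

The resulting statistic is compared against a t distribution with the returned degrees of freedom. Because of the extra variance term it is always smaller in magnitude than the uncorrected paired t statistic, which makes the test more conservative.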

BTW: Regarding your statement "I was able to identify a set of attributes (15) which gives better predictability": You have to be careful not to choose attributes based on performance on the final test data used to establish performance estimates because this will optimistically bias your accuracy estimates. Unless you have a separate test set for evaluating your final models after attribute selection, which you have not used for selecting attributes at all, use the AttributeSelectedClassifier rather than the attribute selection panel to perform attribute selection when building models. This will ensure that only the training data is used for selecting attributes, e.g., in a k-fold cross-validation. Alternatively, use a completely fresh set of test data to evaluate your models once you have manually selected attributes in the attribute selection panel.

Cheers,
Eibe
