Discretize data into categories

Discretize data into categories

mkarmi
Hi all,

I need help discretizing the data into categorical data (over-expression, baseline, under-expression) using the standard deviation and the mean. Can anyone guide me through the steps? Should I use supervised or unsupervised discretization?

Thank you

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Discretize data into categories

Eibe Frank-3
You can implement that by combining the MathExpression filter with the NumericToNominal filter and finally RenameNominalValues.

Here is a possible configuration of MathExpression:

  weka.filters.unsupervised.attribute.MathExpression -E "ifelse(A<MEAN-1.96*SD,-1,ifelse(A>MEAN+1.96*SD,1, 0))"

This will replace all values of an attribute A that are smaller than

   mean_of_A - 1.96 * standard_deviation_of_A

with the value -1, all values of A that are greater than

  mean_of_A + 1.96 * standard_deviation_of_A

with the value 1, and all other values of A with value 0.
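
As a rough illustration of what that expression computes, here is a minimal Python sketch outside Weka. It assumes SD is the sample standard deviation (n - 1 denominator); the function name discretize, the LABELS mapping and the toy values are made up for this example. The labelling step only mimics what NumericToNominal followed by RenameNominalValues would do.

```python
from statistics import mean, stdev

# Hypothetical label names, chosen to match the three categories in the question.
LABELS = {-1: "under-expression", 0: "baseline", 1: "over-expression"}


def discretize(values, k=1.96):
    """Map each value to -1, 0 or 1 using mean +/- k * SD of the same attribute."""
    m = mean(values)
    sd = stdev(values)  # assumption: sample standard deviation (n - 1)
    low, high = m - k * sd, m + k * sd
    return [-1 if v < low else 1 if v > high else 0 for v in values]


# Toy attribute: eighteen typical values plus one low and one high outlier.
values = [10.0] * 18 + [0.0, 20.0]
states = discretize(values)
labels = [LABELS[s] for s in states]
```

With k = 1.96 only values far from the mean leave the baseline state, so on this toy attribute just the two outliers are relabelled.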

Cheers,
Eibe

On Sat, Dec 23, 2017 at 2:48 AM, Murad Al-Rajab <[hidden email]> wrote:
Hi all,

I need a help in discretizing the data into categorical data ( over expression, baseline, under-expression) using the standard deviation and the mean. Can anyone guide me by steps. Shall I use supervised or unsupervised discretization?

Thank you

Re: Discretize data into categories

Abdrahman0x
Hi...

I found in a paper that the authors had pre-processed the data so each
attribute has zero mean value and unit variance. Then they also discretized
the data into categorical data so that each attribute expression variable
using the respective σ (standard deviation) and μ (mean).
I used the unsupervised "Standardize" and "MathExpression" filters in a
MultiFilter inside the FilteredClassifier in the Classify panel, but got
results that are very different from those presented in the paper, even
though I am using the same data set.

Can anyone explain how to implement such pre-processing correctly in Weka?
I am afraid that the way I am doing it is not correct.

Thank you,
AR



--
Sent from: https://weka.8497.n7.nabble.com/

Re: Discretize data into categories

Abdrahman0x
Can anyone explain to me how to do that?

Thank you,
AR



Re: Discretize data into categories

Eibe Frank-3
" so that each attribute expression variable using the respective σ (standard deviation) and μ (mean)."

I don't understand this. Can you perhaps rephrase this or give a reference to the paper (including page numbers)?

Cheers,
Eibe

On Sat, Aug 10, 2019 at 7:59 PM Abdrahman0x <[hidden email]> wrote:
Hi...

I found in a paper that the authors had pre-processed the data so each
attribute has zero mean value and unit variance. Then they also discretized
the data into categorical data so that each attribute expression variable
using the respective σ (standard deviation) and μ (mean).
I had used the unsupervised "Standardize" and "MathExpression" as
multifilter inside the FilteredClassifier in the classify panel, but got
results which are so far a different from those presented in the paper
though I am using the same data set.

Can anyone explain to me how to implement such pre-processing in the correct
manner inside Weka as I am afraid that the way I am using is not correct.

Thank you,
AR



Re: Discretize data into categories

Abdrahman0x
Thank you Eibe,

Here are the references to the papers that preprocessed the data as
indicated:

Sujata Dash, Bichitrananda Patra, B.K. Tripathy, Study of Classification
Accuracy of Microarray Data for Cancer Classification using Multivariate and
Hybrid Feature Selection Method, IOSR Journal of Engineering (IOSRJEN), Page
116

A New Gene Selection Approach Based on Minimum Redundancy-Maximum Relevance
(MRMR) and Genetic Algorithm (GA), Page 72

Thank you,
AR



Re: Discretize data into categories

Eibe Frank-2
Administrator
It looks like the data was first standardised to zero mean and unit variance. Then, attribute values smaller than -0.5 were set to -1, values greater than 0.5 were set to 1, and the rest were set to zero.

One way to do this is to use the MultiFilter, applying the Standardize filter followed by the MathExpression filter. Here is a command-line example:

  java weka.Run .MultiFilter -F .Standardize -F ".MathExpression -E \"ifelse((A>0.5), 1, ifelse((A<-0.5), -1,0))\"" -i ~/datasets/UCI/diabetes.arff
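
For readers who want to check the two steps by hand, here is a minimal Python sketch of the same idea outside Weka: standardise one attribute, then cut at -0.5 and 0.5. It assumes the sample standard deviation (n - 1 denominator); the function names and toy values are made up for this illustration.

```python
from statistics import mean, stdev


def standardize(values):
    """Rescale one attribute to zero mean and unit (sample) standard deviation."""
    m, sd = mean(values), stdev(values)  # assumption: n - 1 denominator
    return [(v - m) / sd for v in values]


def three_state(z_values, cut=0.5):
    """1 above +cut, -1 below -cut, 0 otherwise (same nesting as the ifelse)."""
    return [1 if z > cut else -1 if z < -cut else 0 for z in z_values]


values = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy attribute
z = standardize(values)
states = three_state(z)
```

When this is done inside a FilteredClassifier, the mean and standard deviation come from the training data only, which is one way results can differ from a paper that pre-processed the whole data set up front.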

Cheers,
Eibe

> On 29/08/2019, at 5:41 AM, Abdrahman0x <[hidden email]> wrote:
>
> Thank you Eibe,
>
> Here are the reference to the papers which preprocessed the data as
> indicated:
>
> Sujata Dash, Bichitrananda Patra, B.K. Tripathy,Study of Classification
> Accuracy of Microarray Data for Cancer Classification using Multivariate and
> Hybrid Feature Selection Method, IOSR Journal of Engineering (IOSRJEN), Page
> 116
>
> A New Gene Selection Approach Based on Minimum Redundancy-Maximum Relevance
> (MRMR) and Genetic Algorithm (GA), Page 72
>
> Thank you,
> AR

Re: Discretize data into categories

Abdrahman0x
Thank you Eibe for the clarification and effort. In the papers they say
that any data larger than μ + σ/2 were transformed to state 1; any data
between μ + σ/2 and μ - σ/2 were transformed to state 0; and any data
smaller than μ - σ/2 were transformed to state -1.

I think the MathExpression equation should be different. How can we use the
standard deviation and mean of each feature in the equation?

Thank you,
AR



Re: Discretize data into categories

Eibe Frank-2
Administrator

They standardise the data to 0 mean and unit variance first, so the inequalities they use are really > 0.5 and < -0.5.
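
To spell out why, writing z for the standardised value (a symbol introduced here, not in the papers) and assuming σ > 0:

```latex
\begin{align*}
z &= \frac{x - \mu}{\sigma}, \qquad \sigma > 0 \\
x > \mu + \frac{\sigma}{2}
  &\iff \frac{x - \mu}{\sigma} > \frac{1}{2}
  \iff z > 0.5 \\
x < \mu - \frac{\sigma}{2}
  &\iff \frac{x - \mu}{\sigma} < -\frac{1}{2}
  \iff z < -0.5
\end{align*}
```

Dividing by σ keeps the direction of each inequality because σ is positive, so the thresholds on the raw scale and on the standardised scale select the same values.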

 

Cheers,
Eibe

From: [hidden email]
Sent: Saturday, 31 August 2019 1:35 AM
To: [hidden email]
Subject: Re: [Wekalist] Discretize data into categories

 

Thank you Eibe for the clarification and efforts. In the papers they are
saying the:any data
larger than μ + σ/2 were transformed to state 1; any data between μ + σ/2
and μ - σ/2 were transformed to state 0; any data smaller than μ - σ/2 were
transformed to state -1.

I think the MathExpression equation shall be different, how can we use the
standard deviation and mean in the equation for each feature.

Thank you,
AR

Re: Discretize data into categories

Abdrahman0x
Thank you Eibe.

I thought we must use an equation like this:

ifelse(A>MEAN+SD/2,1,ifelse(A<MEAN-SD/2,-1, 0))
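
Taking the papers' rule literally on the raw scale (thresholds at μ + σ/2 and μ - σ/2), a quick numerical check suggests it yields the same states as thresholding standardised values at 0.5 and -0.5. This is only a sketch under stated assumptions: sample standard deviation, made-up function names and toy values, and no values sitting exactly on a threshold.

```python
from statistics import mean, stdev


def raw_rule(values):
    """States from raw-scale thresholds mean + SD/2 and mean - SD/2."""
    m, sd = mean(values), stdev(values)  # assumption: n - 1 denominator
    return [1 if v > m + sd / 2 else -1 if v < m - sd / 2 else 0 for v in values]


def standardized_rule(values):
    """States from standardised values cut at 0.5 and -0.5."""
    m, sd = mean(values), stdev(values)
    z_values = [(v - m) / sd for v in values]
    return [1 if z > 0.5 else -1 if z < -0.5 else 0 for z in z_values]


values = [2.1, 3.4, 3.9, 4.0, 4.2, 5.5, 7.3]  # toy attribute
raw_states = raw_rule(values)
std_states = standardized_rule(values)
```

Both lists come out identical on this toy attribute, so either formulation should lead to the same nominal attribute after NumericToNominal.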

Thank you,
AR


