Classification - Two datasets one missing attribute to classify


Meowgi
Hello,

I'm writing a program using the Weka API, but even in the Explorer I don't know how to make this work. I have a dataset to which I apply a clustering algorithm and then a filter that adds the cluster information to the dataset.

With that, I want to build a classifier (J48, for example) from the newly created dataset, using the attribute named "cluster" as the class. This all works, but how do I classify a dataset that hasn't been clustered, i.e., one that is missing the cluster attribute that I'm trying to predict with the classifier?

Thanks.

Re: Classification - Two datasets one missing attribute to classify

Eibe Frank-2
Administrator
Why do you want to build a classifier? You can just get cluster memberships for the new batch of data from the AddCluster filter.

java weka.Run .AddCluster -i ~/IdeaProjects/Weka/trunk/wekadocs/data/segment-challenge.arff -o sc.arff -r ~/IdeaProjects/Weka/trunk/wekadocs/data/segment-test.arff -s st.arff -b

The clustering model will be built from the first batch of data and then used to assign cluster labels to both batches, using the clusterInstance(Instance) method.
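
To illustrate the "build on the first batch, apply to both" idea outside Weka, here is a minimal, self-contained Java sketch. It is not how Weka implements AddCluster: a tiny one-dimensional k-means (seeded with the first k values) stands in for whatever clusterer the filter wraps, and the names TwoBatchClustering, fit, nearest and assign are hypothetical. The nearest-centroid lookup plays the role of clusterInstance(Instance), and the second batch is only labeled, never used for fitting.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not Weka code): fit a clustering model on the
 * first batch only, then use the same fitted model to label every batch.
 * A tiny 1-D k-means stands in for the clusterer that AddCluster wraps.
 */
class TwoBatchClustering {

    // Fit k centroids with Lloyd's algorithm; the first k training values
    // are used as initial centroids (a simplistic, deterministic choice).
    static double[] fit(double[] train, int k, int maxIter) {
        double[] centroids = Arrays.copyOf(train, k);
        for (int iter = 0; iter < maxIter; iter++) {
            double[] sum = new double[k];
            int[] count = new int[k];
            for (double x : train) {
                int c = nearest(centroids, x);
                sum[c] += x;
                count[c]++;
            }
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (count[c] == 0) continue; // keep an empty cluster's centroid
                double updated = sum[c] / count[c];
                if (updated != centroids[c]) changed = true;
                centroids[c] = updated;
            }
            if (!changed) break; // converged
        }
        return centroids;
    }

    // Analogue of clusterInstance(Instance): index of the closest centroid.
    static int nearest(double[] centroids, double x) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (Math.abs(x - centroids[c]) < Math.abs(x - centroids[best])) best = c;
        }
        return best;
    }

    // Label a whole batch with an already-fitted model (no refitting).
    static int[] assign(double[] centroids, double[] batch) {
        int[] labels = new int[batch.length];
        for (int i = 0; i < batch.length; i++) labels[i] = nearest(centroids, batch[i]);
        return labels;
    }

    public static void main(String[] args) {
        double[] first = {1, 2, 3, 10, 11, 12}; // batch used to build the model
        double[] second = {0, 2.5, 9, 13};       // new batch, never used for fitting
        double[] model = fit(first, 2, 100);
        System.out.println("centroids: " + Arrays.toString(model));
        System.out.println("first:  " + Arrays.toString(assign(model, first)));
        System.out.println("second: " + Arrays.toString(assign(model, second)));
    }
}
```

Running main prints centroids [2.0, 11.0], labels [0, 0, 0, 1, 1, 1] for the first batch and [0, 0, 1, 1] for the second. With real data, the AddCluster command above does the equivalent for both files in one run.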

Cheers,
Eibe

> On 10/05/2017, at 2:49 PM, Meowgi <[hidden email]> wrote:

_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
List info and subscription status: https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Classification - Two datasets one missing attribute to classify

Meowgi
Thank you for replying.

My thinking was that classification can only predict an attribute that already exists, but in my dataset I have to predict a value that doesn't exist yet (which is why I was trying to use clustering). I managed to cluster the dataset and create a new file with it, but I wanted to know whether I could also build a decision tree or use another classification method.

My dataset is the power-consumption history of a building and contains time, devices and power consumption. To find a context, I cluster it, and then I want to build a decision tree that predicts the cluster. I would use one dataset for "training" the clusterer, build a decision tree from it, and then give the tree one or more non-clustered instances to predict their cluster.

Maybe what I'm doing doesn't make sense, but thank you for taking the time to answer.

Re: Classification - Two datasets one missing attribute to classify

Eibe Frank-2
Administrator
Yes, you can construct a decision tree if you want an interpretable model (although you can also just use the clusterer itself to find cluster memberships for new instances, see my previous example).

To apply the tree to new data, just add the “cluster label” attribute to the new data and set all its values to “missing” (a missing value is represented by a question mark in ARFF files).
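
The step of adding a missing-valued attribute can also be done directly on the ARFF text. The following is a minimal Java sketch under stated assumptions: the file uses the dense (non-sparse) ARFF format, the cluster attribute is the last attribute of the clustered training data, and its declaration (name and exact list of nominal labels) is copied verbatim from that training file so the two headers match. The class and method names are hypothetical; this is not a Weka API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Weka): appends a missing-valued attribute
 * to dense ARFF text so unlabeled data matches the training header.
 * Assumes dense rows; sparse rows ({...}) are not handled.
 */
class AddMissingClusterAttribute {

    static String addMissingAttribute(String arff, String attributeDeclaration) {
        List<String> out = new ArrayList<>();
        boolean inData = false;
        for (String line : arff.split("\\R", -1)) {
            String trimmed = line.trim();
            if (!inData && trimmed.toLowerCase().startsWith("@data")) {
                out.add(attributeDeclaration); // new attribute goes last in the header
                out.add(line);
                inData = true;
            } else if (inData && !trimmed.isEmpty() && !trimmed.startsWith("%")) {
                out.add(line.replaceAll("\\s+$", "") + ",?"); // "?" = missing value
            } else {
                out.add(line); // header lines, comments, blank lines unchanged
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String unlabeled = String.join("\n",
            "@relation power",
            "@attribute hour numeric",
            "@attribute watts numeric",
            "@data",
            "8,120.5",
            "20,340.0");
        // Declaration copied from the clustered training file (labels assumed here).
        String clusterAttr = "@attribute cluster {cluster1,cluster2,cluster3}";
        System.out.println(addMissingAttribute(unlabeled, clusterAttr));
    }
}
```

After this, the tree trained on the clustered data can be applied to the new file (for example through the Explorer's "Supplied test set" option, or classifyInstance in code once the class index is set to the cluster attribute). Inside a program, Weka's Add filter is another way to append an attribute whose values are all missing.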

Cheers,
Eibe

> On 12/05/2017, at 12:44 AM, Meowgi <[hidden email]> wrote: