using k-means clustering in a subset from the dataset

hisham
Hello sirs,

I am modifying the Java source of the SMOTE oversampling filter in WEKA. I
added a condition inside the filter that builds a subset of instances with a
special meaning, and that part works.

I created this subset and named it Danger.

1- I want to call k-means clustering and apply it to the Danger
dataset, which only exists while the SMOTE filter is running.

2- I want k-means clustering to create clusters from the Danger subset and
to save these clusters in the same way that I created the Danger subset, e.g.
cluster 0 (might have X samples); cluster 1 (containing Y samples); and so
on.

3- I can handle the remaining tasks and processes on my own once step 2 is
implemented correctly.


Can you please help me with the basic method for using k-means
clustering inside the SMOTE filter?

Best Regards
Hisham Majzoub



--
Sent from: http://weka.8497.n7.nabble.com/
_______________________________________________
Wekalist mailing list
Send posts to: [hidden email]
To subscribe, unsubscribe, etc., visit https://list.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: using k-means clustering in a subset from the dataset

Eibe Frank-2
Administrator
Here is some example code in the Groovy dialect of Java:

data = (new weka.core.converters.ConverterUtils.DataSource("/Users/eibe/weka-3-8-3/data/vote.arff")).getDataSet()

kmeans = new weka.clusterers.SimpleKMeans()
kmeans.setNumClusters(2)

kmeans.buildClusterer(data)

datasets = new weka.core.Instances[kmeans.getNumClusters()]
for (int i = 0; i < datasets.length; i++) {
  datasets[i] = new weka.core.Instances(data, 0)
}

for (inst in data) {
  datasets[(int)kmeans.clusterInstance(inst)].add(inst)
}

for (dataset in datasets) {
  println dataset.toString()
}

You can execute this code (after adapting the file path) by running it in the Groovy console that will be available in the Tools menu of the GUIChooser if you have the kfGroovy package installed.
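
For readers who want to see what the buildClusterer and clusterInstance calls above correspond to, here is a minimal sketch in plain Java. It is not WEKA's SimpleKMeans: TinyKMeans is a hypothetical stand-in that runs Lloyd-style iterations on double[] points, seeds the centroids with the first k points (an assumption made to keep the run deterministic), and needs at least k points.

```java
class TinyKMeans {
    final double[][] centroids;

    // Plays the role of buildClusterer(data): fits k centroids to the points.
    // Seeding with the first k points is an assumption, not what WEKA does.
    TinyKMeans(double[][] points, int k, int maxIterations) {
        centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][points[0].length];
            int[] counts = new int[k];
            // Assignment step: add each point to the sum of its nearest centroid.
            for (double[] p : points) {
                int c = cluster(p);
                counts[c]++;
                for (int d = 0; d < p.length; d++) {
                    sums[c][d] += p[d];
                }
            }
            // Update step: move each centroid to the mean of its points.
            boolean moved = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its old centroid
                }
                for (int d = 0; d < sums[c].length; d++) {
                    double mean = sums[c][d] / counts[c];
                    if (mean != centroids[c][d]) {
                        moved = true;
                    }
                    centroids[c][d] = mean;
                }
            }
            if (!moved) {
                break; // converged
            }
        }
    }

    // Plays the role of clusterInstance(inst): index of the nearest centroid
    // by squared Euclidean distance.
    int cluster(double[] p) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

In this sketch, new TinyKMeans(points, 2, 10).cluster(p) takes the place of kmeans.clusterInstance(inst) in the Groovy code; the real filter would keep using SimpleKMeans.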

Cheers,
Eibe

> On 12/11/2018, at 3:47 AM, hisham <[hidden email]> wrote:
>
> Hello sirs
>
> iam modifying in the java code in the weka filters SMOTE oversampling,  i
> included a condition inside weka to create a subset having some special
> meaning, and i was successful in that,  
>
> i created a subset and named it Danger,    
>
> 1- i want to call k means clustering filter and apply it to the Danger
> dataset,   which is only found and created while smote function is called/
> used
>
> 2-  i want k-means clustering to create clusters from Danger subset, and
> save these clusters in the same way that i created the Danger subset, ex:  
> cluster 0  ( might have X samples)   ; cluster 1 ( contain Y samples) and so
> on.
>
> 3- other tasks and processes I can do them on my own if I could implement
> step 2 correctly
>
>
> can you please help me with the basic method of how i can use the k-means
> clustering inside the Smote filter.
>
> Best Regards
> Hisham Majzoub

Re: using k-means clustering in a subset from the dataset

hisham
Thanks for your reply.

I am directly modifying
weka\src\main\java\weka\filters\supervised\instance\SMOTE.java
and then running *ant exejar* to rebuild the WEKA application with those
changes, so that they take effect and show up in the WEKA GUI.

How can I integrate your code there?

Do I need to import some libraries, such as import weka.filters.XYZ; or import
java.util.XYZ;?




Re: using k-means clustering in a subset from the dataset

hisham
In reply to this post by Eibe Frank-2
Is it possible to give me a short introduction to using Groovy?

I installed it via the package manager, but I have no idea how to use it.




Re: using k-means clustering in a subset from the dataset

Eibe Frank-2
Administrator
When you select the Groovy console from the Tools menu in the GUIChooser, it will just start up the standard console that comes with the Groovy distribution. Some information on how to use it is here:

http://groovy-lang.org/groovyconsole.html

You can ignore Point 1 under Section 2 Basics because WEKA starts the console for you.

The program I posted can be pasted into the input area of the Groovy console, modified appropriately, and then executed. Output will be printed in the output area.

I use Groovy to provide example code on the mailing list because it does not require explicit compilation like Java. However, it is generally trivial to turn the Groovy code into Java code by adding type declarations etc.

Cheers,
Eibe

> On 12/11/2018, at 9:06 PM, hisham <[hidden email]> wrote:
>
> is it possible to show me or give me a small intro of how to use the groove,
>
> i installed in via the package manager, but i have no idea how to use it.

Re: using k-means clustering in a subset from the dataset

hisham
Thanks very much, I will look at it and hope it works.


Best Regards






Re: using k-means clustering in a subset from the dataset

hisham
I tried your method:

data = (new weka.core.converters.ConverterUtils.DataSource("/Program Files/Weka-3-9/data/vote.arff")).getDataSet()

kmeans = new weka.clusterers.SimpleKMeans()
kmeans.setNumClusters(4)

kmeans.buildClusterer(data)

datasets = new weka.core.Instances[kmeans.getNumClusters()]
for (int i = 0; i < datasets.length; i++) {
  datasets[i] = new weka.core.Instances(data, 0)
}

for (inst in data) {
  datasets[(int)kmeans.clusterInstance(inst)].add(inst)
}

for (dataset in datasets) {
  println dataset.toString()
}

/*

The output listed all the features and the class value for each sample. It
seems to have printed every attribute of every instance in each cluster.

How can we change it so that, for example, with 4 clusters, 4 different
subsets are created [ cluster0.add(instanceI) ],

so that I can later compare the number of instances in each cluster and
apply my function to a specific cluster?


- Another question: if I insert this code into my filter.java, which I
will then recompile, do I have to import a new library into the code?

*/




Re: using k-means clustering in a subset from the dataset

hisham
When I use

import weka.clusterers;
import weka.clusterers.SimpleKMeans;
...
...
...

kmeans = new weka.clusterers.SimpleKMeans();
kmeans.setNumClusters(4);


/* The compiler reports an error at  (  kmeans = new
weka.clusterers.SimpleKMeans();  ).
Even when I write it as   kmeans = new SimpleKMeans();
it gives an error and cannot be compiled. Is there a way around it?  */





Re: using k-means clustering in a subset from the dataset

Eibe Frank-2
Administrator
In reply to this post by hisham
The code generates as many datasets (i.e., Instances objects) as there are clusters. The references to those Instances objects are stored in the datasets[] array.
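
Once there is one subset per cluster, comparing sizes and picking a specific cluster is plain indexing. The sketch below uses java.util lists instead of WEKA Instances objects so that it runs without weka.jar. ClusterSplit, split, and largest are hypothetical names, and the assigner function stands in for kmeans.clusterInstance(inst).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

class ClusterSplit {
    // Builds one list per cluster; assigner stands in for clusterInstance().
    static <T> List<List<T>> split(List<T> rows, int numClusters, ToIntFunction<T> assigner) {
        List<List<T>> subsets = new ArrayList<>();
        for (int i = 0; i < numClusters; i++) {
            subsets.add(new ArrayList<>());
        }
        for (T row : rows) {
            subsets.get(assigner.applyAsInt(row)).add(row);
        }
        return subsets;
    }

    // Index of the subset with the most rows (the first one wins ties).
    static <T> int largest(List<List<T>> subsets) {
        int best = 0;
        for (int i = 1; i < subsets.size(); i++) {
            if (subsets.get(i).size() > subsets.get(best).size()) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Double> rows = Arrays.asList(0.1, 0.2, 0.9, 0.3, 0.8);
        // Hypothetical rule: values below 0.5 go to cluster 0, the rest to cluster 1.
        List<List<Double>> subsets = split(rows, 2, v -> v < 0.5 ? 0 : 1);
        for (int i = 0; i < subsets.size(); i++) {
            System.out.println("cluster " + i + ": " + subsets.get(i).size() + " rows");
        }
        System.out.println("largest cluster: " + largest(subsets));
    }
}
```

With WEKA, the same comparison would read datasets[i].numInstances() for each entry of the datasets[] array.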

Cheers,
Eibe

> On 14/11/2018, at 8:09 AM, hisham <[hidden email]> wrote:
>
> i tried it in your method:
>
> data = (new weka.core.converters.ConverterUtils.DataSource("/Program
> Files/Weka-3-9/data/vote.arff")).getDataSet()
>
> kmeans = new weka.clusterers.SimpleKMeans()
> kmeans.setNumClusters(4)
>
> kmeans.buildClusterer(data)
>
> datasets = new weka.core.Instances[kmeans.getNumClusters()]
> for (int i = 0; i < datasets.length; i++) {
>  datasets[i] = new weka.core.Instances(data, 0)
> }
>
> for (inst in data) {
>  datasets[(int)kmeans.clusterInstance(inst)].add(inst)
> }
>
> for (dataset in datasets) {
>  println dataset.toString()
> }
>
> /*    
>
> in the output it gave me all the features, and class values for each sample,
> it seems that it printed all the features for each instance in every
> cluster.
>
> how can we change it so that as an example if we have a number of clusters =
> 4, then 4 different subsets are created [ cluster0.add(instanceI) ].
>
> so that later i can compare the number of instances in each cluster and
> apply my function to a specific cluster
>
>
> - another question: if i need to insert this code in my filter.java that i
> will compile more , do i have to import new library to the overall code?

Re: using k-means clustering in a subset from the dataset

Eibe Frank-2
Administrator
In reply to this post by hisham
It seems that you are now trying to make the Groovy code into actual Java code? Then you need to add type declarations for the variables.

For example,

kmeans = new SimpleKMeans();

would have to be turned into

SimpleKMeans kmeans = new SimpleKMeans();

unless you have declared the type of the kmeans variable earlier in the current scope.
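
The same rule applies to every variable carried over from the Groovy version. A small sketch using java.util classes, chosen only so that the snippet compiles without WEKA (TypedDeclarations and build are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

class TypedDeclarations {
    static List<String> build() {
        // Groovy accepts:  names = new ArrayList()
        // Java needs the type the first time the variable appears:
        List<String> names = new ArrayList<>();
        names.add("cluster0");

        // Later assignments in the same scope reuse the declared variable,
        // so no type is repeated here.
        names = new ArrayList<>(names);
        names.add("cluster1");
        return names;
    }
}
```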

Cheers,
Eibe

> On 14/11/2018, at 8:53 AM, hisham <[hidden email]> wrote:
>
> when i use
>
> import weka.clusterers;
> import weka.clusterers.SimpleKMeans;
> ...
> ...
> ...
>
> kmeans = new weka.clusterers.SimpleKMeans();
> kmeans.setNumClusters(4);
>
>
> /* the compiler is giving me an error in at  (  kmeans = new
> weka.clusterers.SimpleKMeans();  )
> even when i try it as   kmeans = new SimpleKMeans();
> it give me an error and cannot be compiled, is there a way around it  */

Re: using k-means clustering in a subset from the dataset

hisham
Thanks,
it worked after writing it as below:

SimpleKMeans kmeans = new SimpleKMeans();
// Now if I call kmeans.xyz(), xyz can be any of the methods
// found in SimpleKMeans, right?

But then I ran into a problem when I used:

- kmeans.setNumClusters(2);              (or any number other than 2)

- kmeans.buildClusterer(danger);         at this point I am getting
an error as well




Re: using k-means clustering in a subset from the dataset

Eibe Frank-3
Not sure what’s going on there, the method calls look correct.

Cheers,
Eibe

On Wed, 14 Nov 2018 at 10:07 PM, hisham <[hidden email]> wrote:
thanks,
it worked after using it as below

SimpleKMeans kmeans = new SimpleKMeans();     
 //  now if i use kmeans.xyz()     xyz will be anyfunctions and classes
found in SImpleKMeans, right ?

but then, i encountered a problem where i used:

-kmeans.setNumClusters(2);              or any number other than 2,

- kmeans.buildClusterer(danger);       and even on this point i am getting
an error as well 




Re: using k-means clustering in a subset from the dataset

hisham
At line 837: SimpleKMeans kmeans = new SimpleKMeans();
// Line 837 compiles without an error, but line 838 shows the error
// below:

    [javac] C:\temp\weka\src\main\java\weka\filters\supervised\instance\Dangercluster.java:838: error: <identifier> expected
    [javac] kmeans.setNumClusters(2);
    [javac]                               ^
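
One common cause of "<identifier> expected" on a plain method call is that the statement sits directly in the class body rather than inside a method. A declaration with an initializer, like the one on line 837, is allowed there, which would match what javac reports. Whether that is the case in Dangercluster.java cannot be confirmed from the excerpt. A sketch of a layout that compiles, using a hypothetical FakeKMeans stand-in for SimpleKMeans:

```java
class StatementPlacement {
    // Hypothetical stand-in for SimpleKMeans so the sketch compiles without WEKA.
    static class FakeKMeans {
        private int numClusters = 2;

        void setNumClusters(int n) {
            numClusters = n;
        }

        int getNumClusters() {
            return numClusters;
        }
    }

    // A field declaration with an initializer is legal at class level.
    private final FakeKMeans kmeans = new FakeKMeans();

    // A bare call such as  kmeans.setNumClusters(4);  written here, outside
    // any method, is rejected by javac with "<identifier> expected".

    int configure() {
        kmeans.setNumClusters(4); // statements belong inside a method body
        return kmeans.getNumClusters();
    }
}
```

If this is the cause, moving the setNumClusters and buildClusterer calls into the method that creates the danger subset should resolve it.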










Re: using k-means clustering in a subset from the dataset

hisham
Now I am getting:

    [javac] C:\temp\weka\src\main\java\weka\filters\supervised\instance\Dangercluster.java:845: error: ';' expected
    [javac] for (inst in danger) {
    [javac]             ^
    [javac] C:\temp\weka\src\main\java\weka\filters\supervised\instance\Dangercluster.java:845: error: ';' expected
    [javac] for (inst in danger) {
    [javac]                    ^



/* Is it possible to create new subsets, where each subset is a cluster?


Before, I used these statements to create the danger subset:

-      Instances danger = getInputFormat().stringFreeStructure();

-      if (condition)
- {
- danger.add(instanceI);
- }

*/




Re: using k-means clustering in a subset from the dataset

hisham
Can you please help me write this code in Java?




Re: using k-means clustering in a subset from the dataset

Michael Hall
In reply to this post by hisham

On Nov 17, 2018, at 2:59 AM, hisham <[hidden email]> wrote:

[javac] for (inst in danger) {

for (Instance inst : danger) {
}
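
In Java the loop variable needs a declared type, and Groovy's "in" keyword becomes a colon. A self-contained sketch with strings standing in for WEKA Instance objects (ForEachTranslation and countMatching are hypothetical names):

```java
import java.util.Arrays;
import java.util.List;

class ForEachTranslation {
    // Groovy: for (inst in danger) { ... }
    // Java:   for (String inst : danger) { ... }
    static int countMatching(List<String> danger, String label) {
        int count = 0;
        for (String inst : danger) {
            if (inst.endsWith(label)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> danger = Arrays.asList("5.1,3.5,yes", "4.9,3.0,no", "4.7,3.2,yes");
        System.out.println(countMatching(danger, "yes"));
    }
}
```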



Re: using k-means clustering in a subset from the dataset

Michael Hall

On Nov 18, 2018, at 5:09 AM, Michael Hall <[hidden email]> wrote:


On Nov 17, 2018, at 2:59 AM, hisham <[hidden email]> wrote:

[javac] for (inst in danger) {

for (Instance inst : danger) {
}


Just for the fun of it…

jshell
|  Welcome to JShell -- Version 9
|  For an introduction type: /help intro

jshell> /env --class-path /Applications/weka-3-8-2/weka.jar 
|  Setting new options and restoring state.

jshell> import weka.core.*

jshell> String dataStr = "/Applications/weka-3-8-2/data/iris.arff"
dataStr ==> "/Applications/weka-3-8-2/data/iris.arff"

jshell> Instances data = new weka.core.converters.ConverterUtils.DataSource(dataStr).getDataSet();
data ==> @relation iris

@attribute sepallength numeric
@a ... 9,3,5.1,1.8,Iris-virginica

jshell> int ctr = 0
ctr ==> 0

jshell> for (Instance inst : data) {
   ...> System.out.println(inst);
   ...> if (ctr++ > 10) break;
   ...> }
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa

jshell> /env
|     --class-path /Applications/weka-3-8-2/weka.jar


Re: using k-means clustering in a subset from the dataset

hisham
Thanks for your reply, but I did not understand it.


What I have done is this:
I modified the SMOTE filter in
C:\temp\weka\src\main\java\weka\filters\supervised\instance\SMOTE.java

I successfully changed it so that it calculates the borderline area and
creates a new subset in memory (danger, which is the borderline area)
that can be used for creating new synthetic samples.

What I want to do now is apply k-means clustering to the danger
subset,
then create new subsets from those clusters (each cluster will be a subset)
that I can use in the same way I used the danger subset in the rest of the
process.


I am modifying the filter in Java and then recompiling everything with the
*ant exejar* command.
The Groovy console works differently from my approach, but it is
helpful in its own way.

Could you please assist me?




Re: using k-means clustering in a subset from the dataset

Michael Hall

> On Nov 18, 2018, at 10:50 AM, hisham <[hidden email]> wrote:
>
> thanks for your reply,  but i did not understood it

I was simply showing how to correct the compile error in your prior post.

That was not valid Java; I’m not sure about Groovy. But your error showed that javac was trying to compile it, so it was being treated as Java rather than Groovy, and it wouldn’t work.

If you are not familiar with the Java approach or don’t know Java, then jshell might be useful for trying things out.

The rest of your thread I haven’t followed closely enough to have any suggestions.

Re: using k-means clustering in a subset from the dataset

Michael Hall
In reply to this post by hisham

On Nov 18, 2018, at 10:50 AM, hisham <[hidden email]> wrote:

ant exejar

I just noticed this. That build would probably be compiling with Java (javac).
I think it was suggested that you use the WEKA Groovy console?

