Performance of clustering

Stefan Kuhn
Hi all,
I have an ARFF file with ~350 attributes and ~4000 instances. I am trying to
cluster it with the k-means algorithm and then run the classes-to-clusters
evaluation.
The clustering itself seems to have finished, but the evaluation has now been
running for a week (seven days) on my Pentium 4 machine, with Weka
started with -Xmx1024m. Is this normal, or is something wrong? Does anyone have
experience with clustering performance? To me, a week seems pretty long.
Thanks for any comments,
Stefan
--
Stefan Kuhn M. A.
Cologne University BioInformatics Center (http://www.cubic.uni-koeln.de)
Zülpicher Str. 47, 50674 Cologne
Tel: +49(0)221-470-7428   Fax: +49 (0) 221-470-7786
My public PGP key is available at http://pgp.mit.edu

_______________________________________________
Wekalist mailing list
[hidden email]
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: Performance of clustering

Mark Hall-11
Hi Stefan,

The classes to clusters evaluation in Weka uses a brute-force
algorithm to find the best assignment of class values to clusters. If
you have a lot of clusters and/or classes you will be in for a long
wait :-)
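
To get a feel for how quickly such a search grows, here is a minimal
sketch of an exhaustive class-to-cluster assignment. It is not Weka's
actual code: the class name BruteForceMapping, the counts matrix layout
and the rule that a cluster may stay unlabelled are all assumptions made
for illustration.

```java
/** Illustrative only: exhaustive search for the best class-to-cluster mapping. */
final class BruteForceMapping {

    /** Number of complete mappings examined by the most recent bestCorrect call. */
    static long tried;

    /**
     * counts[cluster][cls] = number of instances of class cls that fell into cluster.
     * Returns the largest number of correctly labelled instances over all mappings
     * in which each class labels at most one cluster (a cluster may stay unlabelled).
     */
    static int bestCorrect(int[][] counts) {
        tried = 0;
        int numClasses = counts.length == 0 ? 0 : counts[0].length;
        return search(counts, 0, new boolean[numClasses]);
    }

    private static int search(int[][] counts, int cluster, boolean[] used) {
        if (cluster == counts.length) {
            tried++;            // one complete mapping has been examined
            return 0;
        }
        // Option 1: leave this cluster without a class label.
        int best = search(counts, cluster + 1, used);
        // Option 2: give it any class that is still free.
        for (int c = 0; c < used.length; c++) {
            if (!used[c]) {
                used[c] = true;
                best = Math.max(best, counts[cluster][c] + search(counts, cluster + 1, used));
                used[c] = false;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical 3-cluster, 3-class contingency counts.
        int[][] counts = { {40, 2, 1}, {3, 1, 35}, {0, 30, 4} };
        int best = bestCorrect(counts);
        System.out.println("best correct = " + best + ", mappings tried = " + tried);
    }
}
```

With k clusters and k classes a search of this kind examines at least k!
mappings: more than 3.6 million for k = 10 and roughly 2.4 * 10^18 for
k = 20, which is the sort of growth that can turn an evaluation into a
week-long run.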

There are several other options you can try to get a feel for how the
clustering has performed. You can try visualizing the cluster
assignments, choosing the actual class for one of the axes and the
cluster for the other; this will give you a kind of graphical
confusion matrix. The goodness of fit of the clustering to the data
(independent of the class values) can be evaluated by wrapping k-means
in the MakeDensityBasedClusterer and then looking at the log
likelihood.
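
The same class-versus-cluster view can also be tabulated as plain
counts. The sketch below assumes the class index and the assigned
cluster index of every instance have already been exported into two int
arrays; the class name ClassClusterTable and its methods are
hypothetical helpers, not part of Weka.

```java
/** Illustrative only: cross-tabulate actual classes against assigned clusters. */
final class ClassClusterTable {

    /** Returns table[cls][cluster] = number of instances with that class and cluster. */
    static int[][] crossTab(int[] classes, int[] clusters, int numClasses, int numClusters) {
        if (classes.length != clusters.length) {
            throw new IllegalArgumentException("one class and one cluster per instance required");
        }
        int[][] table = new int[numClasses][numClusters];
        for (int i = 0; i < classes.length; i++) {
            table[classes[i]][clusters[i]]++;
        }
        return table;
    }

    /** Formats the table with one row per class and one column per cluster. */
    static String format(int[][] table) {
        StringBuilder sb = new StringBuilder();
        for (int[] row : table) {
            for (int count : row) {
                sb.append(String.format("%6d", count));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical assignments for five instances, two classes, two clusters.
        int[] classes  = {0, 0, 1, 1, 1};
        int[] clusters = {1, 1, 0, 0, 1};
        System.out.print(format(crossTab(classes, clusters, 2, 2)));
    }
}
```

A row dominated by a single column means that class ended up almost
entirely in one cluster; counts spread across a row mean the class was
split between clusters.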

Cheers,
Mark.

On 7/19/05, Stefan Kuhn <[hidden email]> wrote:

> [...] the program has now been running
> with the evaluation for a week (seven days) on my Pentium 4 machine [...]
> Is this normal? Or is something wrong?

--
Mark Hall
Department of Computer Science
University of Waikato
Hamilton
New Zealand
www.cs.waikato.ac.nz

Re: Performance of clustering

Stefan Kuhn
Hi Mark,
Thanks for your answer. Since I need to come up with more or less exact
figures, I'm afraid I will have to look for something else.
Thanks,
Stefan

On Wednesday 20 July 2005 23:31, Mark Hall wrote:

> The classes to clusters evaluation in Weka uses a brute-force
> algorithm to find the best assignment of class values to clusters.
> [...]
