Newsgroups: alt.math.undergrad,sci.math,sci.stat.math
From: saswss@hotellng.unx.sas.com (Warren Sarle)
Subject: Re: Algorithm to identify clusters.
Date: Mon, 23 Feb 1998 23:58:09 GMT

In article <6csua0$cek$1@lyra.csx.cam.ac.uk>,
"Mr K.S.Groves" <94ksg@eng.cam.ac.uk> writes:
|> I am in the middle of a project which now requires the identification of
|> clusters in a large data-set.

Try reading some books on cluster analysis, especially the sections on
k-means, such as:

   Anderberg, M.R. (1973), _Cluster Analysis for Applications_,
   New York: Academic Press, Inc.

   Hartigan, J.A. (1975), _Clustering Algorithms_, New York: John
   Wiley & Sons, Inc.

   Kaufman, L. and Rousseeuw, P.J. (1990), _Finding Groups in Data_,
   New York: John Wiley & Sons, Inc.

   Massart, D.L. and Kaufman, L. (1983), _The Interpretation of
   Analytical Chemical Data by the Use of Cluster Analysis_,
   New York: John Wiley & Sons, Inc.

   Spath, H. (1980), _Cluster Analysis Algorithms_, Chichester,
   England: Ellis Horwood.

|> I'm using about 1000 data-points each in 3
|> dimensions and would run it a few times for n=3,4 and 5 and want the
|> operation to be possible in half a day on a P200.

Should take more like half a second.

--

Warren S. Sarle       SAS Institute Inc.   The opinions expressed here
saswss@unx.sas.com    SAS Campus Drive     are mine and not necessarily
(919) 677-8000        Cary, NC 27513, USA  those of SAS Institute.
==============================================================================

Newsgroups: alt.math.undergrad,sci.math,sci.stat.math
From: saswmh@pascal.unx.sas.com (Wolfgang Hartmann)
Subject: Re: Algorithm to identify clusters. - Thank you.
Date: Tue, 24 Feb 1998 20:08:55 GMT

In article <1998Feb24.092310.9952@vmsmail.gov.bc.ca>, Rodger Whitlock writes:
|> "Mr K.S.Groves" <94ksg@eng.cam.ac.uk> wrote:
|> >>|> I'm using about 1000 data-points each in 3
|> >>|> dimensions and would run it a few times for n=3,4 and 5 and want the
|> >>|> operation to be possible in half a day on a P200.
|> >>
|> >>Should take more like half a second.
|> >
|> >
|> >Thanks for the list of books. I'll probably be implementing it in
|> >MatLab so 1/2 second will be unlikely!
|>
|> Why not use one of the existing stats packages that can do cluster
|> analysis? The one I use offers several different algorithms for this
|> purpose.
|>
|> No point in re-inventing the wheel, you know!
|>
|> ----
|> Rodger Whitlock
|>
|>

I agree: in my opinion the best you can get are Peter Rousseeuw's
subroutines in S-Plus.

Wolfgang

--
-----------------------------------------------------------
Dr. Wolfgang M. Hartmann        SAS Institute Inc.
saswmh @ unx. sas. com          (919) 677-8000 x7612
-----------------------------------------------------------
==============================================================================

From: "Neil Judell"
Newsgroups: sci.math.num-analysis
Subject: Re: Cluster analysis
Date: Wed, 1 Apr 1998 06:33:22 -0500

I've had a great deal of success using vector coding methods. You simply
treat the problem as if you are trying to pick a minimum representation
set for coding. In particular, using Lloyd-Max with simulated annealing
has worked VERY well for me. Take a peek at "Vector Quantization."
Huseyin Abut, editor. IEEE Press, New York City, 1990.

--
--neil
"Where are we going, and why am I in this handbasket?"

Konrad Hinsen wrote in message ...
>I am looking for references to discussions of cluster analysis methods.
>A search in the library yielded surprisingly little, but that may be
>a problem of the library.
>
>I need to identify clusters of points in space of 3 to 50 dimensions
>with as little manual intervention as possible. The number of clusters
>is unknown (and may in fact be zero), and there are usually some points
>that cannot reasonably be assigned to any cluster. Ideally I'd be able
>to specify a closeness criterion for each dimension and then get a
>list of clusters back.
>--
>-------------------------------------------------------------------------------
>Konrad Hinsen                            | E-Mail: hinsen@ibs.ibs.fr
>Laboratoire de Dynamique Moleculaire     | Tel.: +33-4.76.88.99.28
>Institut de Biologie Structurale         | Fax:  +33-4.76.88.54.94
>41, av. des Martyrs                      | Deutsch/Esperanto/English/
>38027 Grenoble Cedex 1, France           | Nederlands/Francais
>---------------------------------------------------------------------------
==============================================================================

From: Peter Shenkin
Newsgroups: sci.math.num-analysis
Subject: Re: Cluster analysis
Date: Tue, 31 Mar 1998 10:59:38 -0500

[[ emailed and posted ]]

Konrad Hinsen wrote:
[original quoted above -- djr]

Hi, Konrad,

I have found "Algorithms for Clustering Data", by Jain and Dubes
(Prentice-Hall, 1988, ISBN 0-13-022278-X) to be excellent and very
readable.

You might also look at the discussion of the XCluster program
somewhere beneath our home page.

    -P.

--
************** In memoriam, Grandpa Jones, 1913-1998, R.I.P. **************
* Peter S. Shenkin; Chemistry, Columbia U.; 3000 Broadway, Mail Code 3153 *
** NY, NY  10027;  shenkin@columbia.edu;  (212)854-5143;  FAX: 678-9039 ***
*MacroModel WWW page: http://www.columbia.edu/cu/chemistry/mmod/mmod.html *
==============================================================================

From: Allan Hayes
Newsgroups: sci.math.num-analysis
Subject: Re: Cluster analysis
Date: Wed, 01 Apr 1998 09:30:15 +0000

Konrad Hinsen wrote:
[original quoted above -- djr]

Konrad,

I have written some short Mathematica code that I could send you -

Clusters::usage = "Clusters[lst,d] for a list of points lst in R^n and
non-complex number d gives the equivalence classes of the equivalence
relation generated by the relation
    x ~ y <=> Max[Abs[x - y]] <= d
Clusters[lst,d] for a list, lst, of non-complex numbers works similarly.
Clusters[lst, {d1,d2,..dn}], gives the equivalence classes of the
equivalence relation on lst generated by the relation
    {x1,x2,..}~{y1,y2,..} <=> Abs[x1-y1]<=d1 and Abs[x2-y2]<=d2 and ...

Examples
Clusters[{{1,0},{1,9},{8,9},{3,2}},2]
    --> {{{1, 0}, {3, 2}}, {{1, 9}}, {{8, 9}}}
Clusters[{1,1,8,3},2]
    --> {{1,1,3},{8}}";

--
Allan Hayes
Mathematica Training and Consulting
Leicester, UK
hay@haystack.demon.co.uk
http://www.haystack.demon.co.uk
voice: +44 (0)116 271 4198
fax: +44 (0)116 271 8642
==============================================================================
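
[The usage message above describes the relation but the Mathematica
code itself was not posted. What follows is only a rough Python sketch
of one way to compute the same equivalence classes, assuming a plain
union-find over all pairs of points. It is not Allan Hayes's
implementation; the function name clusters and its arguments are
made up for illustration. Because the classes are generated by the
relation, points chain together: 0, 2 and 4 land in one cluster for
d = 2 even though 0 and 4 are farther apart than 2. Points that are
close to nothing come back as single-element clusters.]

```python
from itertools import combinations


def clusters(points, d):
    """Equivalence classes generated by x ~ y <=> |x_i - y_i| <= d_i for all i.

    points: list of numbers, or of equal-length coordinate sequences.
    d: a single tolerance used for every coordinate, or one per coordinate.
    (Hypothetical helper, named after the usage message in the thread.)
    """
    # Treat bare numbers as 1-dimensional points.
    pts = [tuple(p) if isinstance(p, (list, tuple)) else (p,) for p in points]
    if not pts:
        return []
    dim = len(pts[0])
    tol = tuple(d) if isinstance(d, (list, tuple)) else (d,) * dim

    parent = list(range(len(pts)))  # union-find forest over point indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair that is close in all coordinates: O(n^2) comparisons.
    for i, j in combinations(range(len(pts)), 2):
        if all(abs(a - b) <= t for a, b, t in zip(pts[i], pts[j], tol)):
            ri, rj = find(i), find(j)
            if ri != rj:
                # The smaller index stays the root, so each class is
                # keyed by its first member.
                parent[max(ri, rj)] = min(ri, rj)

    # Collect the classes in order of first appearance.
    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


if __name__ == "__main__":
    # The two examples from the usage message.
    print(clusters([[1, 0], [1, 9], [8, 9], [3, 2]], 2))
    print(clusters([1, 1, 8, 3], 2))
```

[For the 1000 points in 3 dimensions mentioned earlier in the thread,
the all-pairs loop is about half a million comparisons, which should
be manageable; for much larger sets a spatial index would be the
usual replacement for the pairwise scan.]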