Organizers:
Dennis D. Cox, dcox@rice.edu
David W. Scott, scottdw@rice.edu
Speakers
3:45 p.m.
Remarks on Fitting and Interpreting Mixture Models
David W. Scott,
Rice University
Keywords: Cluster analysis; normal mixtures; income
distribution; EM algorithm; L2E algorithm
Fitting and interpreting mixture models is difficult, but the approach has had considerable success in cluster analysis and holds further potential. In this talk we introduce a least-squares method of fitting normal mixture models using a nonparametric criterion. A case study on household income distributions compares the new algorithm with the EM algorithm, with a focus on the interpretability of the fitted components. Finally, we describe a particularly interesting extension to estimating a single component of a mixture model.
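The abstract does not state the criterion itself. As a rough illustration, the sketch below assumes the usual integrated squared error form associated with the L2E keyword, namely the integral of the squared model density minus twice the average model density at the data points. For a univariate normal mixture the integral has a closed form. The function names are hypothetical and this is not taken from the talk.

```python
import numpy as np


def normal_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def l2e_criterion(x, weights, means, sds):
    """Assumed L2E criterion for a univariate normal mixture.

    Evaluates  integral f(t)^2 dt  -  (2/n) * sum_i f(x_i),
    where f is the mixture density defined by weights, means and sds.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = np.asarray(means, dtype=float)
    v = np.asarray(sds, dtype=float) ** 2

    # Closed form: the integral of the product of two normal densities is
    # a normal density in the mean difference with the variances summed.
    cross = normal_pdf(m[:, None], m[None, :], v[:, None] + v[None, :])
    integral_f_squared = w @ cross @ w

    # Mixture density evaluated at every data point.
    density = (w[None, :] * normal_pdf(x[:, None], m[None, :], v[None, :])).sum(axis=1)

    return integral_f_squared - 2.0 * density.mean()
```

Under these assumptions, an estimate would be obtained by minimizing `l2e_criterion` over the weights, means, and standard deviations with a general-purpose numerical optimizer, whereas EM maximizes the likelihood.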
4:15 p.m.
Fluorescence Spectroscopy, Quantitative Pathology, and
Classification of Tissue
Dennis D. Cox,
Rice University and National Center for Atmospheric Research
Keywords: classification, variable selection, discrimination,
dimension reduction, feature selection
A challenging problem is considered which involves the classification of cervical tissue based on output from a spectroscopic probe. Several classification methods are considered, including linear discriminant analysis, neural nets, naive Bayes, classification trees, and ensemble methods. The main difficulty in all cases is the dimensionality of the feature vector (there are 161 variables per measurement), and various new and classical methods for dimension reduction (feature selection) are compared. Further complications include replicated measurements and confounding variables; for example, the single variable with the largest effect on the spectroscopic response is apparently age, which is unimportant for the classification.
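To make the combination of feature selection and classification concrete, here is a minimal sketch on synthetic data. It assumes one simple filter (ranking variables by a two-sample t-statistic) followed by two-class linear discriminant analysis with equal priors. The abstract does not say which selection methods were compared, so every function name, the synthetic generator, and the choice of filter are illustrative assumptions, not the speaker's procedure.

```python
import numpy as np


def make_synthetic_spectra(n_per_class, n_features=161, n_informative=5,
                           shift=1.5, seed=0):
    """Synthetic stand-in for spectra: only the first few variables differ."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_class, n_features))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, :n_informative] += shift
    return X, y


def select_top_features(X, y, k):
    """Indices of the k variables with the largest absolute t-statistic."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return np.argsort(-np.abs(t))[:k]


def fit_lda(X, y):
    """Two-class LDA with a pooled covariance matrix and equal priors."""
    a, b = X[y == 0], X[y == 1]
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(X) - 2)
    direction = np.linalg.solve(np.atleast_2d(pooled), mean_b - mean_a)
    threshold = direction @ (mean_a + mean_b) / 2.0
    return direction, threshold


def predict_lda(X, direction, threshold):
    """Assign class 1 when the discriminant score exceeds the threshold."""
    return (X @ direction > threshold).astype(int)
```

In use, the selection step should see only the training data, for example `cols = select_top_features(X_train, y_train, 5)` followed by `fit_lda(X_train[:, cols], y_train)`. Selecting variables on the full data set before splitting would bias the estimated error rate.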
4:45 p.m.
Simplifying Mixture Models with Applications
William Szewczyk, National Security Agency
(with David W. Scott, Rice University)
Mixture models are widely used to model complex distributions; however, one is faced with the challenge of determining an appropriate number of components. This often involves identifying those components that are close enough to be considered the same. In this talk we introduce a new, easily calculated measure of similarity between distributions and illustrate its use in collapsing components. We also show how this measure naturally leads to a new clustering algorithm.
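The abstract does not define the similarity measure. One easily calculated candidate, assumed here purely for illustration, is the normalized L2 inner product of two densities, which equals 1 for identical densities and approaches 0 as they separate. For univariate normal components it has a closed form. The moment-preserving merge rule below is likewise an assumption, and both function names are hypothetical.

```python
import math


def l2_similarity(mean1, sd1, mean2, sd2):
    """Normalized L2 inner product of two univariate normal densities.

    Computes  integral(f*g) / sqrt(integral(f^2) * integral(g^2)),
    which simplifies to the closed form below for normal f and g.
    """
    var_sum = sd1 ** 2 + sd2 ** 2
    scale = math.sqrt(2.0 * sd1 * sd2 / var_sum)
    return scale * math.exp(-((mean1 - mean2) ** 2) / (2.0 * var_sum))


def merge_components(w1, mean1, sd1, w2, mean2, sd2):
    """Collapse two weighted normal components into one.

    Assumed rule: keep the total weight and match the mean and variance
    of the two-component sub-mixture.
    """
    w = w1 + w2
    mean = (w1 * mean1 + w2 * mean2) / w
    second_moment = (w1 * (sd1 ** 2 + mean1 ** 2)
                     + w2 * (sd2 ** 2 + mean2 ** 2)) / w
    return w, mean, math.sqrt(second_moment - mean ** 2)
```

A collapsing pass under these assumptions would repeatedly find the pair of components with the highest similarity and merge them while that similarity exceeds a chosen cutoff. The cutoff is a tuning choice that the abstract does not specify.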