| Organizer: | Chong Gu, chong@stat.purdue.edu |
| Chair: | Barbara Bailey, babailey@stat.uiuc.edu |
Speakers
10:30 a.m.
Genome Data: Finding Trait Genes Using a Dense Marker Map
Katy Simonsen,
Department of Statistics, Purdue University
(with Lauren McIntyre, Duke University)
The search for genes involved in human disease is a problem with complex biological, computational, and statistical components. Many tests have been devised to detect association between individual genetic markers and disease; these tests are typically performed at markers spaced throughout the genome. Conventional methods used to correct for this multiple testing assume that the markers are independent. However, advances in marker technology are producing denser marker maps. These closely spaced markers are not independent, since their recent common evolutionary history induces correlation. In this case, the standard Bonferroni correction is inappropriately conservative and results in low power. Instead, Monte-Carlo methods can be used to control Type I error, and result in more powerful testing procedures. A coalescent process is developed to simulate markers and disease genes while building historical correlation into the data according to evolutionary models. These simulated data are analyzed using the Monte-Carlo procedure to assess significance, and contrasted with the results of a Bonferroni correction. As expected, the Monte-Carlo procedure is found to be more powerful when markers are correlated, and the change in power can be related to various evolutionary parameters.
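The contrast between the two corrections can be illustrated with a minimal sketch. It does not use the coalescent simulation described in the abstract; as a stand-in assumption, the null test statistics at the m markers are drawn as equicorrelated standard normals with correlation rho. The function names, the choice of the maximum absolute statistic, and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def bonferroni_threshold(m, alpha=0.05):
    """Two-sided critical value for |z| under a Bonferroni correction over m tests."""
    return NormalDist().inv_cdf(1 - alpha / (2 * m))


def monte_carlo_threshold(m, rho, alpha=0.05, n_sim=2000, seed=0):
    """Empirical (1 - alpha) quantile of max |z| over m correlated null statistics.

    Assumption: statistics are equicorrelated normals built from one shared
    factor, a simple stand-in for correlation induced by common ancestry.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_sim):
        shared = rng.gauss(0, 1)
        zs = [rho ** 0.5 * shared + (1 - rho) ** 0.5 * rng.gauss(0, 1)
              for _ in range(m)]
        maxima.append(max(abs(z) for z in zs))
    maxima.sort()
    index = min(int((1 - alpha) * n_sim), n_sim - 1)
    return maxima[index]


if __name__ == "__main__":
    m = 50
    print("Bonferroni:", round(bonferroni_threshold(m), 3))
    for rho in (0.0, 0.5, 0.9):
        print("rho =", rho, "Monte Carlo:", round(monte_carlo_threshold(m, rho), 3))
```

Under strong correlation the simulated threshold falls well below the Bonferroni value, so a marker statistic needs less evidence to be declared significant at the same family-wise error rate, which is the source of the power gain.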
11:00 a.m.
Models of Protein Evolution Incorporating Correlation
Kay S. Tatsuoka,
NISS
Francoise Seillier-Moiseiwitsch, UNC Biostatistics
Alex Stark, National Institute of Statistical Sciences
Keywords: phylogeny, MCMC
An evolutionary tree gives a clustering of proteins based on their amino acid sequences. Most methods for reconstructing the evolutionary history (phylogeny) of a set of proteins rely on one or both of the following:
(i) mutations of amino acids at distinct positions occur independently, an assumption that is often violated,
(ii) the substitution matrix modelling the mutations is based on a database of proteins which may be too general for the particular protein(s) under study.
We introduce a computationally tractable model that allows for correlated mutations in two or more positions. The substitution matrix for our model is estimated from the primary sequences. We use MCMC to explore tree space and compute the posterior probabilities of trees. We propose a test statistic for testing independence of positions that takes into account the phylogeny. We illustrate our methods on a large set of amino acid sequences from HIV protease. This is joint work with Francoise Seillier-Moiseiwitsch and Alex Stark.
11:30 a.m.
Group Testing with Blockers and Synergism
Minge Xie,
Department of Statistics, Rutgers University
This paper develops models and estimation procedures to obtain quantitative information from data in group testing practice. It studies several group testing procedures under violations of the standard assumption, adopted by Dorfman (1943) and many others, that tested items act independently of one another. The violations are: (1) when there are blockers, objects that when placed in a pool with a positive cause the pool to test negative; (2) when there are combination effects (synergism or additive effects) that cause a pool containing no individually active compound to test active. Our investigation is focused on but not limited to the square array pooling method (Phatarfod and Sudbury, 1994). This research is motivated by the need to study group testing (compound) samples in large pharmaceutical companies, but the results have potential applications in other areas such as blood screening and HIV testing. The methodologies are illustrated through simulations and a drug discovery data set from Glaxo Wellcome Inc.
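The effect of a blocker on square array pooling can be sketched as follows. This is an illustrative toy, not the estimation procedure of the paper: it assumes error-free assays, assumes a single blocker fully suppresses any pool it is in, and does not model synergism. The labels and function names are hypothetical.

```python
def pool_tests_positive(pool):
    """A pool is positive if it holds an active item and no blocker (assumed rule)."""
    return "active" in pool and "blocker" not in pool


def square_array_candidates(grid):
    """Pool every row and every column; return cells where both pools are positive."""
    n_cols = len(grid[0])
    positive_rows = [i for i, row in enumerate(grid) if pool_tests_positive(row)]
    positive_cols = [j for j in range(n_cols)
                     if pool_tests_positive([row[j] for row in grid])]
    return [(i, j) for i in positive_rows for j in positive_cols]


def make_grid(n, labels):
    """Build an n-by-n grid of 'inert' items, overriding the cells given in labels."""
    return [[labels.get((i, j), "inert") for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # One active compound, no blocker: the active cell is recovered.
    print(square_array_candidates(make_grid(3, {(0, 0): "active"})))
    # A blocker in the same row masks the active compound entirely.
    print(square_array_candidates(
        make_grid(3, {(0, 0): "active", (0, 2): "blocker"})))
```

In the second grid the row pool containing the active compound tests negative, so the independence-based decoding returns no candidates, which is the kind of failure the blocker models are meant to account for.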