Interface 1999 Invited Session

Computational Biology

Organizer: Jun Liu, jliu@stat.stanford.edu


Saturday, June 12, 8:15 a.m. - 10:00 a.m.

Speakers

8:15 a.m.
Genomic Level Inferences on Protein Function: A Bayesian Approach
Charles (Chip) E. Lawrence, Wadsworth Center

Complete genome sequences offer new opportunities to predict the functions of proteins and to assess their significance. We illustrate this potential through the identification of unusually large and diverse sets of predicted cyclic nucleotide-binding (cNMP) proteins, as well as nucleotide cyclases, in the Mycobacterium tuberculosis (MTB) genome. Of the 257 protein sequences in a cNMP-binding protein superfamily assembled using PROBE, a Bayesian multiple sequence alignment and database search procedure (Neuwald et al., Nucleic Acids Research, 1997; Liu et al., JASA, 1999), 10 are from the MTB genome. Not surprisingly, these 10 protein sequences include 2 transcriptional regulatory proteins of a type that is common in other bacterial genomes. Seven of the remaining 8 MTB cNMP-binding proteins belong to a novel family of cNMP-binding proteins identified using Bayesian classification (Qu et al., Proceedings of the 6th International Conference on Intelligent Systems for Molecular Biology, 1998). A summary of these two Bayesian MCMC algorithms will be presented. Perhaps more interestingly, this set of 7 MTB proteins includes two classes of proteins not previously reported to bind cNMP in any species: 1) two ABC transport proteins, and 2) two putative multi-drug resistance translocases. Examination of the other complete bacterial genomes shows that, with the exception of the cyanobacterium Synechocystis, which is predicted to have 12 cNMP-binding proteins, none of the other 14 genomes contains more than 4 cNMP-binding proteins. Our further findings of unusually large and diverse classes of nucleotide cyclases in MTB, including 4 cytoplasmic cyclases, which are common in bacteria, and 6 integral membrane cyclases, which are uncommon in bacteria but common in higher eukaryotes, strengthen the evidence that cNMPs play an important and unusual role in MTB.
These eukaryotic-like nucleotide cyclases and the unusual number and features of cNMP-binding proteins indicate that some characteristics of MTB are more like those of the human macrophage cells in which MTB spends much of its life cycle than like those commonly seen in bacteria. The finding of two large segments in the most human-like cyclase that are closer in sequence to their human counterparts than to their MTB counterparts further supports the hypothesis that MTB has stolen genetic material from the human genome.

8:45 a.m.
Computing With Trees
Susan P. Holmes, Stanford University

This paper presents a natural coordinate system for trees using a correspondence with the set of perfect matchings in the complete graph. This correspondence produces a distance between phylogenetic trees, and a way of enumerating all trees in a minimal step order. It is useful in randomized algorithms because it enables moves on the space of trees that make random optimization strategies "mix" quickly. It also promises a generalization to intermediary trees ("average trees") when data are not decisive as to their choice of tree, or when doing perturbation/bootstrap computations, and a new way of constructing Bayesian priors on tree space.
All three implementations of Bayesian methods for phylogenetic trees rely on Markov chain Monte Carlo on tree space to compute the posterior probabilities; using the transposition moves on matchings will certainly simplify some of the computational technicalities. Coding trees by matrices instead of pointers simplifies the use of higher-level languages such as MATLAB instead of C, thus enabling researchers to use the methods without treating the programs as black boxes. This new representation also has applications to the confidence statements that can be made for Classification and Regression Trees.
(joint work with Persi Diaconis)
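
The correspondence between trees and perfect matchings can be illustrated with a small Python sketch. It is not the authors' code. It assumes rooted binary trees with leaves labeled 1..n, a matching on the points 1..2n-2 read as a list of sibling pairs, and one particular labeling rule for internal nodes (the ready pair containing the smallest label is joined first); the function names and the nested-frozenset tree encoding are hypothetical choices made for this example.

```python
def perfect_matchings(points):
    """Yield every perfect matching of an even-sized list of points."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def matching_to_tree(matching, n_leaves):
    """Decode a matching on 1..2n-2 into a rooted binary tree.

    Each matched pair is read as a pair of siblings. Internal nodes receive
    the labels n+1, n+2, ... in the order they are created (assumed rule:
    among pairs whose two members already exist, join the pair containing
    the smallest label). The tree is returned as nested frozensets.
    """
    subtree = {leaf: leaf for leaf in range(1, n_leaves + 1)}
    unused = [tuple(sorted(pair)) for pair in matching]
    next_label = n_leaves + 1
    while unused:
        ready = [p for p in unused if p[0] in subtree and p[1] in subtree]
        pair = min(ready)  # pairs are disjoint, so this has the smallest label
        unused.remove(pair)
        subtree[next_label] = frozenset((subtree[pair[0]], subtree[pair[1]]))
        next_label += 1
    return subtree[next_label - 1]  # the last node created is the root


if __name__ == "__main__":
    n = 4
    points = list(range(1, 2 * n - 1))  # the 2n-2 matched points
    trees = {matching_to_tree(m, n) for m in perfect_matchings(points)}
    print(len(trees), "distinct trees from matchings on", len(points), "points")
```

Under this rule, swapping the partners of two matched pairs (a transposition) gives a neighboring matching and therefore a neighboring tree, which is the kind of local move a random walk on tree space could use.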

9:15 a.m.
Gene Mapping and Fourier Transforms on Groups
Augustine Kong, University of Chicago

Genetic linkage studies lead to some of the most difficult and interesting computational problems. For example, geneticists invented the peeling algorithm, which is closely related to the later local-computations/propagation-and-fusion algorithms. Linkage data also provide some of the most severe challenges to MCMC methods. In this talk, I will describe a recent advance in linkage computations that uses Fourier transforms of measures on discrete groups (Kruglyak and Lander, 1998, J. Comp. Biol.). This is an elegant example of how mathematics and applications work together. Perhaps the audience can identify other areas of application for this method.
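
As a rough illustration of why a Fourier transform on a discrete group can speed up such computations, the Python sketch below convolves two functions on the group of n-bit vectors under XOR using the fast Walsh-Hadamard transform. The choice of group, the product-form recombination kernel, and all function names are assumptions made for this example; the abstract does not give these details.

```python
def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform of a list whose length is a power of two."""
    out = list(values)
    h = 1
    while h < len(out):
        for start in range(0, len(out), 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


def xor_convolve(f, g):
    """Return (f * g)[v] = sum over w of f[w] * g[v ^ w], via the transform."""
    size = len(f)
    product = [a * b for a, b in zip(walsh_hadamard(f), walsh_hadamard(g))]
    # Applying the transform twice multiplies by the group size, so divide it out.
    return [x / size for x in walsh_hadamard(product)]


def recombination_kernel(n_bits, theta):
    """Assumed kernel: each bit flips independently with probability theta."""
    kernel = []
    for u in range(2 ** n_bits):
        flips = bin(u).count("1")
        kernel.append(theta ** flips * (1 - theta) ** (n_bits - flips))
    return kernel


if __name__ == "__main__":
    n_bits, theta = 3, 0.1
    prior = [1.0 / 2 ** n_bits] * 2 ** n_bits  # uniform over bit vectors
    step = xor_convolve(prior, recombination_kernel(n_bits, theta))
    print(step)
```

A direct evaluation of the same convolution costs on the order of 4^n operations for n-bit vectors, while the transform-based version costs on the order of n 2^n.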

