\documentclass[11pt]{article}
\setlength{\oddsidemargin}{0.0truein}
\setlength{\evensidemargin}{0.0truein}
\setlength{\textwidth}{6.5truein}
\setlength{\topmargin}{0.0truein}
\setlength{\textheight}{9.0truein}
\setlength{\headsep}{0.0truein}
\setlength{\headheight}{0.0truein}
\setlength{\topskip}{10.0pt}
\usepackage{url}
\begin{document}
\begin{center}
\textbf{\textsc{STANFORD UNIVERSITY}}\\[5pt]
\textbf{\textsc{DEPARTMENT OF STATISTICS}}\\[5pt]
{\Large\textbf{\textsc{DEPARTMENTAL SEMINAR}}}
\end{center}
\begin{center}
4:15 p.m., Tuesday, May 3rd, 2005\\
Sequoia Hall Room 200\\
(Cookies at 3:45 in 1st Floor Lounge)
\end{center}
\begin{center}
\textsl{Kanti V. Mardia}\\
Department of Statistics\\
University of Leeds\\
England
\end{center}
\begin{center}
\textbf{STATISTICAL MATCHING AND ALIGNMENT IN PROTEIN BIOINFORMATICS}
\end{center}
Among the challenges that proteomics poses for statistics are various alignment and matching problems. Here we consider matching protein gels in two dimensions and aligning active sites of proteins in three dimensions. In the latter case, we also wish to use information about the grouping of the amino acids. We introduce hierarchical Bayesian models for matching configurations of points in space. The points are either unlabelled or carry at most a partial labelling that constrains the matching, and some points may appear in only one of the configurations. We derive procedures for simultaneous inference about the matching and the transformation, using a Bayesian approach (Green and Mardia, 2004). Our procedure is compared with other methods, such as a graph-theoretic method (Gold et al., 2003) and the EM algorithm (Taylor, Mardia and Kent, 2003; Kent, Mardia and Taylor, 2004). Its connection with the work of Wu, Schmidler, Hastie and Brutlag (1998) will be discussed. We describe the implementation and performance of these methods on the proteomic tasks, discuss some open problems and suggest directions for future work.
\section*{REFERENCES}

Eidhammer, T., Jonassen, T. and Taylor, W.R. (2004). Protein Bioinformatics. Wiley, Chichester.

Gold, N.D., Pickering, S.J. and Westhead, D.R. (2003). Predicting protein function from structure using SITEDB: evaluation of a method based on functional site similarity. Preprint.

Green, P.J. and Mardia, K.V. (2004). Bayesian alignment using hierarchical models with applications in protein bioinformatics. \url{http://www.stats.bris.ac.uk/~peter/Research.html#Complex}

Kent, J.T., Mardia, K.V. and Taylor, C.C. (2004). Matching problems for unlabelled configurations. LASR 04 Proceedings of Bioinformatics, Images, and Wavelets, 33--40. Edited by R.G. Aykroyd, S. Barber and K.V. Mardia. Leeds University Press.

Taylor, C.C., Mardia, K.V. and Kent, J.T. (2003). Matching unlabelled configurations using the EM algorithm. LASR 03 Proceedings of Stochastic Geometry, Biological Structure and Images, 19--21. Edited by R.G. Aykroyd, K.V. Mardia and M.J. Langdon. Leeds University Press.

Wu, T.D., Schmidler, S.C., Hastie, T. and Brutlag, G. (1998). Regression analysis of multiple protein structures. Journal of Computational Biology, 585--595.

\end{document}