\documentclass[11pt]{article}
\setlength{\oddsidemargin}{0.0truein}
\setlength{\evensidemargin}{0.0truein}
\setlength{\textwidth}{6.5truein}
\setlength{\topmargin}{0.0truein}
\setlength{\textheight}{9.0truein}
\setlength{\headsep}{0.0truein}
\setlength{\headheight}{0.0truein}
\setlength{\topskip}{10.0pt}
\setlength{\parskip}{5mm}
\usepackage{url}
\usepackage{amsmath}
\usepackage{amssymb}
\pagestyle{empty}
\begin{document}
\begin{center}
\textbf{\Large{\textsc{STANFORD UNIVERSITY}}}\\[5pt]
\textbf{\Large{\textsc{DEPARTMENT OF STATISTICS}}}\\[5pt]
\Large{\textsc{DEPARTMENTAL SEMINAR}}
\end{center}
% In the following statements, replace "Time of talk",
% "Weekday", and "Date of talk". An example is provided.
% If you are not sure about this, just skip this part.
\begin{center}
4:15 p.m., Tuesday, August 7, 2007\\
Sequoia Hall Room 200\\
(Cookies at 3:45 in 1st Floor Lounge)
\end{center}
% In the following statements, replace "Name of the speaker" with your
% name, "Department Affiliation" with your department affiliation, and
% "University Affiliation" with your university affiliation.
\begin{center}
\textsl{Balaji S. Srinivasan} \\
Department of Statistics\\
Stanford University
\end{center}
\begin{center}
\subsection*{Automatic Population of Biomedical Ontologies}
\end{center}
\noindent Biomedical classification systems like the Gene Ontology (GO) have proven invaluable for converting disordered collections of free text into machine-readable knowledge representations. However, the scalability of these ontologies is currently limited because they are populated manually from the literature at great expense. Here, we present an algorithm that removes this limitation by automatically extracting ontological relationships between objects from a massive corpus of more than 425,000 full-text biomedical articles.
Given a small training set of biological objects with known relationships such as ``\verb+is_a+'', ``\verb+localized_to+'', or ``\verb+regulates_a+'', our algorithm finds the lexico-syntactic patterns that express each relationship in plain text. These learned patterns can then be used to find many more examples of objects that satisfy these relationships, thereby automatically populating an ontology. As a case in point, we show the results of applying the algorithm to locate text snippets specifying gene localizations, taxonomic relationships, and anatomical connections. Our methods greatly reduce the manual curation effort required, thereby allowing ontological modeling to scale.
\end{document}