Wednesday, November 09, 2005

CMU LTI talk: Natural Language Processing in Bioinformatics: Uncovering Semantic Relations

Speaker: Barbara Rosario, University of California, Berkeley

TITLE: Natural Language Processing in Bioinformatics: Uncovering Semantic Relations

ABSTRACT: Current-generation search engines provide a glimpse of the kinds of activities that can be catalyzed by intelligent processing of large-scale document corpora. Further progress in this area will require the tools of statistical natural language processing, including tools for automatic extraction of propositional information from text. This presentation will explore several lines of research on one of the core problems that arise in this domain: the identification of semantic relations between constituents in sentences. First, I will discuss the problem of identifying relationships between the words of two-word noun compounds (to characterize, for example, the treatment-for-disease relationship between the words of "migraine treatment" versus the method-of-treatment relationship between the words of "aerosol treatment"). Second, I will describe my work in the area of Information Extraction, in particular the problem of identifying semantic entities such as "treatment" and "disease" in biomedical text. Finally, I will present my recent work on the problem of predicting protein-protein interactions from biological text. A major impediment to such work is the acquisition of appropriately labeled training data; for my experiments I have identified a database that serves as a proxy for training data. In each of these cases I will describe the statistical machine learning methods, both generative and discriminative, used to tackle these tasks.
