Tara McIntosh
PhD Research

Information Extraction from Unlabeled Biomedical Literature

The focus of this project is to apply Natural Language Processing (NLP) techniques to develop an Information Extraction system that works directly from raw text in the biomedical domain.

The volume of biomedical literature is expanding rapidly, making it increasingly difficult for biologists to keep abreast of their fields.

The majority of biomedical information retrieval (IR) techniques search for relevant documents using keyword-based queries. NLP techniques have been used to develop more sophisticated tools, such as question answering systems. These tasks typically involve identifying named entity (NE) classes that are often not found in annotated corpora, so supervised NE models are not always available. This issue is even more pronounced in the biomedical domain, where new semantic categories are introduced rapidly and are often poorly represented in existing resources, if at all.

Automatically acquiring biomedical semantic lexicons from raw text is essential for overcoming this information bottleneck and is the focus of my thesis.

Publications

Tara McIntosh and James R. Curran. Reducing Semantic Drift with Bagging and Distributional Similarity. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Singapore, 2009 (to appear).

Tara McIntosh and James R. Curran. Weighted Mutual Exclusion Bootstrapping for Domain Independent Lexicon and Template Acquisition. In Proceedings of the Australasian Language Technology Workshop, Hobart, Australia, 2008. Best Presentation Award. PDF

Tara McIntosh and James R. Curran. Sentence retrieval for extracting biomedical knowledge. In Proceedings of the Conference of the Pacific Association for Computational Linguistics (PACLING), pages 342-349, Melbourne, Australia, 2007. PDF

Tara McIntosh and James R. Curran. Challenges for extracting biomedical knowledge from full text. In Proceedings of the Workshop on BioNLP (BioNLP), pages 171-178, Prague, Czech Republic, 2007. PDF

Tara Murphy, Tara McIntosh, and James R. Curran. Named entity recognition for astronomy literature. In Australasian Language Technology Workshop, pages 59-66, Sydney, Australia, 2006.