Research Interests

My research interests are in Computational Linguistics (a.k.a. Natural Language Processing (NLP) or Language Technology), specifically statistical approaches to NLP. These range from theoretical work and low-level component development through to high-level systems development in Question Answering and Information Extraction, for example:

Combinatory Categorial Grammar Parsing

with: Stephen Clark and Mark Steedman

students: Bojan Djordjevic, Matthew Honnibal and David Vadas

We have developed a wide-coverage parser for Combinatory Categorial Grammar (CCG). Estimating the maximum entropy parsing models is a very computationally intensive task, requiring an efficient distributed implementation on a large Beowulf cluster. The parser achieves state-of-the-art accuracy but parses much faster than other linguistically motivated parsers. We are working on improving parsing accuracy and on techniques for porting the parser to new domains.

Question Answering systems

with: Johan Bos, Malvina Nissim and Stephen Clark

We have competed in the last four TREC Question Answering (QA) tracks, with steadily improving results, using a wide-coverage semantic analysis approach based on our CCG parser. I am interested in improving the QA system components, e.g. the question classifier, and in building domain-specific QA systems for scientific literature.

Scientific Text Mining

with: Tara Murphy

students: Tara McIntosh

We are developing systems for exploiting the large and ever-increasing volumes of scientific literature. In particular, we are focusing on the development of tools and systems for extracting information and answering questions in two domains: Astronomy and Genomics.

Lexical Semantics

students: James Gorman

We are interested in vector-space (or distributional) models of semantic similarity, which are based on the assumption that similar words appear in similar contexts. I have calculated semantic similarity over extremely large corpora and am interested in methods to do this more efficiently.
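
As a rough illustration of the distributional idea, the sketch below represents each word by counts of the words that occur near it and compares two words by the cosine of their count vectors. It is a toy example only: the function names (context_vectors, cosine), the window size, the raw-count weighting and the three-sentence corpus are all assumptions made for illustration, not the methods or data used in this work.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[term] for term, count in u.items() if term in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, chosen so that "cat" and "dog" share contexts.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]
vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])
sim_cat_car = cosine(vectors["cat"], vectors["car"])
```

Here "cat" and "dog" share two of their three context words while "cat" and "car" share only one, so sim_cat_dog comes out higher than sim_cat_car. Every word's vector is built and compared explicitly, which is the part that becomes expensive on very large corpora.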

Maximum Entropy Language Modelling

with: Stephen Clark

We have applied Maximum Entropy models to many problems including POS tagging, chunking and named entity recognition (the C&C tools). I am working on extending this to other problems, from tokenisation through to question classification, and extending the paradigm to include richer feature representations.
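
For readers unfamiliar with the framework, a conditional maximum entropy model is conventionally written in the log-linear form below. The notation is the standard textbook one and is an assumption here, not taken from the tools themselves: x is the context, y the label (e.g. a POS tag), f_i are feature functions and \lambda_i their weights.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big)
```

Richer feature representations correspond to a larger or more expressive set of feature functions f_i; the form of the model itself is unchanged.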