National Research Council of Canada. Information and Communication Technologies
51st Annual Meeting of the Association for Computational Linguistics, August 4-9, 2013, Sofia, Bulgaria
Words often gain new senses in new domains. Being able to automatically identify, from a corpus of monolingual text, which word tokens are being used in a previously unseen sense has applications to machine translation and other tasks sensitive to lexical semantics. We define a task, SenseSpotting, in which we build systems to spot tokens that have new senses in new domain text. Instead of difficult and expensive annotation, we build a gold standard by leveraging cheaply available parallel corpora, targeting our approach to the problem of domain adaptation for machine translation. Our system achieves F-measures of up to 80% when applied to word types it has never seen before. Our approach is based on a large set of novel features that capture varied aspects of how words change when used in new domains.
Association for Computational Linguistics
ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference 1: 1435–1445.