Word Sense Disambiguation by Web Mining for Word Co-Occurrence Probabilities

From National Research Council Canada

Download	View accepted manuscript: Word Sense Disambiguation by Web Mining for Word Co-Occurrence Probabilities (PDF, 201 KiB)
Author	Turney, Peter
Affiliation	National Research Council of Canada. NRC Institute for Information Technology
Format	Text, Article
Conference	The Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), July 25-26, 2004, Barcelona, Spain
Abstract	This paper describes the National Research Council (NRC) Word Sense Disambiguation (WSD) system, as applied to the English Lexical Sample (ELS) task in Senseval-3. The NRC system approaches WSD as a classical supervised machine learning problem, using familiar tools such as the Weka machine learning software and Brill's rule-based part-of-speech tagger. Head words are represented as feature vectors with several hundred features. Approximately half of the features are syntactic and the other half are semantic. The main novelty in the system is the method for generating the semantic features, based on word co-occurrence probabilities. The probabilities are estimated using the Waterloo MultiText System with a corpus of about one terabyte of unlabeled text, collected by a web crawler.
Publication date	2004
In	Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3).
Language	English
NRC number	NRCC 47167
NPARC number	5763802
Record identifier	ad3282e8-edb7-4cab-8367-66ad7e02a7eb
Record created	2009-03-29
Record modified	2021-01-05

Date modified: 2024-07-27