Coherent Keyphrase Extraction via Web Mining

From National Research Council Canada

Author	Turney, Peter
Affiliation	National Research Council of Canada. NRC Institute for Information Technology
Format	Text, Article
Conference	The Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), August 9-15, 2003, Acapulco, Mexico
Abstract	Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. A limitation of previous keyphrase extraction algorithms is that the selected keyphrases are occasionally incoherent. That is, the majority of the output keyphrases may fit together well, but there may be a minority that appear to be outliers, with no clear semantic relation to the majority or to each other. This paper presents enhancements to the Kea keyphrase extraction algorithm that are designed to increase the coherence of the extracted keyphrases. The approach is to use the degree of statistical association among candidate keyphrases as evidence that they may be semantically related. The statistical association is measured using web mining. Experiments demonstrate that the enhancements improve the quality of the extracted keyphrases. Furthermore, the enhancements are not domain-specific: the algorithm generalizes well when it is trained on one domain (computer science documents) and tested on another (physics documents).
Publication date	2003
In	Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03).
Language	English
NRC number	NRCC 46496
NPARC number	5763172
Record identifier	ece26e32-7802-4f15-9b74-94862b0647f0
Record created	2009-03-29
Record modified	2021-01-05
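
Illustrative sketch (not part of the record): the abstract says that statistical association among candidate keyphrases, measured by web mining, is used as evidence of semantic relatedness. It does not name the association measure, so the Python sketch below assumes pointwise mutual information (PMI) computed from page hit counts. The function names, the hit counts, and the index size are all hypothetical values chosen for illustration, not data from the paper.

```python
import math


def pmi(hits_a, hits_b, hits_ab, total_pages):
    """Pointwise mutual information (base 2) estimated from page counts.

    Assumption: probabilities are approximated as hits / total_pages.
    """
    if min(hits_a, hits_b, hits_ab) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2((hits_ab * total_pages) / (hits_a * hits_b))


def coherence_scores(candidates, hits, joint_hits, total_pages):
    """Score each candidate by its average PMI with the other candidates."""
    scores = {}
    for c in candidates:
        others = [o for o in candidates if o != c]
        values = [
            pmi(hits[c], hits[o], joint_hits[frozenset((c, o))], total_pages)
            for o in others
        ]
        scores[c] = sum(values) / len(values)
    return scores


# Hypothetical hit counts standing in for search-engine queries.
TOTAL_PAGES = 1_000_000
candidates = ["machine learning", "neural network", "banana bread"]
hits = {"machine learning": 10_000, "neural network": 8_000, "banana bread": 5_000}
joint_hits = {
    frozenset(("machine learning", "neural network")): 2_000,
    frozenset(("machine learning", "banana bread")): 10,
    frozenset(("neural network", "banana bread")): 8,
}

scores = coherence_scores(candidates, hits, joint_hits, TOTAL_PAGES)
outlier = min(scores, key=scores.get)  # candidate least associated with the rest
```

In this toy setup the candidate with the lowest average association is the one a coherence-aware extractor would be most inclined to drop. The enhanced Kea algorithm described in the abstract may combine such evidence with other features in a different way.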

Date modified: 2024-12-21