Measuring praise and criticism : inference of semantic orientation from association

From National Research Council Canada

Download	View accepted manuscript: Measuring praise and criticism : inference of semantic orientation from association (PDF, 696 KiB)
DOI	https://doi.org/10.1145/944012.944013
Author	Turney, Peter D.; Littman, Michael L.
Affiliation	National Research Council of Canada. NRC Institute for Information Technology
Format	Text, Article
Subject	algorithms; experimentation; semantic orientation; semantic association; web mining; text mining; text classification; unsupervised learning; mutual information; latent semantic analysis
Abstract	The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., 'honest', 'intrepid') and negative semantic orientation indicates criticism (e.g., 'disturbing', 'superfluous'). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.
Publication date	2003-10-04
In	ACM Transactions on Information Systems (TOIS) 21, no. 4 (4 October 2003): 315–346.
Language	English
NRC number	NRCC 46516
NPARC number	5210015
Record identifier	1b268840-85b3-4e70-af95-dc399b6cd2b4
Record created	2008-12-02
Record modified	2020-04-02
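
The approach summarized in the abstract (scoring a word by its association with positive paradigm words minus its association with negative ones, then optionally abstaining on mild scores) can be sketched roughly as follows. This is a minimal illustration only, not the authors' implementation: the paradigm words, the co-occurrence counts, the smoothing constant, and the abstention threshold are all hypothetical values chosen for the example, and PMI is estimated here from a toy count table rather than from a real corpus.

```python
import math

# Hypothetical paradigm words; the abstract does not list the actual sets.
POSITIVE = ["good", "excellent"]
NEGATIVE = ["bad", "poor"]

# Toy corpus statistics, invented purely for illustration.
TOTAL = 10_000  # number of text windows in the imaginary corpus
COUNTS = {"honest": 50, "disturbing": 40,
          "good": 200, "excellent": 100, "bad": 150, "poor": 120}
COOC = {
    frozenset(("honest", "good")): 20,
    frozenset(("honest", "excellent")): 10,
    frozenset(("honest", "bad")): 2,
    frozenset(("honest", "poor")): 1,
    frozenset(("disturbing", "good")): 1,
    frozenset(("disturbing", "excellent")): 1,
    frozenset(("disturbing", "bad")): 15,
    frozenset(("disturbing", "poor")): 8,
}


def pmi(joint, count_a, count_b, total, smoothing=0.01):
    """Pointwise mutual information, log2(p(a, b) / (p(a) * p(b))).

    The smoothing term (an assumption of this sketch) avoids log(0)
    when two words never co-occur.
    """
    p_joint = (joint + smoothing) / total
    return math.log2(p_joint / ((count_a / total) * (count_b / total)))


def semantic_orientation(word, counts=COUNTS, cooc=COOC, total=TOTAL):
    """Association with positive paradigm words minus negative ones."""
    def assoc(other):
        joint = cooc.get(frozenset((word, other)), 0)
        return pmi(joint, counts[word], counts[other], total)

    return sum(assoc(p) for p in POSITIVE) - sum(assoc(n) for n in NEGATIVE)


def classify(score, threshold=0.0):
    """Label a score, abstaining when its magnitude is within the threshold."""
    if abs(score) <= threshold:
        return "abstain"
    return "positive" if score > 0 else "negative"
```

Calling `classify(semantic_orientation("honest"))` on the toy counts yields a positive label, and raising `threshold` makes the classifier abstain on words whose scores are close to zero, which mirrors the accuracy trade-off the abstract describes.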

Date modified: 2024-12-21