Sentiment analysis of short informal texts

From National Research Council Canada

Download	View accepted manuscript: Sentiment analysis of short informal texts (PDF, 930 KiB)
DOI	https://doi.org/10.1613/jair.4272
Author	Kiritchenko, Svetlana¹; Zhu, Xiaodan¹; Mohammad, Saif M.¹
Affiliation	National Research Council of Canada. Information and Communication Technologies
Format	Text, Article
Subject	Classification (of information); Semantics; Text processing; Ablation experiments; Automatically generated; Percentage points; Sentiment analysis; Sentiment features; Sentiment lexicons; State-of-the-art
Abstract	We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or a phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To adequately capture the sentiment of words in negated contexts, a separate sentiment lexicon is generated for negated words. The system ranked first in the SemEval-2013 shared task 'Sentiment Analysis in Twitter' (Task 2), obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. Post-competition improvements boost the performance to an F-score of 70.45 (message-level task) and 89.50 (term-level task). The system also obtains state-of-the-art performance on two additional datasets: the SemEval-2013 SMS test set and a corpus of movie review excerpts. The ablation experiments demonstrate that the use of the automatically generated lexicons results in performance gains of up to 6.5 absolute percentage points.
Publication date	2014-08-01
In	Journal of Artificial Intelligence Research 50 (1 August 2014): 723–762.
Language	English
Peer reviewed	Yes
NPARC number	21275945
Record identifier	f3c48029-99e0-48c7-9aaf-271e9715465b
Record created	2015-08-12
Record modified	2020-06-04

Date modified: 2024-07-27