Download | View accepted manuscript: Sentiment analysis of short informal texts (PDF, 930 KiB) |
---|
DOI | https://doi.org/10.1613/jair.4272 |
---|
Author | Kiritchenko, Svetlana; Zhu, Xiaodan; Mohammad, Saif M. |
---|
Affiliation | National Research Council of Canada. Information and Communication Technologies |
---|
Format | Text, Article |
---|
Subject | Classification (of information); Semantics; Text processing; Ablation experiments; Automatically generated; Percentage points; Sentiment analysis; Sentiment features; Sentiment lexicons; State-of-the-art |
---|
Abstract | We describe a state-of-the-art sentiment analysis system that detects (a) the sentiment of short informal textual messages such as tweets and SMS (message-level task) and (b) the sentiment of a word or a phrase within a message (term-level task). The system is based on a supervised statistical text classification approach leveraging a variety of surface-form, semantic, and sentiment features. The sentiment features are primarily derived from novel high-coverage tweet-specific sentiment lexicons. These lexicons are automatically generated from tweets with sentiment-word hashtags and from tweets with emoticons. To adequately capture the sentiment of words in negated contexts, a separate sentiment lexicon is generated for negated words. The system ranked first in the SemEval-2013 shared task 'Sentiment Analysis in Twitter' (Task 2), obtaining an F-score of 69.02 in the message-level task and 88.93 in the term-level task. Post-competition improvements boost the performance to an F-score of 70.45 (message-level task) and 89.50 (term-level task). The system also obtains state-of-the-art performance on two additional datasets: the SemEval-2013 SMS test set and a corpus of movie review excerpts. The ablation experiments demonstrate that the use of the automatically generated lexicons results in performance gains of up to 6.5 absolute percentage points. |
---|
Publication date | 2014-08-01 |
---|
Language | English |
---|
Peer reviewed | Yes |
---|
NPARC number | 21275945 |
---|
Record identifier | f3c48029-99e0-48c7-9aaf-271e9715465b |
---|
Record created | 2015-08-12 |
---|
Record modified | 2020-06-04 |
---|
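
The abstract says the sentiment lexicons are generated automatically from tweets carrying sentiment-word hashtags, with a separate lexicon for words in negated contexts. The following Python sketch illustrates one way such a lexicon could be built. It is not the authors' implementation: the seed hashtags, the negator list, the `_NEG` suffix convention, and the smoothed log-ratio score are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

# Hypothetical seed hashtags; the abstract only says "sentiment-word hashtags".
POSITIVE_SEEDS = {"#good", "#happy"}
NEGATIVE_SEEDS = {"#bad", "#sad"}
# Assumed negators and clause-ending punctuation (not given in the abstract).
NEGATORS = {"not", "no", "never"}
CLAUSE_END = {".", ",", "!", "?", ";", ":"}


def mark_negation(tokens):
    """Append _NEG to tokens between a negator and the next punctuation token."""
    out, negated = [], False
    for tok in tokens:
        if tok in CLAUSE_END:
            negated = False
            out.append(tok)
        elif tok in NEGATORS:
            negated = True
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out


def build_lexicon(tweets):
    """Score each word by how strongly it co-occurs with positive vs. negative seeds.

    Assumed score: log2 of the ratio of add-one-smoothed relative frequencies
    in positive-labeled and negative-labeled tweets.
    """
    pos_counts, neg_counts = Counter(), Counter()
    for text in tweets:
        tokens = text.lower().split()  # toy tokenizer: whitespace only
        has_pos = any(t in POSITIVE_SEEDS for t in tokens)
        has_neg = any(t in NEGATIVE_SEEDS for t in tokens)
        if has_pos == has_neg:
            continue  # no seed hashtag, or conflicting seeds: skip the tweet
        words = [
            t for t in mark_negation(tokens)
            if not t.startswith("#") and t not in CLAUSE_END
        ]
        (pos_counts if has_pos else neg_counts).update(words)

    vocab = set(pos_counts) | set(neg_counts)
    total_pos = sum(pos_counts.values())
    total_neg = sum(neg_counts.values())
    lexicon = {}
    for w in vocab:
        p = (pos_counts[w] + 1) / (total_pos + len(vocab))
        n = (neg_counts[w] + 1) / (total_neg + len(vocab))
        lexicon[w] = math.log2(p / n)  # > 0 leans positive, < 0 leans negative
    return lexicon


toy_tweets = [
    "what a great day #happy",
    "this is not great . #sad",
    "great game #good",
    "awful service #bad",
]
lexicon = build_lexicon(toy_tweets)
```

In this toy setup, entries ending in `_NEG` play the role of the negated-context lexicon, so `great` and `great_NEG` receive independent scores. In a full system, scores like these would feed the supervised classifier as features.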