Semi-Supervised Self-Training for Sentence Subjectivity Classification

From National Research Council Canada

Author	Wang, B.; Spencer, Bruce; Ling, C.X.; Zhang, H.
Format	Text, Article
Conference	AI'08, The 21st Canadian Conference on Artificial Intelligence, May 28-30, 2008, Windsor, Ontario
Abstract	Recent natural language processing (NLP) research shows that identifying and extracting subjective information from texts can benefit many NLP applications. In this paper, we study self-training, a semi-supervised learning approach, for sentence subjectivity classification. In self-training, the confidence degree, which depends on the ranking of class membership probabilities, is commonly used as the selection metric that ranks and selects the unlabeled instances for the next round of training of the underlying classifier. Naive Bayes (NB) is often used as the underlying classifier because its class membership probability estimates have good ranking performance. The first contribution of this paper is a study of the performance of self-training using decision tree models, such as C4.5, C4.4, and naive Bayes tree (NBTree), as the underlying classifiers. The second contribution is an adapted Value Difference Metric (VDM), proposed as the selection metric in self-training, which does not depend on class membership probabilities. Based on the Multi-Perspective Question Answering (MPQA) corpus, a set of experiments was designed to compare the performance of self-training with different underlying classifiers using different selection metrics under various conditions. The experimental results show that the performance of self-training is improved by using VDM instead of the confidence degree, and that self-training with NBTree and VDM outperforms self-training with the other combinations of underlying classifiers and selection metrics. The results also show that the self-training approach can achieve performance comparable to that of the supervised learning models.
Publication date	2008
In	AI'08, The 21st Canadian Conference on Artificial Intelligence [Proceedings].
Language	English
NRC number	NRCC 50417
NPARC number	8913184
Record identifier	1256764d-560d-42bb-9ffd-5a36578f7804
Record created	2009-04-22
Record modified	2020-08-12
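
Illustration	The self-training loop summarized in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a small hand-written multinomial naive Bayes as the underlying classifier and the confidence degree (the highest class membership probability) as the selection metric. It does not implement the adapted VDM metric or the decision tree models, and it does not use the MPQA corpus. The function names (train_nb, predict_proba, self_train), the parameters (per_round, max_rounds), and the toy sentences are all hypothetical.

```python
import math
from collections import Counter

def train_nb(examples):
    """Fit a multinomial naive Bayes model on (tokens, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in examples:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_proba(model, tokens):
    """Return {label: P(label | tokens)} with Laplace smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_scores = {}
    for label, n in class_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        log_scores[label] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp.values())
    return {label: v / z for label, v in exp.items()}

def self_train(labeled, unlabeled, per_round=1, max_rounds=10):
    """Grow the labeled set with the most confidently classified instances."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train_nb(labeled)
        scored = []
        for tokens in pool:
            probs = predict_proba(model, tokens)
            label = max(probs, key=probs.get)
            scored.append((probs[label], tokens, label))
        # Selection metric (assumed here): confidence degree, highest first.
        scored.sort(key=lambda item: item[0], reverse=True)
        for _, tokens, label in scored[:per_round]:
            labeled.append((tokens, label))  # keep the predicted label
            pool.remove(tokens)
    return train_nb(labeled), labeled

# Toy data, invented for illustration only.
seed = [
    ("i love this wonderful idea".split(), "subjective"),
    ("the meeting starts at noon".split(), "objective"),
]
pool = ["what a wonderful plan".split(), "the train leaves at noon".split()]
model, grown = self_train(seed, pool)
```

In this sketch, swapping the selection metric would mean changing only the score stored in each `scored` entry, which is the point at which the abstract's VDM-based alternative would replace the confidence degree.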

Date modified: 2025-05-11