A hybrid model for opinion mining based on domain sentiment dictionary

Autor: Dongping Huang, Kai Yang, Xue Lei, Haoran Xie, Zikai Zhou, Tak-Lam Wong, Yi Cai
Rok vydání: 2017
Předmět:
Zdroj: International Journal of Machine Learning and Cybernetics. 10:2131-2142
ISSN: 1868-808X
1868-8071
DOI: 10.1007/s13042-017-0757-6
Popis: Sentiment classification is an application of sentiment analysis, which is a popular research field in NLP. It can classify documents into different categories according to their sentiments. For a sentiment classification task, the first step is to extract sentimental features from documents, and then classify them using some classifiers. In the first step, a traditional way to extract sentimental features is to apply sentiment dictionaries. However, sentiment words may have different sentiment tendencies in different contexts, and traditional sentiment dictionaries does not consider this situation where wrong sentiment tendencies may be selected for sentiment words. In our research, we find that sentiment words will not have diverse meanings when they associate with the nearby aspects and entities in documents. Then, we propose a three layers sentiment dictionary, which can associate sentiment words with the corresponding entities and aspects together to reduce their multiple meanings. In the second step of the sentiment classification task, many classification models, such as SVM, GBDT, can be used to classify documents according to the extracted sentiment words. However, different classifiers have different weaknesses. A Stacking-based hybrid model is applied to combine SVM and GBDT together to overcome their weaknesses and reach higher performance. This hybrid model contains two layers, and the output of the first layer will become the input of the second layer. The first layer will generate different classification results according to different classifiers, while the second layer will automatically learn how to select a probable one as the final result. The experimental results show that our hybrid model outperforms the baseline single models.
Databáze: OpenAIRE