Popis: |
In recent years, mining micro-blogs has become a hot research field, especially because it may create commercial and political value in a fast-changing big data era. This paper investigates the sentiment analysis of Chinese micro-blogs (SACM) using a vector space model. Based on an analysis of the natural properties of Chinese micro-blogs, a sentiment analysis system is proposed by formulating the task as a two-class classification problem: positive sentiment or negative sentiment. To achieve robust results, a preprocessing approach has been developed, based on an analysis of the dynamic and complicated micro-blog expressions, to remove emotionally unrelated words, convert traditional Chinese expressions to simplified ones, and unify the punctuation. In addition, with the aid of word segmentation and frequency statistics techniques, the vector space model is formed to generate the sentiment-related micro-blog feature vector. A support vector machine (SVM) is used as the classifier for its excellent ability in solving two-class classification problems. Experiments have been carried out to evaluate the proposed sentiment analysis system. Three different databases have been used in the word segmentation stage: the emotion dictionary from Dalian University of Technology, the CNKI-Hownet emotional dictionary, and our self-established dictionary. Experimental results show that the proposed SACM system achieves 80.86% classification accuracy using the above databases. |