A data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews
Author: | Xiang Chen, Kuang Ching Li, Guang li Zhu, Shunxiang Zhang, Han qing Xu |
---|---|
Year of publication: | 2021 |
Subject: | Parsing; Computer science; Sentiment analysis; Context (language use); Mutual information; Sequence labeling; Theoretical Computer Science; Set (abstract data type); Edit distance; Geometry and Topology; Artificial intelligence; Software; Word (computer architecture); Natural language processing |
Source: | Soft Computing. 26:853-866 |
ISSN: | 1433-7479 1432-7643 |
DOI: | 10.1007/s00500-021-06228-9 |
Description: | New sentiment words in product reviews are valuable resources that closely reflect how users express themselves. Extracting them can improve information services for users and provide theoretical support for related research on edge computing. Traditional methods for extracting new sentiment words generally ignore context and syntactic information, which leads to low accuracy and recall. To address this issue, we propose a data processing method based on sequence labeling and syntactic analysis for extracting new sentiment words from product reviews. First, the probability that a new word is a sentiment word is calculated using location rules derived from the sequence labeling result, and a candidate set of new sentiment words is obtained according to this probability. Then, the candidate set is supplemented by matching appositive words based on edit distance. Finally, the final set of new sentiment words is obtained through fine-grained filtering, which includes calculating Pointwise Mutual Information (PMI) and the difference coefficient of positive and negative corpus (DC-PNC). The experimental results show that the new sentiment words extracted by the proposed method are effective and improve the accuracy and recall of sentiment analysis. |
Database: | OpenAIRE |
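
The description names two standard building blocks, edit distance and PMI, without giving formulas. The following is a minimal sketch of both, not the authors' implementation. The function names `edit_distance` and `pmi`, the document-level co-occurrence counting, and the toy corpus are assumptions made for illustration; the record does not define DC-PNC, so it is not sketched here.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pmi(word: str, seed: str, docs: list[list[str]]) -> float:
    """PMI(word, seed) = log2(p(word, seed) / (p(word) * p(seed))).

    Assumption: probabilities are estimated from document-level
    co-occurrence counts over tokenized reviews.
    """
    n = len(docs)
    n_word = sum(1 for d in docs if word in d)
    n_seed = sum(1 for d in docs if seed in d)
    n_both = sum(1 for d in docs if word in d and seed in d)
    if n == 0 or n_both == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2((n_both / n) / ((n_word / n) * (n_seed / n)))


# Hypothetical toy corpus of tokenized reviews.
reviews = [
    ["great", "awesome"],
    ["great", "awesome"],
    ["bad"],
    ["poor"],
]
print(edit_distance("kitten", "sitting"))
print(pmi("awesome", "great", reviews))
```

In a pipeline like the one described, a small edit distance between a candidate and a known sentiment word could be used to add the candidate to the set, and a PMI score against seed sentiment words could be used to filter it; any thresholds would need to be tuned on real data.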