A sampling based sentiment mining approach for e-commerce applications
Author: | R. M. Chandrasekaran, G. Vinodhini |
---|---|
Year of publication: | 2017 |
Subject: |
Receiver operating characteristic; business.industry; Computer science; Emerging technologies; Sentiment analysis; Sampling (statistics); 02 engineering and technology; E-commerce; Library and Information Sciences; Management Science and Operations Research; Machine learning; computer.software_genre; Computer Science Applications; Support vector machine; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Media Technology; Feature (machine learning); Oversampling; 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; business; computer; Information Systems |
Source: | Information Processing & Management. 53:223-236 |
ISSN: | 0306-4573 |
DOI: | 10.1016/j.ipm.2016.08.003 |
Description: | We propose three vector models by varying the feature size. We propose an integrative approach for imbalanced datasets. We analyze the effect of the imbalance ratio on sentiment learning. The proposed method performs more accurately than baseline models. Emerging technologies in online commerce, mobile, and customer experience have transformed the retail industry, enabling marketers to boost sales and giving customers a more efficient online shopping experience. Online reviews significantly influence the purchase decisions of buyers and the marketing strategies employed by vendors in e-commerce. However, the sheer volume of reviews makes it difficult for customers to mine sentiments from them. To address this problem, a sentiment mining system is needed to automatically organize online reviews into sentiment orientation categories (e.g., positive/negative). Because positive and negative sentiments are imbalanced in practice, real-time sentiment mining is a challenging machine learning task. The main objective of this research is to investigate the combined effect of machine learning classifiers and sampling methods on sentiment classification under imbalanced data distributions. A modification is proposed to a support vector machine based ensemble algorithm that incorporates both oversampling and undersampling to improve prediction performance. Extensive experimental comparisons are carried out to show the effectiveness of the proposed method against several other classifiers in terms of the receiver operating characteristic (ROC) curve, the area under the ROC curve, and the geometric mean. |
Database: | OpenAIRE |
External link: |
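
The description above names two evaluation measures (area under the ROC curve and geometric mean) and a combination of oversampling and undersampling, but does not give the details. The following is a minimal sketch in plain Python of the standard definitions of those measures, plus simple random resampling as a stand-in. The function names, the `target_per_class` parameter, and the choice of random resampling are assumptions made for illustration; this is not the SVM-based ensemble method proposed in the paper.

```python
import random
from math import sqrt


def geometric_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sqrt(sensitivity * specificity)


def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def resample_to_balance(samples, labels, target_per_class, seed=0):
    """Hypothetical helper: bring every class to target_per_class items.

    Classes larger than the target are randomly undersampled (without
    replacement); smaller classes are randomly oversampled by duplicating
    existing items. This is an assumed stand-in, not the paper's method.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        if len(xs) >= target_per_class:
            chosen = rng.sample(xs, target_per_class)
        else:
            extra = [rng.choice(xs) for _ in range(target_per_class - len(xs))]
            chosen = xs + extra
        out_x.extend(chosen)
        out_y.extend([y] * target_per_class)
    return out_x, out_y
```

The geometric mean is useful under class imbalance because it drops to zero when either class is entirely misclassified, whereas plain accuracy can remain high if a classifier simply predicts the majority class.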