Determining the Efficiency of Drugs Under Special Conditions From Users’ Reviews on Healthcare Web Forums
Authors: | Sadia Din, Furqan Rustam, Imran Ashraf, Arif Mehmood, Ramish Jamil, Gyu Sang Choi, Eysha Saad |
---|---|
Year of publication: | 2021 |
Subject: | Feature engineering; General Computer Science; Computer science; Feature extraction; Sentiment analysis; General Engineering; 020206 networking & telecommunications; 02 engineering and technology; Lexicon; Machine learning; Random forest; Support vector machine; Multilayer perceptron; Classifier (linguistics); 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; General Materials Science; Artificial intelligence |
Source: | IEEE Access. 9:85721-85737 |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2021.3088838 |
Description: | Sentiment analysis is the extraction and categorization of sentiments expressed in text data using text analysis techniques. As earlier studies have shown, sentiment analysis of drug reviews has great potential to provide valuable insights that help healthcare professionals and companies evaluate the safety of drugs after they have been marketed. Such insights help safeguard patients and increase their trust in medical companies. Existing systems follow either a lexicon-based or a learning-based approach to sentiment analysis in the medical domain. Learning-based techniques require annotated data, while lexicon-based techniques tend to be domain-specific, which restricts their wider use. This research adopts a hybrid technique that combines learning-based and lexicon-based approaches to achieve better results. General-purpose sentiment lexicons, such as AFINN, TextBlob, and VADER, are used to annotate the reviews. Furthermore, several feature engineering techniques, such as term frequency (TF), term frequency-inverse document frequency (TF-IDF), and the union of TF and TF-IDF (TF U TF-IDF), are used to extract useful features. Finally, learning models including logistic regression (LR), AdaBoost classifier (AB), random forest (RF), extra tree classifier (ETC), and multilayer perceptron (MLP) are used to classify the sentiments of the reviews. The performance of the proposed hybrid approach is evaluated using accuracy, precision, recall, and F1-score. Experimental results indicate that combining learning-based and lexicon-based approaches yields better results than using either alone. Moreover, TextBlob shows promising results, giving an accuracy of 96% with MLP when used with TF-IDF and with LR when used with TF U TF-IDF. |
Database: | OpenAIRE |
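
The hybrid idea in the description (lexicon-based annotation followed by TF, TF-IDF, and combined features) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: `TOY_LEXICON` is a hypothetical stand-in for AFINN, TextBlob, or VADER scores, the sample reviews are invented, and reading "TF U TF-IDF" as concatenation of the two feature vectors is an assumption. The smoothed IDF formula mirrors a common convention and may differ from what the paper used.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for AFINN / TextBlob / VADER scores.
TOY_LEXICON = {"effective": 1.0, "relief": 1.0, "great": 1.0,
               "nausea": -1.0, "worse": -1.0, "pain": -1.0}


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def lexicon_label(text):
    """Label a review by summing toy lexicon scores over its tokens."""
    score = sum(TOY_LEXICON.get(tok, 0.0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tf_vectors(docs, vocab):
    """Raw term counts per document, one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([float(counts[term]) for term in vocab])
    return rows


def tfidf_vectors(docs, vocab):
    """TF-IDF with smoothed IDF: log((1 + n) / (1 + df)) + 1, no normalization."""
    n = len(docs)
    token_sets = [set(tokenize(doc)) for doc in docs]
    idf = [math.log((1 + n) / (1 + sum(term in s for s in token_sets))) + 1
           for term in vocab]
    return [[tf * w for tf, w in zip(row, idf)]
            for row in tf_vectors(docs, vocab)]


def union_features(docs):
    """Assumed reading of 'TF U TF-IDF': concatenate both vectors per document."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    tf = tf_vectors(docs, vocab)
    tfidf = tfidf_vectors(docs, vocab)
    return vocab, [a + b for a, b in zip(tf, tfidf)]


# Invented example reviews, for illustration only.
reviews = [
    "great relief and very effective",
    "nausea got worse and pain stayed",
    "took it for two weeks",
]
labels = [lexicon_label(r) for r in reviews]   # lexicon-based annotation
vocab, features = union_features(reviews)      # inputs for a learning model
```

The resulting `labels` and `features` would then serve as training targets and inputs for a classifier such as LR or MLP; that step is omitted here to keep the sketch free of external dependencies.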