Specialists, Scientists, and Sentiments: Word2Vec and Doc2Vec in Analysis of Scientific and Medical Texts
Author: | Marina Sokolova, Qufei Chen |
---|---|
Year of publication: | 2020 |
Subject: | General Computer Science, Computer Networks and Communications, Computer science, Artificial Intelligence, Word2Vec, Reuters science data, Semantic lexicon, Original Research, Doc2Vec, Sentiment analysis, Significant difference, Computer Graphics and Computer-Aided Design, Computer Science Applications, Test (assessment), Data set, Computational Theory and Mathematics, Benchmark (computing), Clinical discharge summaries, Natural language processing, Word (computer architecture), Unsupervised sentiment analysis |
Source: | SN Computer Science |
ISSN: | 2661-8907 |
Description: | We analyze the performance of unsupervised embedding algorithms in sentiment analysis of knowledge-rich data sets. We apply the state-of-the-art embedding algorithms Word2Vec and Doc2Vec as the learning techniques. The algorithms build word and document embeddings in an unsupervised manner. To assess the algorithms’ performance, we define sentiment metrics and use the semantic lexicon SentiWordNet (SWN) to establish the benchmark measures. Our empirical results are obtained on the Obesity data set from i2b2 clinical discharge summaries and on the Reuters Science data set. We use Welch’s test to analyze the obtained sentiment evaluation. On the Obesity data, Welch’s test found a significant difference between the SWN evaluations of the most positive and the most negative texts. On the same data, the Word2Vec results support the SWN results, whereas the Doc2Vec results correspond only partially to the Word2Vec and SWN results. On the Reuters data, Welch’s test did not find a significant difference between the SWN evaluations of the most positive and the most negative texts. On the same data, the Word2Vec and Doc2Vec results correspond only in part to the SWN results. In unsupervised sentiment analysis of medical and scientific texts, the Word2Vec sentiment analysis was more consistent with the SentiWordNet sentiment assessment than the Doc2Vec sentiment analysis. Welch’s test of the SentiWordNet results was a strong indicator of future correspondence between the Word2Vec and SentiWordNet results. |
Database: | OpenAIRE |
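
The description relies on Welch's test to compare the sentiment scores of the most positive and the most negative texts. As a rough illustration of that comparison, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The function name `welch_t` and the score lists are hypothetical and invented for this example; they are not the authors' code or data, and the record does not say how the authors implemented the test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Unlike Student's t-test, the two samples are not assumed to share
    a common variance.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / na
    vb = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical sentiment scores, made up for illustration only.
most_positive = [0.62, 0.55, 0.71, 0.58, 0.66]
most_negative = [0.21, 0.35, 0.18, 0.40, 0.27]

t_stat, dof = welch_t(most_positive, most_negative)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A large absolute t value at the computed degrees of freedom would suggest that the two groups of texts differ in mean sentiment score. Turning t and the degrees of freedom into a p-value requires the Student t distribution, which is left out here to keep the sketch dependency-free.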