Clustering and topic modeling over tweets: A comparison over a health dataset
Author: | Tina Hernandez-Boussard, Juan Antonio Lossio-Ventura, Juandiego Morzan, Hugo Alatrista-Salas, Jian-Guo Bian |
---|---|
Year of publication: | 2019 |
Subject: | Topic model; Measure (data warehouse); Information retrieval; 020205 medical informatics; Computer science; business.industry; 02 engineering and technology; Document clustering; Article; Domain (software engineering); 03 medical and health sciences; 0302 clinical medicine; Health care; 0202 electrical engineering, electronic engineering, information engineering; 030212 general & internal medicine; InformationSystems_MISCELLANEOUS; business; Cluster analysis |
Source: | BIBM Proceedings (IEEE Int Conf Bioinformatics Biomed) |
DOI: | 10.1109/bibm47256.2019.8983167 |
Description: | Twitter has become the most popular form of social interaction in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to assess tweets, showing that the performance of these applications is negatively affected by the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to measure because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA perform better at extracting topics and grouping tweets. We aim to provide health practitioners with this comparison to help them select the most suitable application for their tasks. |
Database: | OpenAIRE |
External link: |