Tweet Length Matters: A Comparative Analysis on Topic Detection in Microblogs
Authors: | Cagri Toraman, Furkan Şahinuç |
---|---|
Year of publication: | 2021 |
Subject: |
2019-20 coronavirus outbreak; Coronavirus disease 2019 (COVID-19); Computer science; business.industry; Microblogging; Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); Deep learning; 02 engineering and technology; computer.software_genre; Semantics; 03 medical and health sciences; 0302 clinical medicine; 030221 ophthalmology & optometry; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Social media; Artificial intelligence; Construct (philosophy); business; computer; Natural language processing |
Source: | Lecture Notes in Computer Science; ISBN: 9783030722395; ECIR (2) |
DOI: | 10.1007/978-3-030-72240-1_50 |
Description: | Microblogs are characterized by short and informal text, which is therefore sparse and noisy. To understand the topic semantics of short text, supervised and unsupervised methods have been investigated, including traditional bag-of-words and deep learning-based models. However, the effectiveness of such methods has not been investigated together in short-text topic detection. In this study, we provide a comparative analysis on topic detection in microblogs. We construct a tweet dataset based on recent and important events worldwide, including the COVID-19 pandemic and the BlackLivesMatter movement. We also analyze the effect of varying tweet length in both evaluation and training. Our results show that tweet length matters in terms of the effectiveness of a topic-detection method. |
Database: | OpenAIRE |