Tweet Length Matters: A Comparative Analysis on Topic Detection in Microblogs

Authors: Cagri Toraman, Furkan Şahinuç
Publication year: 2021
Source: Lecture Notes in Computer Science ISBN: 9783030722395
ECIR (2)
DOI: 10.1007/978-3-030-72240-1_50
Description: Microblogs are characterized as short and informal text, and are therefore sparse and noisy. To understand the topic semantics of short text, supervised and unsupervised methods have been investigated, including traditional bag-of-words and deep learning-based models. However, the effectiveness of these methods has not been compared side by side in short-text topic detection. In this study, we provide a comparative analysis of topic detection in microblogs. We construct a tweet dataset based on recent and important events worldwide, including the COVID-19 pandemic and the BlackLivesMatter movement. We also analyze the effect of varying tweet length in both evaluation and training. Our results show that tweet length matters for the effectiveness of a topic-detection method.
Database: OpenAIRE