Tweet Classification Framework for Detecting Events Related to Health Problems

Authors: Marcin Majak, Andrzej Zolnierek, Katarzyna Wegrzyn, Lamine Bougueroua
Contributors: Wegrzyn-Wolska, Katarzyna
Language: English
Year of publication: 2017
Subject:
Source: Advances in Intelligent Systems and Computing, ISBN: 9783319591612
CORES
Description: In this paper we present and validate the MC (Multiclassifier) system for classifying Tweets related to flu and its symptoms. The proposed method consists of a preprocessing phase that applies the NLTK processor together with a converter from text corpora into a feature space, followed by an ensemble of heterogeneous classifiers fused at the support level for Tweet classification. We examined two methods for translating text into the feature space. The first uses standard term frequency times inverse document frequency (TF-IDF), while the second is enriched with hashtag analysis and word reduction after n-gram generation. Our preliminary results show that Twitter can be an excellent platform for sensing real-world events. The most important element of proper event detection is a feature extraction technique that takes into account not only the text corpus but also sentiment analysis and message intention.
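The two building blocks named in the description, TF-IDF feature extraction and support-level fusion of a classifier ensemble, can be illustrated with a minimal, standard-library-only sketch. This is not the authors' MC system. Several assumptions apply: a simple regex tokenizer stands in for NLTK, hashtags are handled by stripping the '#' so that "#flu" and "flu" share a feature, features are unigrams plus bigrams, and fusion is shown as a weighted average of per-classifier class supports (soft voting), which is one common form of support-level fusion. The function names (`tokenize`, `ngrams`, `tfidf`, `fuse_supports`) and the support values in the demo are hypothetical and illustrative only, not results from the paper.

```python
import math
import re
from collections import Counter


def tokenize(tweet):
    """Lowercase a tweet and split it into word tokens.

    Assumption for this sketch: '#' is simply dropped, so '#flu' -> 'flu'.
    """
    return re.findall(r"[a-z0-9']+", tweet.lower())


def ngrams(tokens, n_max=2):
    """Return all 1..n_max-grams of a token list, each joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf(docs, n_max=2):
    """Compute sparse TF-IDF vectors (dicts) for a list of documents.

    tf is the raw n-gram count in a document; idf is log(N / df) without
    smoothing, so a term occurring in every document gets weight 0.
    """
    counts = [Counter(ngrams(tokenize(d), n_max)) for d in docs]
    n_docs = len(docs)
    df = Counter(term for c in counts for term in c)  # each doc counts once
    return [{t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
            for c in counts]


def fuse_supports(supports, weights=None):
    """Support-level fusion: weighted average of class supports.

    supports: one support vector (e.g. class probabilities) per classifier,
    all over the same class order. Returns (fused_supports, winning_class).
    """
    if weights is None:
        weights = [1.0] * len(supports)
    total = sum(weights)
    n_classes = len(supports[0])
    fused = [sum(w * s[k] for w, s in zip(weights, supports)) / total
             for k in range(n_classes)]
    best = max(range(n_classes), key=fused.__getitem__)
    return fused, best


if __name__ == "__main__":
    tweets = ["I have the #flu and a fever",
              "Great game tonight",
              "fever and cough all week"]
    vectors = tfidf(tweets)
    print(sorted(vectors[0].items()))
    # Hypothetical supports [flu-related, other] from three different classifiers.
    fused, label = fuse_supports([[0.7, 0.3], [0.4, 0.6], [0.8, 0.2]])
    print(fused, label)  # class 0 wins
```

Averaging supports, rather than counting hard votes, lets a confident classifier outweigh two uncertain ones, and the optional `weights` argument is where per-classifier reliability could enter in such a design.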
Database: OpenAIRE