Tweet Classification Framework for Detecting Events Related to Health Problems
Author: | Marcin Majak, Andrzej Zolnierek, Katarzyna Wegrzyn, Lamine Bougueroua |
---|---|
Contributors: | Wegrzyn-Wolska, Katarzyna |
Language: | English |
Year of publication: | 2017 |
Subject: |
0301 basic medicine; Text corpus; Event (computing); business.industry; Computer science; Speech recognition; Feature vector; Sentiment analysis; Feature extraction; 02 engineering and technology; [INFO] Computer Science [cs]; computer.software_genre; 03 medical and health sciences; 030104 developmental biology; 0202 electrical engineering, electronic engineering, information engineering; Preprocessor; 020201 artificial intelligence & image processing; Artificial intelligence; tf–idf; business; computer; Natural language processing; Word (computer architecture); ComputingMilieux_MISCELLANEOUS |
Source: | Advances in Intelligent Systems and Computing ISBN: 9783319591612 CORES |
Description: | In this paper we present and validate the MC (Multiclassifier) system for classifying Tweets related to flu and its symptoms. The proposed method consists of a preprocessing phase that applies the NLTK processor together with a converter from text corpora into feature space, followed, as the last step, by an ensemble of heterogeneous classifiers fused at the support level for Tweet classification. We have checked two methods for translating text into feature space. The first uses the standard term frequency times inverse document frequency (TF-IDF) weighting, while the second is enriched with hashtag analysis and word reduction after n-gram generation. Our preliminary results indicate that Twitter can be an excellent platform for sensing real events. The most important task in proper event detection is a feature extraction technique that takes into account not only the text corpora, but also sentiment analysis and message intention. |
Database: | OpenAIRE |
External link: |
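
The pipeline outlined in the description (hashtag-aware tokens, n-grams, TF-IDF weighting, and fusion of classifier supports) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names, the choice of bigrams, the `tf * log(N / df)` weighting variant, the mean as the fusion rule, and the sample Tweets and support values are all assumptions made for illustration. The record does not state which fusion rule or which classifiers the paper uses.

```python
import math
import re
from collections import Counter


def tokenize(tweet):
    """Lowercase a tweet; keep words and hashtags (with '#') as tokens."""
    return re.findall(r"#?\w+", tweet.lower())


def with_bigrams(tokens):
    """Return the unigram tokens followed by their word bigrams."""
    return tokens + [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: weight = (count / doc length) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


def fuse_supports(supports):
    """Fuse per-class supports from several classifiers by averaging them.

    `supports` is a list of {class_label: support} dicts, one per classifier.
    Returns the winning label and the averaged supports.
    """
    classes = supports[0].keys()
    mean = {c: sum(s[c] for s in supports) / len(supports) for c in classes}
    return max(mean, key=mean.get), mean


# Hypothetical sample Tweets (not from the paper's data set).
tweets = [
    "Feeling sick, fever and cough #flu",
    "Got my flu shot today",
    "Great game tonight #football",
]
docs = [with_bigrams(tokenize(t)) for t in tweets]
vectors = tfidf(docs)

# Hypothetical supports from three different classifiers for one Tweet.
label, fused = fuse_supports([
    {"flu": 0.9, "other": 0.1},
    {"flu": 0.4, "other": 0.6},
    {"flu": 0.6, "other": 0.4},
])
print(label, fused)
```

Averaging the supports, rather than voting on the predicted labels, lets a confident classifier outweigh two uncertain ones. Here the second classifier alone would predict `other`, but the fused decision is `flu`.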