Automatic Classification of Tweets for Analyzing Communication Behavior of Museums
Author: | Nicolas Foucault, Antoine Courtin |
---|---|
Contributors: | Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI), Université Paris Saclay (COmUE)-Centre National de la Recherche Scientifique (CNRS)-Sorbonne Université - UFR d'Ingénierie (UFR 919), Sorbonne Université (SU)-Sorbonne Université (SU)-Université Paris-Saclay-Université Paris-Sud - Paris 11 (UP11), Institut National d'Histoire de l'Art (INHA), INHA |
Language: | English |
Year of publication: | 2016 |
Subject: |
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
annotation classification [SHS.INFO]Humanities and Social Sciences/Library and information sciences corpus cultural communication MuseumWeek [SHS.LANGUE]Humanities and Social Sciences/Linguistics museums NLP tweets community managers |
Source: | Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 2016, Portorož, Slovenia Scopus-Elsevier HAL |
Description: | International audience; In this paper, we present a study on tweet classification that aims to define the communication behavior of the 103 French museums that participated in the 2014 Twitter operation MuseumWeek. The tweets were automatically classified into four communication categories: sharing experience, promoting participation, interacting with the community, and promoting-informing about the institution. Our classification is multi-class. It combines Support Vector Machines and Naive Bayes methods and is supported by a selection of eighteen subtypes of features of four different kinds: metadata information, punctuation marks, tweet-specific features, and lexical features. It was tested against a corpus of 1,095 tweets manually annotated by two experts in Natural Language Processing and Information Communication and twelve Community Managers of French museums. We obtained a state-of-the-art F1-score of 72% by 10-fold cross-validation. This result is very encouraging since it is even better than some state-of-the-art results found in the tweet classification literature. |
Database: | OpenAIRE |
External link: |