A gold-standard social media corpus for urban issues

Authors: Maxwell Guimarães de Oliveira, Cláudio de Souza Baptista, Cláudio E. C. Campelo, Michela Bertolotto
Year of publication: 2017
Subject:
Source: SAC
DOI: 10.1145/3019612.3019808
Description: This paper introduces a gold-standard corpus built from manually labeled tweets concerning urban issues. The main contribution is a labeled tweet dataset that can be used to build machine-learning classifiers in the urban issues domain, including geographical features. The corpus can therefore also help improve geoparsers so that they correctly identify urban place names such as Points-of-Interest (POI), Streets/Roads and Districts. Our method for building the corpus includes human-volunteer quality assessment and human-driven labeling using an ad hoc web application, the Tweet Annotator. The volunteers were asked to complete a feedback survey to identify the main difficulties they encountered during the labeling task. We also report the findings of a case study that analyzes the spatial relationships in the generated corpus among the locations a tweet may refer to: the geocoded location, the user's home location, and the mentioned locations.
Database: OpenAIRE