A gold-standard social media corpus for urban issues
Author: | Maxwell Guimarães de Oliveira, Cláudio de Souza Baptista, Cláudio E. C. Campelo, Michela Bertolotto |
---|---|
Year of publication: | 2017 |
Subject: | Information retrieval; business.industry; Computer science; Quality assessment; 0211 other engineering and technologies; 02 engineering and technology; Gold standard (test); Domain (software engineering); Task (project management); World Wide Web; ComputingMethodologies_PATTERNRECOGNITION; Geocoding; 0202 electrical engineering, electronic engineering, information engineering; Web application; 020201 artificial intelligence & image processing; Social media; business; Geoparsing; 021101 geological & geomatics engineering |
Source: | SAC |
DOI: | 10.1145/3019612.3019808 |
Description: | This paper introduces a gold-standard corpus built from manually labeled tweets concerning urban issues. The main contribution is a labeled tweet dataset that can be used to build machine-learning classifiers in the urban issues domain, including geographical features. The corpus can therefore also help improve geoparsers so that they correctly identify urban place names such as Points-of-Interest (POI), Streets/Roads and Districts. Our method for building the corpus includes human-volunteer quality assessment and human-driven labeling using an ad hoc web application, the Tweet Annotator. The volunteers were asked to complete a feedback survey in order to identify the main difficulties encountered during the labeling task. We also report the findings of a case study that analyzes the spatial relationships in the generated corpus among the three locations a tweet may refer to: the geocoded location, the user's home location and the locations mentioned in the text. |
Database: | OpenAIRE |
External link: |