Analysis of Named Entity Recognition and Linking for Tweets
Author: | Derczynski, Leon; Maynard, Diana; Rizzo, Giuseppe; van Erp, Marieke; Gorrell, Genevieve; Troncy, Raphaël; Petrak, Johann; Bontcheva, Kalina |
---|---|
Publication year: | 2014 |
Subject: | |
Source: | Information Processing & Management 51 (2), 32-49, 2014 |
Document type: | Working Paper |
DOI: | 10.1016/j.ipm.2014.10.006 |
Description: | Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art. Comment: 35 pages, accepted to journal Information Processing and Management |
Database: | arXiv |
External link: |