Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams
Author: | Graus, D., Tsagkias, M., Buitinck, L., de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K. |
---|---|
Contributors: | Information and Language Processing Syst (IVI, FNWI) |
Language: | English |
Publication year: | 2014 |
Source: | Advances in Information Retrieval: 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13-16, 2014: Proceedings, pp. 286-298. Lecture Notes in Computer Science. ISBN: 9783319060279 |
Description: | The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identification of nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show that our method significantly outperforms a lexical-matching baseline by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and the textual quality of input documents. |
Database: | OpenAIRE |