Generating Pseudo-ground Truth for Predicting New Concepts in Social Streams

Authors: Graus, D., Tsagkias, M., Buitinck, L., de Rijke, M., Kenter, T., de Vries, A.P., Zhai, C.X., de Jong, F., Radinsky, K., Hofmann, K.
Contributors: Information and Language Processing Syst (IVI, FNWI)
Language: English
Year of publication: 2014
Subject:
Source: Advances in Information Retrieval: 36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13-16, 2014: Proceedings, pp. 286-298
Lecture Notes in Computer Science ISBN: 9783319060279
ECIR
Description: The manual curation of knowledge bases is a bottleneck in fast-paced domains where new concepts constantly emerge. Identifying nascent concepts is important for improving early entity linking, content interpretation, and recommendation of new content in real-time applications. We present an unsupervised method for generating pseudo-ground truth for training a named entity recognizer to specifically identify entities that will become concepts in a knowledge base, in the setting of social streams. We show that our method is able to deal with missing labels, justifying the use of pseudo-ground truth generation in this task. Finally, we show that our method significantly outperforms a lexical-matching baseline by leveraging strategies for sampling pseudo-ground truth based on entity confidence scores and the textual quality of input documents.
Database: OpenAIRE