Using Crowdsourcing for Fine-Grained Entity Type Completion in Knowledge Bases

Authors: Xiaoyong Du, Zhaoan Dong, Tok Wang Ling, Ju Fan, Jiaheng Lu
Contributors: Cai, Yi; Ishikawa, Yoshiharu; Xu, Jianliang; Department of Computer Science; Unified DataBase Management System research group / Jiaheng Lu; Doctoral Programme in Computer Science
Publication year: 2018
Subject:
Source: Web and Big Data ISBN: 9783319968926
APWeb/WAIM (2)
DOI: 10.1007/978-3-319-96893-3_19
Description: Recent years have witnessed the proliferation of large-scale Knowledge Bases (KBs). However, many entities in KBs have incomplete type information, and some are entirely untyped. Even worse, fine-grained types (e.g., BasketballPlayer), which carry rich semantic meaning, are more likely to be incomplete because they are more difficult to obtain. Existing machine-based algorithms use the predicates (e.g., birthPlace) of entities to infer their missing types, but these predicates may be insufficient to infer fine-grained types. In this paper, we use crowdsourcing to solve the problem and address the challenge of controlling crowdsourcing cost. To this end, we propose a hybrid machine-crowdsourcing approach for fine-grained entity type completion. It first determines the types of some “representative” entities via crowdsourcing and then infers the types of the remaining entities based on the crowdsourcing results. To support this approach, we first propose an embedding-based influence for type inference, which considers not only the distance between entity embeddings but also the distances between entity and type embeddings. Second, we propose a new difficulty model for entity selection, which better captures the uncertainty of the machine algorithm when identifying entity types. We demonstrate the effectiveness of our approach through experiments on real crowdsourcing platforms. The results show that our method outperforms state-of-the-art algorithms, improving the effectiveness of fine-grained type completion at an affordable crowdsourcing cost.
Database: OpenAIRE
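
Illustrative note: the abstract describes an influence score that combines entity-to-entity and entity-to-type embedding distances, but it does not give the formula. The sketch below is only one possible reading, not the authors' method. The function names (`euclidean`, `influence`, `infer_type`), the exponential-decay form, and the toy 2-D vectors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def influence(labeled_vec, unlabeled_vec, type_vec):
    """Hypothetical influence score (assumed form, not from the paper).

    The score is high only when the unlabeled entity is close to the
    crowd-labeled entity AND close to that entity's type embedding.
    """
    entity_term = math.exp(-euclidean(labeled_vec, unlabeled_vec))
    type_term = math.exp(-euclidean(unlabeled_vec, type_vec))
    return entity_term * type_term


def infer_type(unlabeled_vec, crowd_labels, type_vecs):
    """Pick the type with the largest summed influence.

    crowd_labels: list of (entity_vector, type_name) pairs obtained
                  from crowdsourcing on "representative" entities.
    type_vecs:    dict mapping type_name to its embedding vector.
    """
    scores = {}
    for entity_vec, type_name in crowd_labels:
        score = influence(entity_vec, unlabeled_vec, type_vecs[type_name])
        scores[type_name] = scores.get(type_name, 0.0) + score
    return max(scores, key=scores.get)


# Toy data: two types and one crowd-labeled entity per type.
type_vecs = {"BasketballPlayer": [0.0, 0.0], "Politician": [5.0, 5.0]}
crowd_labels = [
    ([0.1, 0.0], "BasketballPlayer"),
    ([5.0, 4.9], "Politician"),
]
predicted = infer_type([0.2, 0.1], crowd_labels, type_vecs)
```

With these toy vectors, the unlabeled entity sits next to the crowd-labeled basketball player and its type embedding, so `predicted` is `"BasketballPlayer"`. The difficulty model for choosing which entities to send to the crowd is not sketched here, since the abstract gives no detail about it.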