Using Crowdsourcing for Fine-Grained Entity Type Completion in Knowledge Bases
Authors: | Xiaoyong Du, Zhaoan Dong, Tok Wang Ling, Ju Fan, Jiaheng Lu |
---|---|
Contributors: | Cai, Yi; Ishikawa, Yoshiharu; Xu, Jianliang; Department of Computer Science; Unified DataBase Management System research group / Jiaheng Lu; Doctoral Programme in Computer Science |
Year of publication: | 2018 |
Subject: |
Entity type completion
Computer science; business.industry; Type inference; 02 engineering and technology; Type (model theory); 113 Computer and information sciences; Machine learning; computer.software_genre; Crowdsourcing; Knowledge base; Entity type; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Selection (linguistics); Embedding; 020201 artificial intelligence & image processing; Artificial intelligence; business; computer |
Source: | Web and Big Data, ISBN 9783319968926, APWeb/WAIM (2) |
DOI: | 10.1007/978-3-319-96893-3_19 |
Description: | Recent years have witnessed the proliferation of large-scale Knowledge Bases (KBs). However, many entities in KBs have incomplete type information, and some are entirely untyped. Worse, fine-grained types (e.g., BasketballPlayer), which carry rich semantics, are more likely to be incomplete because they are harder to obtain. Existing machine-based algorithms infer missing types from the predicates of entities (e.g., birthPlace), but they are limited in that the predicates may be insufficient to infer fine-grained types. In this paper, we use crowdsourcing to solve the problem and address the challenge of controlling crowdsourcing cost. To this end, we propose a hybrid machine-crowdsourcing approach for fine-grained entity type completion. It first determines the types of some “representative” entities via crowdsourcing and then infers the types of the remaining entities from the crowdsourcing results. To support this approach, we first propose an embedding-based influence for type inference that considers not only the distances between entity embeddings but also the distances between entity and type embeddings. Second, we propose a new difficulty model for entity selection that better captures the uncertainty of the machine algorithm when identifying entity types. We demonstrate the effectiveness of our approach through experiments on real crowdsourcing platforms. The results show that our method outperforms state-of-the-art algorithms, improving the effectiveness of fine-grained type completion at an affordable crowdsourcing cost. |
Database: | OpenAIRE |
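The description outlines two interacting pieces: types are propagated from crowd-labeled entities to unlabeled ones through an embedding-based influence that uses both entity-entity and entity-type distances, and a difficulty model decides which entities are worth sending to the crowd. The abstract does not give the paper's formulas, so the Python sketch below is only an illustration under stated assumptions. The exponential-decay `influence`, the normalized-entropy `difficulty`, the greedy `select_for_crowd`, and all other names and toy vectors are hypothetical stand-ins, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def influence(src: Vector, dst: Vector, type_vec: Vector) -> float:
    """HYPOTHETICAL influence of a crowd-labeled entity `src` on an unlabeled
    entity `dst` for one type: it shrinks as `dst` moves away from `src`
    (entity-entity distance) and away from the type (entity-type distance)."""
    return math.exp(-euclidean(src, dst)) * math.exp(-euclidean(dst, type_vec))


def infer_type_scores(
    entity: Vector,
    labeled: Dict[str, Vector],
    crowd_types: Dict[str, Set[str]],
    type_emb: Dict[str, Vector],
) -> Dict[str, float]:
    """Sum the influence of every crowd-labeled entity, per crowd-assigned type."""
    scores = {t: 0.0 for t in type_emb}
    for name, vec in labeled.items():
        for t in crowd_types.get(name, set()):
            scores[t] += influence(vec, entity, type_emb[t])
    return scores


def difficulty(scores: Dict[str, float]) -> float:
    """HYPOTHETICAL difficulty: normalized entropy of the type scores
    (0.0 = one type dominates, 1.0 = no preference between types)."""
    total = sum(scores.values())
    if total == 0:
        return 1.0  # no evidence at all: treat as maximally uncertain
    if len(scores) < 2:
        return 0.0  # a single candidate type leaves nothing to disambiguate
    probs = [s / total for s in scores.values() if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(scores))


def select_for_crowd(
    candidates: Dict[str, Vector],
    labeled: Dict[str, Vector],
    crowd_types: Dict[str, Set[str]],
    type_emb: Dict[str, Vector],
    budget: int,
) -> List[str]:
    """Pick the `budget` candidates the machine is least sure about."""
    ranked = sorted(
        candidates,
        key=lambda n: difficulty(
            infer_type_scores(candidates[n], labeled, crowd_types, type_emb)
        ),
        reverse=True,  # most difficult first; ties keep input order
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Toy 2-D embeddings, invented purely for illustration.
    type_emb = {"BasketballPlayer": [1.0, 0.0], "SoccerPlayer": [0.0, 1.0]}
    labeled = {"a": [1.0, 0.1], "b": [0.1, 1.0]}
    crowd_types = {"a": {"BasketballPlayer"}, "b": {"SoccerPlayer"}}
    candidates = {"near_a": [0.9, 0.0], "middle": [0.5, 0.5]}
    print(select_for_crowd(candidates, labeled, crowd_types, type_emb, budget=1))
```

Running the script prints `['middle']`: the entity equidistant from the two labeled players receives equal scores for both types, so it is the one worth paying a crowd worker to label, while `near_a` is confidently inferred as a BasketballPlayer. Ranking purely by uncertainty is plain uncertainty sampling; selecting truly "representative" entities, as the description puts it, would plausibly also weigh how many other entities each crowd answer can influence, which this sketch does not model.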