KnowledgeNet: A Benchmark Dataset for Knowledge Base Population

Authors:	Jordan Schmidek, Denilson Barbosa, Filipe Mesquita, Paramita Mirza, Matteo Cannaviccio
Language:	English
Year of publication:	2019
Subjects:	Computer science; Knowledge base; Population; Relationship extraction; Entity linking; Benchmark (computing); Artificial intelligence; F1 score; Natural language; Natural language processing
Source:	Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing EMNLP/IJCNLP (1)
Description:	KnowledgeNet is a benchmark dataset for the task of automatically populating a knowledge base (Wikidata) with facts expressed in natural language text on the web. KnowledgeNet provides text exhaustively annotated with facts, thus enabling holistic, end-to-end evaluation of knowledge base population systems, unlike previous benchmarks that are more suitable for evaluating individual subcomponents (e.g., entity linking, relation extraction). We discuss five baseline approaches; the best achieves an F1 score of 0.50, a 79% relative improvement over a traditional approach (F1 of 0.28). However, our best baseline is far from reaching human performance (F1 of 0.82), indicating that our dataset is challenging. The KnowledgeNet dataset and baselines are available at https://github.com/diffbot/knowledge-net
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::d774add11f435bb492fbd149bc234e16 https://hdl.handle.net/21.11116/0000-0008-0410-1
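The description reports results as F1 scores over extracted facts. As an illustration only, the Python sketch below computes exact-match precision, recall, and F1 over sets of (subject, property, object) triples and checks the stated 79% relative improvement (0.50 vs. 0.28). It assumes facts are compared by exact identifier match; the function names and example triples are hypothetical placeholders, not taken from the KnowledgeNet evaluation code, which may use a different matching scheme.

```python
from typing import Set, Tuple

# Hypothetical representation: a fact is a (subject, property, object) triple.
Fact = Tuple[str, str, str]


def prf1(predicted: Set[Fact], gold: Set[Fact]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of facts (assumed scheme)."""
    tp = len(predicted & gold)  # facts both predicted and annotated
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


def relative_improvement(new: float, old: float) -> float:
    """Relative gain of `new` over `old`, e.g. 0.50 vs 0.28."""
    return (new - old) / old


if __name__ == "__main__":
    # Toy placeholder data, not drawn from the dataset.
    gold = {("alice", "educatedAt", "uni1"),
            ("alice", "employer", "acme"),
            ("bob", "spouse", "carol")}
    predicted = {("alice", "educatedAt", "uni1"),
                 ("alice", "employer", "globex")}
    p, r, f = prf1(predicted, gold)
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")
    print(f"{relative_improvement(0.50, 0.28):.0%}")
```

Running it prints P=0.50 R=0.33 F1=0.40 for the toy data and 79% for the improvement, since (0.50 - 0.28) / 0.28 is about 0.786.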