Improving Named Entity Linking Corpora Quality
Author: | Albert Weichselbraun, Philipp Kuntschik, Lyndon J. B. Nixon, Adrian M. P. Brasoveanu |
---|---|
Year of publication: | 2019 |
Subject: | Computer science; Process (engineering); Information storage and retrieval; Comparability; Benchmarking; Annotation; Knowledge base; Key (cryptography); Quality (business); Artificial intelligence; Natural language processing; Software versioning; 02 Engineering and technology; 0202 Electrical engineering, electronic engineering, information engineering; 020201 Artificial intelligence & image processing; 020206 Networking & telecommunications |
Source: | RANLP |
DOI: | 10.26615/978-954-452-056-4_152 |
Description: | Gold standard corpora and competitive evaluations play a key role in benchmarking named entity linking (NEL) performance and driving the development of more sophisticated NEL systems. The quality of the corpora and evaluation metrics used is crucial to this process. We therefore assess the quality of three popular evaluation corpora and identify four major issues that affect these gold standards: (i) the use of different annotation styles, (ii) incorrect and missing annotations, (iii) knowledge base evolution, and (iv) differences in annotating co-occurrences. This paper addresses these issues by formalizing NEL annotations and corpus versioning, which allows corpus creation to be standardized, supports corpus evolution, and paves the way for using lenses to automatically transform between different corpus configurations. In addition, clearly defined scoring rules and evaluation metrics ensure better comparability of evaluation results. |
Database: | OpenAIRE |
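The description above mentions formalized annotations, lenses between corpus configurations, and clearly defined scoring rules without spelling them out. As a rough illustration only, the following Python sketch makes three assumptions. It treats an annotation as a (start, end, entity URI) triple. It models a "lens" as a plain function that rewrites outdated knowledge-base URIs through a redirect table, which is one possible response to knowledge base evolution. It scores a system by strict exact-match precision, recall, and F1. The names `Annotation`, `redirect_lens`, and `strict_scores` and the `kb:` URIs are hypothetical. This is not the formalization or scoring scheme defined in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Annotation:
    """Hypothetical NEL annotation: a character span plus the linked entity URI."""
    start: int
    end: int
    uri: str


# Assumption: a "lens" is modelled as a function from one annotation set to another.
Lens = Callable[[Set[Annotation]], Set[Annotation]]


def redirect_lens(redirects: Dict[str, str]) -> Lens:
    """Build a lens that rewrites outdated KB URIs to their current targets."""
    def apply(annotations: Set[Annotation]) -> Set[Annotation]:
        # URIs without a redirect entry are kept unchanged.
        return {Annotation(a.start, a.end, redirects.get(a.uri, a.uri)) for a in annotations}
    return apply


def strict_scores(gold: Set[Annotation], system: Set[Annotation]) -> Dict[str, float]:
    """Precision, recall and F1 under a strict rule: span and URI must match exactly."""
    tp = len(gold & system)  # frozen dataclasses are hashable, so set intersection works
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data: the gold standard still uses an outdated URI for the first mention.
    gold = {Annotation(0, 6, "kb:Old_Entity"), Annotation(10, 15, "kb:Other")}
    system = {Annotation(0, 6, "kb:New_Entity"), Annotation(20, 25, "kb:Spurious")}
    print(strict_scores(gold, system))
    lens = redirect_lens({"kb:Old_Entity": "kb:New_Entity"})
    print(strict_scores(lens(gold), system))
```

In this toy run, the system's one correct link is missed under the original gold URI (F1 of 0.0) and counted once the redirect lens is applied (precision, recall, and F1 of 0.5). This shows how knowledge base evolution alone can shift scores when the scoring rule is strict.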