TSE-NER: An Iterative Approach for Long-Tail Entity Extraction in Scientific Publications

Authors:	Mesbah, S., Lofi, C., Valle Torre, M., Bozzon, A., Houben, G.J.P.M., Vrandečić, D., Bontcheva, K., Suárez-Figueroa, M.C., Presutti, V., Celino, I., Sabou, M., Kaffee, L.M, Simperl, E.
Year:	2018
Subject:	0301 basic medicine, Training set, Computer science, business.industry, Workaround, Specific knowledge, computer.software_genre, Task (project management), Set (abstract data type), 03 medical and health sciences, 030104 developmental biology, Entity type, Named-entity recognition, Semantic expansion, Artificial intelligence, business, computer, Natural language processing
Source:	The Semantic Web – ISWC 2018: Proceedings of the 17th International Semantic Web Conference, ISWC (1); Lecture Notes in Computer Science; ISBN 9783030006709
DOI:	10.1007/978-3-030-00671-6_8
Description:	Named Entity Recognition and Typing (NER/NET) is a challenging task, especially for long-tail entities such as those found in scientific publications. These entities (e.g. “WebKB”, “StatSnowball”) are rare and often relevant only in specific knowledge domains, yet they are important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is to generate labeled training data from knowledge bases (KBs); this approach is not suitable for long-tail entity types, which are by definition scarcely represented in KBs. This paper presents an iterative approach for training NER and NET classifiers in scientific publications that relies on minimal human input, namely a small seed set of instances for the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result entity filtering. We evaluate our approach on scientific publications, focusing on the long-tail entity types Datasets and Methods in computer science publications, and Proteins in biomedical publications.
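The description names the stages of the iterative approach (a small seed set, training data extraction, result filtering, repeated over several rounds) without detailing them. The toy bootstrapping loop below is only a hypothetical illustration of how such a seed-driven, iterative loop can be organized; it is not the TSE-NER implementation. In particular, the trained NER classifier is replaced here by a single "following word" context cue, semantic expansion is omitted, and the function names, the `STOPWORDS` filter list, the thresholds and the example corpus are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical filter list for the result-filtering step (an assumption).
STOPWORDS = {"The", "We", "This", "Our"}


def tokenize(sentence):
    # Split into alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9]+", sentence)


def learn_contexts(sentences, entities):
    """Toy training-data extraction: count the word that follows each known entity."""
    contexts = Counter()
    for sentence in sentences:
        toks = tokenize(sentence)
        for i, tok in enumerate(toks[:-1]):
            if tok in entities:
                contexts[toks[i + 1].lower()] += 1
    return contexts


def extract_candidates(sentences, contexts, min_support):
    """Toy 'classifier': tag capitalized tokens that precede a supported context word."""
    cues = {word for word, count in contexts.items() if count >= min_support}
    candidates = Counter()
    for sentence in sentences:
        toks = tokenize(sentence)
        for i, tok in enumerate(toks[:-1]):
            if toks[i + 1].lower() in cues and tok[:1].isupper():
                candidates[tok] += 1
    return candidates


def filter_candidates(candidates, min_freq):
    """Toy result filtering: drop stopwords and rarely seen candidates."""
    return {c for c, n in candidates.items() if n >= min_freq and c not in STOPWORDS}


def bootstrap(sentences, seeds, max_iter=5, min_support=1, min_freq=1):
    """Grow the entity set from the seeds until no new entities are found."""
    entities = set(seeds)
    for _ in range(max_iter):
        contexts = learn_contexts(sentences, entities)
        found = filter_candidates(
            extract_candidates(sentences, contexts, min_support), min_freq
        )
        if found <= entities:  # converged: this round added nothing new
            break
        entities |= found
    return entities


# Hypothetical example corpus; "WebKB" is the only seed.
corpus = [
    "We evaluate on the WebKB dataset.",
    "Results on the Cora dataset are reported.",
    "We use the Cora corpus for training.",
    "Experiments on the Citeseer corpus follow.",
    "We evaluate on the WebKB benchmark.",
    "The dataset is public.",
]

print(sorted(bootstrap(corpus, {"WebKB"})))  # ['Citeseer', 'Cora', 'WebKB']
```

Each round reuses the entities found so far to harvest new context cues, which is what lets "Citeseer" surface only in the second round (via the cue "corpus", learned from "Cora"). The stopword filter removes the false positive "The", which the cue "dataset" would otherwise admit.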
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6d5b0d5dc0bfa019e4302372c511abc5 https://doi.org/10.1007/978-3-030-00671-6_8