Robust Multilingual Named Entity Recognition with Shallow Semi-Supervised Features

Author:	German Rigau, 19020434 Hồ Mạnh Tân, RODRIGO AGERRI GASCON
Language:	English
Year of publication:	2017
Subject:	Linguistics and Language; Dependency (UML); Information extraction; Computer science; Computer Science - Artificial Intelligence; Engineering and technology; Semi-supervised learning; Clustering; Language and Linguistics; Task (project management); Set (abstract data type); Named-entity recognition; Artificial intelligence; Electrical engineering, electronic engineering, information engineering; Cluster analysis; Computer Science - Computation and Language; Natural language processing; Networking & telecommunications; Artificial intelligence & image processing; State (computer science)
Source:	Scopus-Elsevier; Recercat. Dipósit de la Recerca de Catalunya
Description:	We present a multilingual Named Entity Recognition approach based on a robust and general set of features across languages and datasets. Our system combines shallow local information with clustering semi-supervised features induced on large amounts of unlabeled text. Understanding via empirical experimentation how to effectively combine various types of clustering features allows us to seamlessly export our system to other datasets and languages. The result is a simple but highly competitive system which obtains state-of-the-art results across five languages and twelve datasets. The results are reported on standard shared task evaluation data such as CoNLL for English, Spanish and Dutch. Furthermore, despite the lack of linguistically motivated features, we also report best results for languages such as Basque and German. In addition, we demonstrate that our method obtains very competitive results even when the amount of supervised data is cut by half, alleviating the dependency on manually annotated data. Finally, the results show that our emphasis on clustering features is crucial to developing robust out-of-domain models. The system and models are freely available to facilitate their use and guarantee the reproducibility of results. Comment: 26 pages, 19 tables (submitted for publication in September 2015), Artificial Intelligence (2016)
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::29cd3ab2cc49c8e260bb7a6dab69853e http://arxiv.org/abs/1701.09123