Cross-lingual linking of multi-word entities and language-dependent learning of multi-word entity patterns

Authors:	Jacquet, Guillaume, Ehrmann, Maud, Piskorski, Jakub, Tanev, Hristo, Steinberger, Ralf
Language:	English
Year of publication:	2019
DOI:	10.5281/zenodo.2579049
Description:	We address large-scale multilingual multi-word entity (MWEntity) recognition and variant matching. First, we recognise MWEntities in 22 different languages, identify monolingual variant spellings and link equivalent groups of variants across all languages. We then use the previously recognised MWEntities to learn new recognition rules based on distributional patterns. Because it requires no linguistic tools, the method is suitable for our highly multilingual environment. When the new rules are added to the original rule-based NER system, F1 performance increases from 42.4% to 50% for Spanish (an 18% relative increase) and from 43.4% to 44.5% for English (a 2.5% relative increase). Besides turning free text into semi-structured data for search and machine processing, we use the system to link related news over time and across languages, and to detect trends.
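The percentages in parentheses in the description are relative gains, not percentage-point gains. The short Python check below uses only the F1 figures quoted in the description to show how they are derived; the helper name `relative_increase` is illustrative and not part of the described system.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, as a percentage."""
    return (after - before) / before * 100


# F1 scores (in %) before and after adding the learned rules, as quoted in the description.
scores = {"Spanish": (42.4, 50.0), "English": (43.4, 44.5)}

for language, (before, after) in scores.items():
    gain_points = after - before  # absolute gain in percentage points
    gain_relative = relative_increase(before, after)  # gain relative to the baseline
    print(f"{language}: +{gain_points:.1f} points, {gain_relative:.1f}% relative")
```

For Spanish, the 7.6-point gain corresponds to about 17.9% relative, which rounds to the reported 18%; for English, the 1.1-point gain corresponds to about 2.5% relative.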
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::265b77d4615b8141962868716d1afb05