Cross-lingual linking of multi-word entities and language-dependent learning of multi-word entity patterns

Authors: Jacquet, Guillaume; Ehrmann, Maud; Piskorski, Jakub; Tanev, Hristo; Steinberger, Ralf
Language: English
Year of publication: 2019
DOI: 10.5281/zenodo.2579049
Description: We address large-scale multilingual multi-word entity (MWEntity) recognition and variant matching. First, we recognise MWEntities in 22 different languages, identify monolingual variant spellings and link equivalent groups of variants across all languages. We then use the previously recognised MWEntities to learn new recognition rules based on distributional patterns. Because it requires no linguistic tools, the method is suitable for our highly multilingual environment. When the new rules are added to the original rule-based NER system, F1 performance for Spanish increases from 42.4% to 50% (an 18% relative increase) and for English from 43.4% to 44.5% (a 2.5% relative increase). Besides aiming to turn free text into semi-structured data for search and machine processing, we use the system to link related news over time and across languages and to detect trends.
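The percentages in brackets are relative gains over the baseline F1, not absolute differences in percentage points. The short sketch below uses only the scores quoted in the description to show both quantities side by side. The helper name `relative_increase` is purely illustrative and not part of the authors' system.

```python
def relative_increase(before: float, after: float) -> float:
    """Return the relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# F1 scores (in %) before and after adding the learned rules, as quoted above
scores = {"Spanish": (42.4, 50.0), "English": (43.4, 44.5)}

for language, (before, after) in scores.items():
    absolute = after - before  # difference in percentage points
    relative = relative_increase(before, after)  # gain relative to the baseline
    print(f"{language}: +{absolute:.1f} points absolute, +{relative:.1f}% relative")
# Spanish: +7.6 points absolute, +17.9% relative
# English: +1.1 points absolute, +2.5% relative
```

The Spanish gain of 7.6 points corresponds to roughly 18% of the 42.4% baseline, which matches the rounded figure given in the description.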
Database: OpenAIRE