Cross-Lingual Link Discovery for Under-Resourced Languages

Authors: Rosner, M., Ahmadi, S., Apostol, E.-S., Bosque-Gil, J., Chiarcos, C., Dojchinovski, M., Gkirtzou, K., Gracia, J., Gromann, D., Liebeskind, C., Oleškevičienė, G. V., Sérasset, G., Truică, C.-O.
Language: English
Year of publication: 2022
Subject:
Source: Scopus-Elsevier
DOI: 10.5281/zenodo.7108066
Description: In this paper, we provide an overview of current technologies for cross-lingual link discovery, and we discuss the challenges, experiences and prospects of their application to under-resourced languages. We first introduce the goals of cross-lingual linking and associated technologies and, in particular, the role that the Linked Data paradigm (Bizer et al., 2011) applied to language data can play in this context. We define under-resourced languages with a specific focus on languages actively used on the internet, i.e., languages with a digitally versatile speaker community but limited support in terms of language technology. We argue that, for languages for which considerable amounts of textual data and (at least) a bilingual word list are available, techniques for cross-lingual linking can be readily applied, and that these techniques enable the implementation of downstream applications for under-resourced languages via the localisation and adaptation of existing technologies and resources.
This article is based upon work from COST Action NexusLinguarum – "European network for Web-centred linguistic data science" (CA18209), supported by COST (European Cooperation in Science and Technology), www.cost.eu. This work is also partially supported by the I+D+i project PID2020-113903RBI00, funded by MCIN/AEI/10.13039/501100011033, by DGA/FEDER, and by the Agencia Estatal de Investigación of the Spanish Ministry of Economy and Competitiveness and the European Social Fund through the "Ramón y Cajal" program (RYC2019-028112-I). (Abgaz, 2020)
Database: OpenAIRE