Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?

Authors:	Rachel Bawden, Clémentine Fourrier, Benoît Sagot
Contributors:	Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria); ANR-19-P3IA-0001, PRAIRIE, PaRis Artificial Intelligence Research InstitutE (2019)
Language:	English
Year of publication:	2021
Subject:	Machine translation; Low resource; Computer science; Romance languages; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Task (project management); Cognate; Artificial intelligence; Natural language processing; Word (computer architecture)
Source:	Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2021, Bangkok, Thailand; HAL
Description:	Cognate prediction is the task of generating, in a given language, the likely cognates of words in a related language, where cognates are words in related languages that have evolved from a common ancestor word. It is a task for which little data exists and which can aid linguists in the discovery of previously undiscovered relations. Previous work has applied machine translation (MT) techniques to this task, based on the tasks' similarities, without, however, studying their numerous differences or optimising architectural choices and hyper-parameters. In this paper, we investigate whether cognate prediction can benefit from insights from low-resource MT. We first compare statistical MT (SMT) and neural MT (NMT) architectures in a bilingual setup. We then study the impact of employing data augmentation techniques commonly seen to give gains in low-resource MT: monolingual pretraining, backtranslation and multilinguality. Our experiments on several Romance languages show that cognate prediction behaves only to a certain extent like a standard low-resource MT task. In particular, MT architectures, both statistical and neural, can be successfully used for the task, but using supplementary monolingual data is not always as beneficial as using additional language data, contrary to what is observed for MT.
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e7cbe8eee13ca55157c5c2bb1a4e5f09 ; https://hal.inria.fr/hal-03243380/file/Is_Cognate_Prediction_a_Low_Resource_Machine_Translation_Task__ACL2021Findings-2.pdf