Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
Author: | Rachel Bawden, Clémentine Fourrier, Benoît Sagot |
---|---|
Contributors: | Automatic Language Modelling and ANAlysis & Computational Humanities (ALMAnaCH), Inria de Paris, Institut National de Recherche en Informatique et en Automatique (Inria); ANR-19-P3IA-0001, PRAIRIE, PaRis Artificial Intelligence Research InstitutE (2019) |
Language: | English |
Publication year: | 2021 |
Subject: | Machine translation; Low resource; Computer science; Romance languages; [INFO.INFO-CL] Computer Science [cs]/Computation and Language [cs.CL]; Task (project management); Cognate; Artificial intelligence; Natural language processing; Word (computer architecture) |
Source: | Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2021, Bangkok, Thailand; ACL/IJCNLP (Findings); HAL |
Description: | Cognate prediction is the task of generating, in a given language, the likely cognates of words in a related language, where cognates are words in related languages that have evolved from a common ancestor word. It is a task for which little data exists and which can aid linguists in the discovery of previously undiscovered relations. Previous work has applied machine translation (MT) techniques to this task, based on the tasks' similarities, without, however, studying their numerous differences or optimising architectural choices and hyper-parameters. In this paper, we investigate whether cognate prediction can benefit from insights from low-resource MT. We first compare statistical MT (SMT) and neural MT (NMT) architectures in a bilingual setup. We then study the impact of employing data augmentation techniques commonly seen to give gains in low-resource MT: monolingual pretraining, backtranslation and multilinguality. Our experiments on several Romance languages show that cognate prediction behaves only to a certain extent like a standard low-resource MT task. In particular, MT architectures, both statistical and neural, can be successfully used for the task, but using supplementary monolingual data is not always as beneficial as using additional language data, contrary to what is observed for MT. |
Database: | OpenAIRE |
External link: |