The Indigenous Languages Technology project at NRC Canada: an empowerment-oriented approach to developing language software

Author:	Nathan Thanyehténhas Brinklow, Alain Désilets, S. Z. Child, Benoit Farley, Eddie Antonio Santos, Akwiratékha' Martin, Anna Kazantseva, Fineen Davis, Christopher Cox, Delasie Torkornoo, Heather Souter, Brian Maracle Owennatékha, Eric Joanis, Gilles Boulianne, Olivia Sammons, Rebecca Knowles, Darlene A. Stewart, Marie-Odile Junker, Patrick Littell, Delaney Lothian, Vishwa Gupta, Caroline Running Wolf, Daisy Rosenblum, Roland Kuhn, David Huggins-Daines, Aidan Pine
Language:	English
Year of publication:	2020
Subject:	Machine translation; Computer science; Iroquoian language; Verb; Indigenous; World Wide Web; Software; Polysynthetic language; Transcription (software); Empowerment; public health; engineering and technology; medical and health sciences; electrical engineering, electronic engineering, information engineering; artificial intelligence & image processing; other medical science
Source:	COLING
Description:	This paper describes a three-year project at the National Research Council of Canada aimed at developing software to assist Indigenous communities in their efforts to preserve their languages and extend their use. The project aimed to work within the empowerment paradigm, in which the linguistic goals of communities have at least equal weight with those of the researchers and collaboration with communities is central. Because many of the technological directions we took were in response to community needs, the project ended up as a collection of diverse subprojects, including: the creation of a sophisticated framework for building verb conjugators for highly inflectional polysynthetic languages (a verb conjugator for Kanyen’kéha, in the Iroquoian language family, was built in the framework); the release of what is probably the largest available corpus of sentences in a polysynthetic language (Inuktut) aligned with English sentences, along with experiments with machine translation (MT) systems trained on this corpus; free online services based on automatic speech recognition (ASR) for easing the transcription bottleneck for recordings of speech in Indigenous languages (and other languages); limited-domain text-to-speech synthesis for some Indigenous languages; and several other subprojects.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9627dc3cdc6226aaef4b066aa19a4341 https://doi.org/10.18653/v1/2020.coling-main.516