Statistical models for text normalization and machine translation

Author: Kevin P. Scannell
Publication year: 2014
Subject:
Source: Proceedings of the First Celtic Language Technology Workshop.
DOI: 10.3115/v1/w14-4605
Description: Irish and Scottish Gaelic are closely related languages that, together with Manx Gaelic, make up the Goidelic branch of the Celtic family. We present a statistical model for translation from Scottish Gaelic to Irish that we hope will facilitate communication between the two language communities, especially in social media. An important aspect of this work is overcoming the orthographic differences between the languages, many of which were introduced in a major spelling reform of Irish in the 1940s and 1950s. Prior to the reform, the orthographies of the two languages were quite similar, thanks in part to a shared literary tradition. As a consequence, machine translation from Scottish Gaelic to Irish has a great deal in common with the problem of normalizing pre-standard Irish texts, a problem with applications to lexicography and information retrieval. We show how a single statistical model can be used effectively in both contexts.
Database: OpenAIRE