Architecture for text normalization using statistical machine translation techniques

Author:	Lopez Ludeña, Veronica; San Segundo Hernández, Rubén; Montero Martínez, Juan Manuel; Barra Chicote, Roberto; Lorenzo Trueba, Jaime
Language:	English
Year of publication:	2012
Subject:	Telecommunications
Source:	VII Jornadas en Tecnología del Habla and III Iberian SLTech | 21/11/2012-22/11/2012 | Madrid, España | Archivo Digital UPM, Universidad Politécnica de Madrid
Description:	This paper proposes an architecture, based on statistical machine translation, for developing the text normalization module of a text-to-speech conversion system. The main goal is to build a language-independent, data-driven text normalization module that is flexible enough to deal with all the situations that arise in this task. The proposed architecture is composed of three main modules: a tokenizer module that splits the input text into a token graph (tokenization), a phrase-based translation module (token translation), and a post-processing module that removes some tokens. This paper presents initial experiments for numbers and abbreviations. The very good results obtained validate the proposed architecture.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::d10a16087ab46d53d779f4d22eb89af5 http://oa.upm.es/20353/
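
Illustrative sketch: the three-module pipeline named in the description (tokenization, token translation, post-processing) might be pictured roughly as follows. This is a toy under stated assumptions, not the authors' implementation. The tokenizer returns a flat token list rather than a token graph, a greedy longest-match lookup in a hand-written table stands in for a trained phrase-based translation model, and the names `PHRASE_TABLE`, `AUXILIARY_TOKENS`, `tokenize`, `translate`, `postprocess`, and `normalize`, as well as the `<s>`/`</s>` boundary markers, are all hypothetical.

```python
import re

# Hypothetical toy phrase table: source token sequences -> target words.
# A real system would learn these mappings from parallel data.
PHRASE_TABLE = {
    ("Dr", "."): ["Doctor"],
    ("1", "2"): ["twelve"],
    ("1",): ["one"],
    ("2",): ["two"],
}

# Hypothetical auxiliary tokens that the post-processing step removes.
AUXILIARY_TOKENS = {"<s>", "</s>"}


def tokenize(text):
    """Split text into words, single digits, and punctuation marks.

    Simplification: returns a flat list wrapped in boundary markers,
    not the token graph mentioned in the description.
    """
    tokens = re.findall(r"[A-Za-z]+|\d|[^\sA-Za-z\d]", text)
    return ["<s>"] + tokens + ["</s>"]


def translate(tokens):
    """Greedy longest-match lookup, standing in for a phrase-based decoder."""
    max_len = max(len(key) for key in PHRASE_TABLE)
    output, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_TABLE:
                output.extend(PHRASE_TABLE[key])
                i += n
                break
        else:
            # No phrase matched: pass the token through unchanged.
            output.append(tokens[i])
            i += 1
    return output


def postprocess(tokens):
    """Remove auxiliary tokens and join the rest into a string."""
    return " ".join(t for t in tokens if t not in AUXILIARY_TOKENS)


def normalize(text):
    """Run the three stages in sequence."""
    return postprocess(translate(tokenize(text)))


if __name__ == "__main__":
    print(normalize("Dr. Lee has 12 cats"))
```

Splitting numbers into single-digit tokens lets a small table cover digit sequences as phrases; that choice is part of this sketch and is not taken from the record.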