Fuzzy-Match Repair Guided by Quality Estimation

Authors:	Mikel L. Forcada, Felipe Sánchez-Martínez, John Ortega
Contributors:	Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, Transducens
Year of publication:	2022
Subject:	Machine translation; Computer science; Engineering and technology; Translation (geometry); Quality estimation; Set (abstract data type); Translation memories; Artificial intelligence; Electrical engineering, electronic engineering, information engineering; Translations; Quality (business); Fuzzy-match repair; Language; Applied Mathematics; Translating; Approximate string matching; Computer-aided translation; Computational Theory and Mathematics; Lenguajes y Sistemas Informáticos; Artificial intelligence & image processing; Translation memory; Computer Vision and Pattern Recognition; Algorithms; Software; Natural language processing
Source:	RUA. Repositorio Institucional de la Universidad de Alicante; Universidad de Alicante (UA)
ISSN:	1939-3539; 0162-8828
DOI:	10.1109/tpami.2020.3021361
Description:	Computer-aided translation tools based on translation memories are widely used to assist professional translators. A translation memory (TM) consists of a set of translation units (TUs), each made up of a source-language segment and its target-language counterpart. To translate a new source segment s', these tools search the TM and retrieve the TUs (s,t) whose source segments are most similar to s'. The translator then chooses a TU and edits the target segment t to turn it into an adequate translation of s'. Fuzzy-match repair (FMR) techniques can be used to automatically modify the parts of t that need to be edited. We describe a language-independent FMR method that first uses machine translation to generate, given s' and (s,t), a set of candidate fuzzy-match repaired segments, and then chooses the best one by estimating their quality. An evaluation on three different language pairs shows that the selected candidate is a good approximation to the best (oracle) candidate produced and is closer to reference translations than both machine-translated segments and unrepaired fuzzy matches (t). In addition, a single quality estimation model trained on a mix of data from all the languages performs well on any of the languages used. This work was supported by the Spanish Government through the EFFORTUNE project [TIN-2015-69632-R].
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ab57dca3d1e24ec908c94459afb7d7be https://doi.org/10.1109/tpami.2020.3021361
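
Illustration (not part of the record): the description above outlines a pipeline of TM retrieval, candidate generation, and selection by estimated quality. The following is a minimal Python sketch of the retrieval and selection steps only, under stated assumptions. The word-level similarity from the standard library's difflib is a stand-in for whatever fuzzy-match score a real tool uses; the function names, the 0.6 threshold, and the estimate_quality callable are hypothetical and are not taken from the paper. Candidate generation through machine translation is omitted.

```python
from difflib import SequenceMatcher


def fuzzy_match_score(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] (assumed stand-in for a fuzzy-match score)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def retrieve(tm, new_source, threshold=0.6):
    """Return (score, s, t) for TUs whose source s is similar enough to s', best first.

    tm is a list of (s, t) pairs; the threshold value is an arbitrary assumption.
    """
    scored = [(fuzzy_match_score(s, new_source), s, t) for s, t in tm]
    return sorted((hit for hit in scored if hit[0] >= threshold), reverse=True)


def select_best(candidates, estimate_quality):
    """Pick the candidate with the highest estimated quality.

    estimate_quality is a hypothetical callable mapping a candidate
    segment to a number; a real system would use a trained model here.
    """
    return max(candidates, key=estimate_quality)


if __name__ == "__main__":
    tm = [
        ("the cat sat on the mat", "el gato se sentó en la alfombra"),
        ("good morning", "buenos días"),
    ]
    for score, s, t in retrieve(tm, "the dog sat on the mat"):
        print(f"{score:.2f}\t{s}\t{t}")
```

In this sketch the retrieved target segment t would still need repair (here, replacing the translation of "cat" with that of "dog"); producing such repaired candidates is the part the paper handles with machine translation.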