Recent advances in machine translation using comparable corpora

Authors: Pierre Zweigenbaum, Reinhard Rapp, Serge Sharoff
Publication year: 2016
Subject:
Source: Natural Language Engineering. 22:501-516
ISSN: 1469-8110, 1351-3249
DOI: 10.1017/s1351324916000115
Description: This paper highlights some of the recent developments in the field of machine translation using comparable corpora. We start by updating previous definitions of comparable corpora and then look at bilingual versions of continuous vector space models. Recently, neural networks have been used to obtain latent context representations with only a few dimensions, which are often called word embeddings. These promising new techniques can be applied not only to parallel but also to comparable corpora. Subsequent sections of the paper discuss work specifically targeting machine translation using comparable corpora, as well as work dealing with the extraction of parallel segments from comparable corpora. Finally, we give an overview of the design and results of a recent shared task on measuring document comparability across languages.
Database: OpenAIRE