Showing 1 - 10 of 39 for search: '"He, Zhongjun"'
The training paradigm for machine translation has gradually shifted, from learning neural machine translation (NMT) models with extensive parallel corpora to instruction finetuning on multilingual large language models (LLMs) with high-quality transl
External link:
http://arxiv.org/abs/2401.05861
Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST (Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the neural machine translation (NMT) field. Can we also boost end-to-end (E2E)
External link:
http://arxiv.org/abs/2308.14482
Multilingual sentence representations are the foundation for similarity-based bitext mining, which is crucial for scaling multilingual neural machine translation (NMT) systems to more languages. In this paper, we introduce MuSR: a one-for-all Multilin
External link:
http://arxiv.org/abs/2306.06919
The multilingual neural machine translation (NMT) model has a promising capability of zero-shot translation, where it could directly translate between language pairs unseen during training. For good transfer performance from supervised directions to
External link:
http://arxiv.org/abs/2305.07310
We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine translation (NMT) performance. It consists of two procedures: bidirectional pretraining and unidirectional finetuning. Both procedures utilize SimCut, a simple r
External link:
http://arxiv.org/abs/2206.02368
Diverse machine translation aims at generating various target language translations for a given source language sentence. Leveraging the linear relationship in the sentence latent space introduced by the mixup training, we propose a novel method, Mix
External link:
http://arxiv.org/abs/2109.03402
Authors:
Zhang, Ruiqing, Wang, Xiyang, Zhang, Chuanqiang, He, Zhongjun, Wu, Hua, Li, Zhi, Wang, Haifeng, Chen, Ying, Li, Qinfei
This paper presents BSTC (Baidu Speech Translation Corpus), a large-scale Chinese-English speech translation dataset. This dataset is constructed based on a collection of licensed videos of talks or lectures, including about 68 hours of Mandarin data
External link:
http://arxiv.org/abs/2104.03575
Authors:
Liu, Yuchen, Zhang, Jiajun, Xiong, Hao, Zhou, Long, He, Zhongjun, Wu, Hua, Wang, Haifeng, Zong, Chengqing
Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years. Compared to the traditional pipeline system, the end-to-end ST model has potential benefits of lowe
External link:
http://arxiv.org/abs/1912.07240
Conventional Neural Machine Translation (NMT) models benefit from the training with an additional agent, e.g., dual learning, and bidirectional decoding with one agent decoding from left to right and the other decoding in the opposite direction. In t
External link:
http://arxiv.org/abs/1909.01101
In this paper, we present DuTongChuan, a novel context-aware translation model for simultaneous interpreting. This model makes it possible to constantly read streaming text from the Automatic Speech Recognition (ASR) model and simultaneously determine the bound
External link:
http://arxiv.org/abs/1907.12984