Showing 1 - 10 of 25 results for search: '"Ma, Zhengrui"'
Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, significantly enhancing user experience compared to traditional text-based interaction. However, there is still a lack of exploration on how to build sp…
External link: http://arxiv.org/abs/2409.06666
Direct speech-to-speech translation (S2ST) has achieved impressive translation quality, but it often faces the challenge of slow decoding due to the considerable length of speech sequences. Recently, some research has turned to non-autoregressive (NA…
External link: http://arxiv.org/abs/2406.07330
Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end model, yielding promising results. However, the training of these mod…
External link: http://arxiv.org/abs/2406.07289
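The entry above concerns two-pass S2ST, in which a single end-to-end model internally runs speech-to-text translation and then text-to-speech. The following is only a structural sketch of that decomposition, with hypothetical module names (`s2tt_module`, `tts_module`), not the architecture of the linked paper:

```python
# Structural sketch of a two-pass S2ST forward pass: source speech -> translated
# text -> target speech, inside a single model. Sub-modules are placeholders.

class TwoPassS2ST:
    def __init__(self, s2tt_module, tts_module):
        self.s2tt = s2tt_module   # first pass: speech-to-text translation
        self.tts = tts_module     # second pass: text-to-speech synthesis

    def forward(self, source_speech):
        # First pass produces target-language text; end-to-end variants typically
        # also feed its hidden states into the second pass (omitted here).
        target_text = self.s2tt(source_speech)
        # Second pass conditions on the first-pass output.
        target_speech = self.tts(target_text)
        return target_text, target_speech


if __name__ == "__main__":
    demo = TwoPassS2ST(lambda wav: "bonjour le monde",
                       lambda txt: f"<waveform for: {txt}>")
    print(demo.forward("<source waveform>"))
```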
Simultaneous translation models play a crucial role in facilitating communication. However, existing research primarily focuses on text-to-text or speech-to-text models, necessitating additional cascade components to achieve speech-to-speech translat…
External link: http://arxiv.org/abs/2406.06937
Simultaneous Machine Translation (SiMT) generates target translations while reading the source sentence. It relies on a policy to determine the optimal timing for reading sentences and generating translations. Existing SiMT methods generally adopt th…
External link: http://arxiv.org/abs/2406.06910
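Both SiMT entries in this listing (this one and arXiv:2402.13036 below) hinge on a read/write policy. As a concrete illustration of what such a policy looks like, here is a minimal wait-k style sketch; wait-k is a standard fixed policy used only for illustration, not necessarily the policy proposed in the linked papers, and `simulate` with its dummy decoder step is purely hypothetical.

```python
# Illustrative wait-k read/write policy for simultaneous MT (not the papers' method).
# A wait-k policy first READs k source tokens, then alternates WRITE/READ until
# the source is exhausted, after which it WRITEs the remaining target tokens.

def wait_k_policy(k: int, num_source_read: int, num_target_written: int,
                  source_finished: bool) -> str:
    """Return 'READ' or 'WRITE' for the next action."""
    if source_finished:
        return "WRITE"
    if num_source_read < num_target_written + k:
        return "READ"
    return "WRITE"


def simulate(source_tokens, k=3):
    """Toy simulation: each WRITE calls a placeholder instead of a real decoder."""
    read, written, target = 0, 0, []
    while True:
        action = wait_k_policy(k, read, written, read == len(source_tokens))
        if action == "READ":
            read += 1
        else:
            # Placeholder for a real incremental decoder step.
            target.append(f"tgt_{written}")
            written += 1
            if read == len(source_tokens) and written >= len(source_tokens):
                break
    return target


if __name__ == "__main__":
    print(simulate(["ich", "sah", "einen", "hund"], k=2))
```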
Simultaneous speech-to-speech translation (Simul-S2ST, a.k.a. streaming speech translation) outputs target speech while receiving streaming speech inputs, which is critical for real-time communication. Beyond accomplishing translation between speech, …
External link: http://arxiv.org/abs/2406.03049
Simultaneous Machine Translation (SiMT) generates translations while reading the source sentence, necessitating a policy to determine the optimal timing for reading and generating words. Despite the remarkable performance achieved by Large Language M…
External link: http://arxiv.org/abs/2402.13036
Non-autoregressive Transformer (NAT) significantly accelerates the inference of neural machine translation. However, conventional NAT models suffer from limited expression power and performance degradation compared to autoregressive (AT) models due to…
External link: http://arxiv.org/abs/2311.07941
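The NAT entry contrasts non-autoregressive decoding (all target positions predicted in parallel) with autoregressive decoding (one token at a time, conditioned on the prefix). The toy sketch below shows only this structural difference; `dummy_logits` stands in for a real Transformer decoder and is not from the linked paper.

```python
# Contrast between autoregressive (AT) and non-autoregressive (NAT) decoding,
# using a dummy scorer in place of a real model (illustration only).
import random

VOCAB = ["<pad>", "la", "maison", "est", "bleue"]


def dummy_logits(*_args):
    # Stand-in for a decoder forward pass; returns a random score per vocab item.
    return [random.random() for _ in VOCAB]


def decode_autoregressive(src_tokens, max_len=5):
    """AT: one forward pass per output token, each conditioned on the prefix."""
    out = []
    for _ in range(max_len):
        scores = dummy_logits(src_tokens, out)      # depends on previous outputs
        out.append(VOCAB[scores.index(max(scores))])
    return out


def decode_non_autoregressive(src_tokens, tgt_len=5):
    """NAT: a single parallel pass predicts every position independently."""
    per_position = [dummy_logits(src_tokens, i) for i in range(tgt_len)]  # parallelizable
    return [VOCAB[s.index(max(s))] for s in per_position]


if __name__ == "__main__":
    src = ["the", "house", "is", "blue"]
    print("AT :", decode_autoregressive(src))
    print("NAT:", decode_non_autoregressive(src))
```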
Maximum likelihood estimation (MLE) is a statistical method used to estimate the parameters of a probability distribution that best explain the observed data. In the context of text generation, MLE is often used to train generative language models, w…
External link: http://arxiv.org/abs/2310.17217
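As background for the MLE entry: for text generation, maximizing likelihood is equivalent to minimizing the per-token negative log-likelihood (cross-entropy) of the training corpus. A self-contained toy example with a unigram model, whose MLE has the closed form p(w) = count(w) / N (illustrative only, not the linked paper's method):

```python
# MLE for text generation reduces to minimizing negative log-likelihood (NLL)
# of the observed tokens under the model. Toy illustration with a unigram model.
import math
from collections import Counter

corpus = "the cat sat on the mat the cat".split()

# MLE closed form for a unigram model: p(w) = count(w) / N
counts = Counter(corpus)
total = sum(counts.values())
p = {w: c / total for w, c in counts.items()}

# Per-token NLL of the corpus under the MLE parameters (lower is better).
nll = -sum(math.log(p[w]) for w in corpus) / len(corpus)
print(f"per-token NLL: {nll:.3f}  (perplexity: {math.exp(nll):.2f})")
```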
Simultaneous machine translation (SiMT) models are trained to strike a balance between latency and translation quality. However, training these models to achieve high quality while maintaining low latency often leads to a tendency for aggressive anti…
External link: http://arxiv.org/abs/2310.14883