Showing 1 - 10 of 148 for search: '"Wang, Peidong"'
Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. In s…
External link:
http://arxiv.org/abs/2406.10276
Author:
Zhang, Lingxia, Lin, Xiaodie, Wang, Peidong, Yang, Kaiyan, Zeng, Xiao, Wei, Zhaohui, Wang, Zizhu
Optimization is one of the keystones of modern science and engineering. Its applications in quantum technology and machine learning helped nurture variational quantum algorithms and generative AI, respectively. We propose a general approach to design…
External link:
http://arxiv.org/abs/2404.18041
Author:
Zhang, Yiqun, Kong, Fanheng, Wang, Peidong, Sun, Shuang, Wang, Lingshuai, Feng, Shi, Wang, Daling, Zhang, Yifei, Song, Kaisong
Stickers, while widely recognized for enhancing empathetic communication in online interactions, remain underexplored in current empathetic dialogue research, notably due to the lack of comprehensive datasets. In this paper, we introdu…
External link:
http://arxiv.org/abs/2402.01679
The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. Traditio…
External link:
http://arxiv.org/abs/2310.14806
Simultaneous speech-to-text translation serves a critical role in real-time cross-lingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in t…
External link:
http://arxiv.org/abs/2310.04399
Author:
Yang, Mu, Kanda, Naoyuki, Wang, Xiaofei, Chen, Junkun, Wang, Peidong, Xue, Jian, Li, Jinyu, Yoshioka, Takuya
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges, such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion. In this work, we p…
External link:
http://arxiv.org/abs/2309.08007
In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transforme…
External link:
http://arxiv.org/abs/2307.03354
Author:
Sun, Eric, Li, Jinyu, Hu, Yuxuan, Zhu, Yimeng, Zhou, Long, Xue, Jian, Wang, Peidong, Liu, Linquan, Liu, Shujie, Lin, Edward, Gong, Yifan
We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and LID loss…
External link:
http://arxiv.org/abs/2303.00786
Author:
Huang, Zili, Chen, Zhuo, Kanda, Naoyuki, Wu, Jian, Wang, Yiming, Li, Jinyu, Yoshioka, Takuya, Wang, Xiaofei, Wang, Peidong
Self-supervised learning (SSL), which utilizes the input data itself for representation learning, has achieved state-of-the-art results for various downstream speech tasks. However, most of the previous studies focused on offline single-talker applic…
External link:
http://arxiv.org/abs/2211.05564
Author:
Wang, Peidong, Sun, Eric, Xue, Jian, Wu, Yu, Zhou, Long, Gaur, Yashesh, Liu, Shujie, Li, Jinyu
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure. It is thus possible to use a single transducer model to perform both tasks. In real-world applications, such joint ASR and ST model…
External link:
http://arxiv.org/abs/2211.02809