Showing 1 - 10 of 117 for the search: "Chen, Junkun"
Language-agnostic many-to-one end-to-end speech translation models can convert audio signals from different source languages into text in a target language. These models do not need source language identification, which improves user experience. …
External link:
http://arxiv.org/abs/2406.10276
The growing need for instant spoken language transcription and translation is driven by increased global communication and cross-lingual interactions. This has made offering translations in multiple languages essential for user applications. …
External link:
http://arxiv.org/abs/2310.14806
Simultaneous speech-to-text translation serves a critical role in real-time cross-lingual communication. Despite the advancements in recent years, challenges remain in achieving stability in the translation process, a concern primarily manifested in …
External link:
http://arxiv.org/abs/2310.04399
Author:
Yang, Mu; Kanda, Naoyuki; Wang, Xiaofei; Chen, Junkun; Wang, Peidong; Xue, Jian; Li, Jinyu; Yoshioka, Takuya
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges, such as speaker diarization (SD) without accurate word timestamps and handling of overlapping speech in a streaming fashion. In this work, we …
External link:
http://arxiv.org/abs/2309.08007
In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transformer …
External link:
http://arxiv.org/abs/2307.03354
Author:
Fan, Xiaoran; Pang, Chao; Yuan, Tian; Bai, He; Zheng, Renjie; Zhu, Pengfei; Wang, Shuohuan; Chen, Junkun; Chen, Zeyu; Huang, Liang; Sun, Yu; Wu, Hua
Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method for cross-lingual …
External link:
http://arxiv.org/abs/2211.03545
Self-supervised contrastive learning is a powerful tool for learning visual representations without labels. Prior work has primarily focused on evaluating the recognition accuracy of various pre-training algorithms, but has overlooked other behavioral aspects …
External link:
http://arxiv.org/abs/2206.05259
Author:
Zhang, Hui; Yuan, Tian; Chen, Junkun; Li, Xintong; Zheng, Renjie; Huang, Yuxin; Chen, Xiaojie; Gong, Enlei; Chen, Zeyu; Hu, Xiaoguang; Yu, Dianhai; Ma, Yanjun; Huang, Liang
PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the development and research of speech processing technologies by providing an easy-to-use command-line interface and a simple code structure. This paper describes the …
External link:
http://arxiv.org/abs/2205.12007
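For a sense of the easy-to-use interface the abstract mentions, here is a minimal sketch of calling PaddleSpeech's ASR executor from Python. The module path, the ASRExecutor class, and the audio_file argument follow the project's public documentation but should be treated as assumptions to verify against the installed version; "input_16k.wav" is a placeholder file name.

    # Minimal PaddleSpeech ASR sketch (assumed API; verify against the
    # PaddleSpeech docs for your installed version).
    from paddlespeech.cli.asr.infer import ASRExecutor

    asr = ASRExecutor()
    # Transcribe a 16 kHz mono WAV file; "input_16k.wav" is a placeholder.
    text = asr(audio_file="input_16k.wav")
    print(text)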
Author:
Xun, Guangxu; Ma, Mingbo; Bian, Yuchen; Cai, Xingyu; Huang, Jiaji; Zheng, Renjie; Chen, Junkun; Yuan, Jiahong; Church, Kenneth; Huang, Liang
In simultaneous translation (SimulMT), the most widely used strategy is the wait-k policy, thanks to its simplicity and effectiveness in balancing translation quality and latency. However, wait-k suffers from two major limitations: (a) it is a fixed policy … (a sketch of the wait-k schedule follows this entry)
External link:
http://arxiv.org/abs/2204.12672
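To make the wait-k policy concrete: the decoder first reads k source tokens, then alternates writing one target token and reading one more source token, so the output lags the input by a fixed k. Below is a minimal Python sketch of that read/write schedule; translate_step is a hypothetical stand-in for the underlying translation model, not part of the paper.

    # Minimal wait-k schedule sketch. translate_step(src_prefix, tgt_prefix)
    # is a hypothetical decoder call returning the next target token.
    def wait_k_decode(source_stream, translate_step, k=3, max_len=200):
        src, tgt = [], []
        for token in source_stream:
            src.append(token)                         # READ one source token
            if len(src) >= k:                         # after the initial wait of k,
                tgt.append(translate_step(src, tgt))  # WRITE one target token
                if tgt[-1] == "</s>":
                    return tgt
        # Source exhausted: keep writing until end-of-sentence or a length cap.
        while len(tgt) < max_len and (not tgt or tgt[-1] != "</s>"):
            tgt.append(translate_step(src, tgt))
        return tgt

The fixed lag is what limitation (a) in the abstract refers to: k never adapts to how easy or hard the current input is.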
Recently, speech representation learning has improved many speech-related tasks, such as speech recognition, speech classification, and speech-to-text translation. However, all the above tasks are in the direction of speech understanding, but for the …
External link:
http://arxiv.org/abs/2203.09690