Showing 51 - 60 of 2,053 results for search: '"Li, Jinyu"'
The rapid development of single-modal pre-training has prompted researchers to pay more attention to cross-modal pre-training methods. In this paper, we propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representat
External link:
http://arxiv.org/abs/2210.03730
Authors:
Zhang, Ziqiang; Chen, Sanyuan; Zhou, Long; Wu, Yu; Ren, Shuo; Liu, Shujie; Yao, Zhuoyuan; Gong, Xun; Dai, Lirong; Li, Jinyu; Wei, Furu
How to boost speech pre-training with textual data is an unsolved problem due to the fact that speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) t
External link:
http://arxiv.org/abs/2209.15329
This paper presents a novel streaming automatic speech recognition (ASR) framework for multi-talker overlapping speech captured by a distant microphone array with an arbitrary geometry. Our framework, named t-SOT-VA, capitalizes on independently deve
External link:
http://arxiv.org/abs/2209.04974
Authors:
Ye, Weicai; Yu, Xingyuan; Lan, Xinyue; Ming, Yuhang; Li, Jinyu; Bao, Hujun; Cui, Zhaopeng; Zhang, Guofeng
We present a novel dual-flow representation of scene motion that decomposes the optical flow into a static flow field caused by the camera motion and another dynamic flow field caused by the objects' movements in the scene. Based on this representati
External link:
http://arxiv.org/abs/2207.08794
Recently, masked prediction pre-training has seen remarkable progress in self-supervised learning (SSL) for speech recognition. It usually requires a codebook obtained in an unsupervised way, making it less accurate and difficult to interpret. We pro
External link:
http://arxiv.org/abs/2206.10125
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task, which translates from English audio to German, Chinese, and Japanese. The YiTrans system is built on large-scale pre-trained enco
External link:
http://arxiv.org/abs/2206.05777
Published in:
电力工程技术, Vol 42, Iss 6, Pp 249-255 (2023)
As an important part of the national security system, energy security is of great importance to the construction of a modern and powerful socialist country in China. Based on the definition of energy security by the International Energy Agency, an ev
External link:
https://doaj.org/article/f995d428d06040759659cc92f9e8ba0e
Authors:
Chen, Sanyuan; Wu, Yu; Chen, Zhuo; Wu, Jian; Yoshioka, Takuya; Liu, Shujie; Li, Jinyu; Yu, Xiangzhan
Transformer has been successfully applied to speech separation recently with its strong long-dependency modeling capacity using a self-attention mechanism. However, Transformer tends to have heavy run-time costs due to the deep encoder layers, which
External link:
http://arxiv.org/abs/2204.12777
Authors:
Chen, Sanyuan; Wu, Yu; Wang, Chengyi; Liu, Shujie; Chen, Zhuo; Wang, Peidong; Liu, Gang; Li, Jinyu; Wu, Jian; Yu, Xiangzhan; Wei, Furu
Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition. In this paper, we study which factor leads to the success of self-supervised l
External link:
http://arxiv.org/abs/2204.12765
Neural transducers have been widely used in automatic speech recognition (ASR). In this paper, we introduce it to streaming end-to-end speech translation (ST), which aims to convert audio signals to texts in other languages directly. Compared with ca
External link:
http://arxiv.org/abs/2204.05352