Showing 1 - 10 of 197 for search: '"Pan, Fuping"'
Author:
Song, Xingchen, Wu, Di, Zhang, Binbin, Zhou, Dinghao, Peng, Zhendong, Dang, Bo, Pan, Fuping, Yang, Chao
Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to activate only a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy-efficient path to
External link:
http://arxiv.org/abs/2404.16407
This study describes our system for Task 1 Single-speaker Visual Speech Recognition (VSR) fixed track in the Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023. Specifically, we use intermediate connectionist temporal classification
External link:
http://arxiv.org/abs/2312.07254
Recent advances in neural text-to-speech (TTS) models bring thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be
External link:
http://arxiv.org/abs/2308.16569
Published in:
Proc. INTERSPEECH 2023, pages 1648-1652 (2023)
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without
External link:
http://arxiv.org/abs/2305.10649
Author:
Liang, Chengdong, Zhang, Xiao-Lei, Zhang, BinBin, Wu, Di, Li, Shengqiang, Song, Xingchen, Peng, Zhendong, Pan, Fuping
Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version
External link:
http://arxiv.org/abs/2211.00941
Author:
Song, Xingchen, Wu, Di, Wu, Zhiyong, Zhang, Binbin, Zhang, Yuekai, Peng, Zhendong, Li, Wenpeng, Pan, Fuping, Zhu, Changbao
In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models. The core idea of TrimTail is to apply a length penalty (i.e., by trimming trailing frames, see Fig. 1-(b)) directl
External link:
http://arxiv.org/abs/2211.00522
Author:
Song, Xingchen, Wu, Di, Zhang, Binbin, Wu, Zhiyong, Li, Wenpeng, Li, Dongfang, Zhang, Pengshen, Peng, Zhendong, Pan, Fuping, Zhu, Changbao, Wu, Zhongqin
The recently proposed Conformer architecture, which combines convolution with attention to capture both local and global dependencies, has become the de facto backbone model for Automatic Speech Recognition (ASR). Inherited from the Natural La
External link:
http://arxiv.org/abs/2210.17079
Author:
Wang, Jie, Xu, Menglong, Hou, Jingyong, Zhang, Binbin, Zhang, Xiao-Lei, Xie, Lei, Pan, Fuping
Keyword spotting (KWS) enables speech-based user interaction and is gradually becoming an indispensable component of smart devices. Recently, end-to-end (E2E) methods have become the most popular approach for on-device KWS tasks. However, there is still
External link:
http://arxiv.org/abs/2210.16743
Author:
Zhang, Binbin, Wu, Di, Peng, Zhendong, Song, Xingchen, Yao, Zhuoyuan, Lv, Hang, Xie, Lei, Yang, Chao, Pan, Fuping, Niu, Jianwei
Recently, we made available WeNet, a production-oriented end-to-end speech recognition toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address the streaming and non-streaming decoding modes in a single model. To
External link:
http://arxiv.org/abs/2203.15455