Showing 1 - 10 of 48 for search: '"Peng, Zhendong"'
In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequ
External link:
http://arxiv.org/abs/2408.04325
Author:
Peng, Qihao, Ren, Hong, Peng, Zhendong, Pan, Cunhua, Elkashlan, Maged, Wang, Dongming, Wang, Jiangzhou, You, Xiaohu
This paper analyzes the impact of a pilot-sharing scheme on synchronization performance in a scenario where several slave access points (APs) with uncertain carrier frequency offsets (CFOs) and timing offsets (TOs) share a common pilot sequence. First,
External link:
http://arxiv.org/abs/2405.18775
Author:
Song, Xingchen, Wu, Di, Zhang, Binbin, Zhou, Dinghao, Peng, Zhendong, Dang, Bo, Pan, Fuping, Yang, Chao
Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy-efficient path to
External link:
http://arxiv.org/abs/2404.16407
We propose a novel integrated sensing and communication (ISAC) system that leverages sensing to assist communication, ensuring fast initial access, seamless user tracking, and uninterrupted communication for millimeter wave (mmWave) wideband systems.
External link:
http://arxiv.org/abs/2403.09330
Recent advances in neural text-to-speech (TTS) models bring thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be
External link:
http://arxiv.org/abs/2308.16569
Published in:
Proc. INTERSPEECH 2023, 2023, pp. 1648-1652 (BibTeX key: song23c_interspeech)
In this paper, we present ZeroPrompt (Figure 1-(a)) and the corresponding Prompt-and-Refine strategy (Figure 3), two simple but effective training-free methods to decrease the Token Display Time (TDT) of streaming ASR models without
External link:
http://arxiv.org/abs/2305.10649
This paper investigates downlink power adaptation for the suborbital node in suborbital-ground communication systems, which are subject to extremely high reliability and ultra-low latency communications requirements. The problem is formulated as a po
External link:
http://arxiv.org/abs/2303.05680
Author:
Liang, Chengdong, Zhang, Xiao-Lei, Zhang, BinBin, Wu, Di, Li, Shengqiang, Song, Xingchen, Peng, Zhendong, Pan, Fuping
Recently, the unified streaming and non-streaming two-pass (U2/U2++) end-to-end model for speech recognition has shown great performance in terms of streaming capability, accuracy and latency. In this paper, we present fast-U2++, an enhanced version
External link:
http://arxiv.org/abs/2211.00941
Author:
Song, Xingchen, Wu, Di, Wu, Zhiyong, Zhang, Binbin, Zhang, Yuekai, Peng, Zhendong, Li, Wenpeng, Pan, Fuping, Zhu, Changbao
In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models. The core idea of TrimTail is to apply a length penalty (i.e., by trimming trailing frames, see Fig. 1-(b)) directl
External link:
http://arxiv.org/abs/2211.00522
Author:
Song, Xingchen, Wu, Di, Zhang, Binbin, Wu, Zhiyong, Li, Wenpeng, Li, Dongfang, Zhang, Pengshen, Peng, Zhendong, Pan, Fuping, Zhu, Changbao, Wu, Zhongqin
The recently proposed Conformer architecture, which combines convolution with attention to capture both local and global dependencies, has become the de facto backbone model for Automatic Speech Recognition (ASR). Inherited from the Natural La
External link:
http://arxiv.org/abs/2210.17079