Showing 1 - 10 of 2,242 for search: '"ZHANG, Binbin"'
Author:
Song, Xingchen; Liang, Chengdong; Zhang, Binbin; Zhang, Pengshen; Wang, ZiYu; Ma, Youcheng; Xu, Menglong; Wang, Lin; Wu, Di; Pan, Fuping; Zhou, Dinghao; Peng, Zhendong
Large Automatic Speech Recognition (ASR) models demand a vast number of parameters, copious amounts of data, and significant computational resources during the training process. However, such models can merely be deployed on high-compute cloud platfo
External link:
http://arxiv.org/abs/2412.15622
Author:
Song, Xingchen; Xing, Mengtao; Ma, Changwei; Li, Shengqiang; Wu, Di; Zhang, Binbin; Pan, Fuping; Zhou, Dinghao; Zhang, Yuekai; Lei, Shun; Peng, Zhendong; Wu, Zhiyong
It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works typically employ complex data processing pipelines to obtain high-quality training data. These sophisticated pipelines require excellent models at each stage (e.g., s
External link:
http://arxiv.org/abs/2412.08237
Author:
Liu, Qize; Pan, Xiaofan; Zheng, Xutao; Gao, Huaizhong; Li, Longhao; Wang, Qidong; Yang, Zirui; Tang, Chenchong; Wu, Wenxuan; Cheng, Jianping; Zeng, Zhi; Zeng, Ming; Feng, Hua; Zhang, Binbin; Wang, Zhonghai; Zhou, Rong; Liu, Yuanyuan; Lin, Lin; Zhong, Jiayong; Jiang, Jianyong; Han, Wentao; Tian, Yang; Xu, Benda; GRID Collaboration
The Gamma-Ray Integrated Detectors (GRID) are a space science mission that employs compact gamma-ray detectors mounted on NanoSats in low Earth orbit (LEO) to monitor the transient gamma-ray sky. Owing to the unpredictability of the time and location
External link:
http://arxiv.org/abs/2410.13402
Author:
Xue, Hongfei; Gong, Rong; Shao, Mingchen; Xu, Xin; Wang, Lezhi; Xie, Lei; Bu, Hui; Zhou, Jiaming; Qin, Yong; Du, Jun; Li, Ming; Zhang, Binbin; Jia, Bin
The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three tracks: (1) SED,
External link:
http://arxiv.org/abs/2409.05430
In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequ
External link:
http://arxiv.org/abs/2408.04325
Author:
Gong, Rong; Xue, Hongfei; Wang, Lezhi; Xu, Xin; Li, Qisheng; Xie, Lei; Bu, Hui; Wu, Shaomei; Zhou, Jiaming; Qin, Yong; Zhang, Binbin; Du, Jun; Bin, Jia; Li, Ming
The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical
External link:
http://arxiv.org/abs/2406.07256
Author:
Ma, Linhan; Guo, Dake; Song, Kun; Jiang, Yuepeng; Wang, Shuai; Xue, Liumeng; Xu, Weiming; Zhao, Huan; Zhang, Binbin; Xie, Lei
With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the
External link:
http://arxiv.org/abs/2406.05763
In this paper we derive new second-order optimality conditions for a very general set-constrained optimization problem where the underlying set may be nonconvex. We consider local optimality in specific directions (i.e., optimal in a directional neigh
External link:
http://arxiv.org/abs/2404.17696
Author:
Song, Xingchen; Wu, Di; Zhang, Binbin; Zhou, Dinghao; Peng, Zhendong; Dang, Bo; Pan, Fuping; Yang, Chao
Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to
External link:
http://arxiv.org/abs/2404.16407
Author:
Wang, He; Guo, Pengcheng; Li, Yue; Zhang, Ao; Sun, Jiayao; Xie, Lei; Chen, Wei; Zhou, Pan; Bu, Hui; Xu, Xin; Zhang, Binbin; Chen, Zhuo; Wu, Jian; Wang, Longbiao; Chng, Eng Siong; Li, Sun
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech R
External link:
http://arxiv.org/abs/2401.03473