Search results - "ZHANG, Binbin"

Report

TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch

Authors: Song, Xingchen, Liang, Chengdong, Zhang, Binbin, Zhang, Pengshen, Wang, ZiYu, Ma, Youcheng, Xu, Menglong, Wang, Lin, Wu, Di, Pan, Fuping, Zhou, Dinghao, Peng, Zhendong

Large Automatic Speech Recognition (ASR) models demand a vast number of parameters, copious amounts of data, and significant computational resources during the training process. However, such models can merely be deployed on high-compute cloud platfo

External link: http://arxiv.org/abs/2412.15622

Report

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

Authors: Song, Xingchen, Xing, Mengtao, Ma, Changwei, Li, Shengqiang, Wu, Di, Zhang, Binbin, Pan, Fuping, Zhou, Dinghao, Zhang, Yuekai, Lei, Shun, Peng, Zhendong, Wu, Zhiyong

It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works typically employ complex data processing pipelines to obtain high-quality training data. These sophisticated pipelines require excellent models at each stage (e.g., s

External link: http://arxiv.org/abs/2412.08237

Report

Monte Carlo Simulation of Angular Response of GRID Detectors for GRID Mission

The Gamma-Ray Integrated Detectors (GRID) are a space science mission that employs compact gamma-ray detectors mounted on NanoSats in low Earth orbit (LEO) to monitor the transient gamma-ray sky. Owing to the unpredictability of the time and location

External link: http://arxiv.org/abs/2410.13402

Report

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

Authors: Xue, Hongfei, Gong, Rong, Shao, Mingchen, Xu, Xin, Wang, Lezhi, Xie, Lei, Bu, Hui, Zhou, Jiaming, Qin, Yong, Du, Jun, Li, Ming, Zhang, Binbin, Jia, Bin

The StutteringSpeech Challenge focuses on advancing speech technologies for people who stutter, specifically targeting Stuttering Event Detection (SED) and Automatic Speech Recognition (ASR) in Mandarin. The challenge comprises three tracks: (1) SED,

External link: http://arxiv.org/abs/2409.05430

Report

HydraFormer: One Encoder For All Subsampling Rates

Authors: Xu, Yaoxun, Song, Xingchen, Wu, Zhiyong, Wu, Di, Peng, Zhendong, Zhang, Binbin

In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequ

External link: http://arxiv.org/abs/2408.04325

Report

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection

Authors: Gong, Rong, Xue, Hongfei, Wang, Lezhi, Xu, Xin, Li, Qisheng, Xie, Lei, Bu, Hui, Wu, Shaomei, Zhou, Jiaming, Qin, Yong, Zhang, Binbin, Du, Jun, Bin, Jia, Li, Ming

The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical

External link: http://arxiv.org/abs/2406.07256

Report

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

Authors: Ma, Linhan, Guo, Dake, Song, Kun, Jiang, Yuepeng, Wang, Shuai, Xue, Liumeng, Xu, Weiming, Zhao, Huan, Zhang, Binbin, Xie, Lei

With the development of large text-to-speech (TTS) models and scale-up of the training data, state-of-the-art TTS systems have achieved impressive performance. In this paper, we present WenetSpeech4TTS, a multi-domain Mandarin corpus derived from the

External link: http://arxiv.org/abs/2406.05763

Report

New second-order optimality conditions for directional optimality of a general set-constrained optimization problem

Authors: Ouyang, Wei, Ye, Jane, Zhang, Binbin

In this paper we derive new second-order optimality conditions for a very general set-constrained optimization problem where the underlying set may be nonconvex. We consider local optimality in specific directions (i.e., optimal in a directional neigh

External link: http://arxiv.org/abs/2404.17696

Report

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

Authors: Song, Xingchen, Wu, Di, Zhang, Binbin, Zhou, Dinghao, Peng, Zhendong, Dang, Bo, Pan, Fuping, Yang, Chao

Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to only activate a subset of parameters in training and inference, Mixture-of-Experts (MoE) have been proposed as an energy efficient path to

External link: http://arxiv.org/abs/2404.16407

Report

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

Authors: Wang, He, Guo, Pengcheng, Li, Yue, Zhang, Ao, Sun, Jiayao, Xie, Lei, Chen, Wei, Zhou, Pan, Bu, Hui, Xu, Xin, Zhang, Binbin, Chen, Zhuo, Wu, Jian, Wang, Longbiao, Chng, Eng Siong, Li, Sun

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech R

External link: http://arxiv.org/abs/2401.03473
