Showing 1 - 10 of 105 for search: '"Hu, Xinhui"'
The global population is rapidly aging, necessitating technologies that promote healthy aging. Voice User Interfaces (VUIs), leveraging natural language interaction, offer a promising solution for older adults due to their ease of use. However, curre
External link:
http://arxiv.org/abs/2409.08449
In this work, we propose a Switch-Conformer-based MoE system named SC-MoE for unified streaming and non-streaming code-switching (CS) automatic speech recognition (ASR), where we design a streaming MoE layer consisting of three language experts, whic
External link:
http://arxiv.org/abs/2406.18021
Authors:
Tian, Jingguang; Ye, Shuaishuai; Chen, Shunfei; Xiang, Yang; Yin, Zhaohui; Hu, Xinhui; Xu, Xinkang
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we
External link:
http://arxiv.org/abs/2405.05498
Generally, the performance of deep neural networks (DNNs) heavily depends on the quality of data representation learning. Our preliminary work has emphasized the significance of deep representation learning (DRL) in the context of speech enhancement
External link:
http://arxiv.org/abs/2312.09620
Recently, researchers have shown an increasing interest in automatically predicting the subjective evaluation for speech synthesis systems. This prediction is a challenging task, especially on the out-of-domain test set. In this paper, we proposed a
External link:
http://arxiv.org/abs/2311.10656
Overlapped Speech Detection (OSD) is an important part of speech applications involving analysis of multi-party conversations. However, most existing OSD systems are trained and evaluated on small datasets with limited application domains, which l
External link:
http://arxiv.org/abs/2308.05987
In this work, we empirically confirm that non-autoregressive translation with an iterative refinement mechanism (IR-NAT) suffers from poor acceleration robustness because it is more sensitive to decoding batch size and computing device setting than a
External link:
http://arxiv.org/abs/2210.10416
In this technical report, we describe the Royalflush submissions for the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22). Our submissions contain track 1, which is for supervised speaker verification and track 3, which is for semi-supervised
External link:
http://arxiv.org/abs/2209.09010
Authors:
Hu, Xinhui; Ding, Hong; Wei, Qing; Chen, Ruoxin; Zhao, Weiting; Jiang, Liqiong; Wang, Jing; Liu, Haifei; Cao, Jingyuan; Liu, Hong (jstzliu@sina.com); Wang, Bin (wangbinhewei@126.com)
Published in:
Renal Failure. Dec 2024, Vol. 46, Issue 2, pp. 1-10.
This paper describes the Royalflush speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). Our system comprises speech enhancement, overlapped speech detection, speaker embedding extraction, spea
External link:
http://arxiv.org/abs/2202.04814