Showing 1 - 7 of 7 for search: '"Xu, Yaoxun"'
Author:
Xu, Yaoxun, Chen, Hangting, Yu, Jianwei, Tan, Wei, Gu, Rongzhi, Lei, Shun, Lin, Zhiwei, Wu, Zhiyong
Music codecs are a vital aspect of audio codec research, and ultra low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on mo
External link:
http://arxiv.org/abs/2409.13216
This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervis
External link:
http://arxiv.org/abs/2409.00800
Author:
Shi, Mohan, Jin, Zengrui, Xu, Yaoxun, Xu, Yong, Zhang, Shi-Xiong, Wei, Kun, Shao, Yiwen, Zhang, Chunlei, Yu, Dong
Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with th
External link:
http://arxiv.org/abs/2408.17431
In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequ
External link:
http://arxiv.org/abs/2408.04325
Author:
Xu, Yaoxun, Chen, Hangting, Yu, Jianwei, Huang, Qiaochu, Wu, Zhiyong, Zhang, Shixiong, Li, Guangzhi, Luo, Yi, Gu, Rongzhi
Speech emotions are crucial in human communication and are extensively used in fields like speech synthesis and natural language understanding. Most prior studies, such as speech emotion recognition, have categorized speech emotions into a fixed set
External link:
http://arxiv.org/abs/2312.10381
Author:
Zhu, Jiaxu, Tong, Weinan, Xu, Yaoxun, Song, Changhe, Wu, Zhiyong, You, Zhao, Su, Dan, Yu, Dong, Meng, Helen
Mapping two modalities, speech and text, into a shared representation space is a research direction that uses text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains. However, the length of speech representation
External link:
http://arxiv.org/abs/2309.02459
Author:
Xu, Yaoxun, Liu, Baiji, Huang, Qiaochu, Song, Xingchen, Wu, Zhiyong, Kang, Shiyin, Meng, Helen
Due to the mismatch between the source and target domains, how to better utilize the biased word information to improve the performance of the automatic speech recognition model in the target domain becomes a hot research topic. Previous approaches e
External link:
http://arxiv.org/abs/2304.09607