Search results

Report

Authors: Xu, Yaoxun, Chen, Hangting, Yu, Jianwei, Tan, Wei, Gu, Rongzhi, Lei, Shun, Lin, Zhiwei, Wu, Zhiyong

Music codecs are a vital aspect of audio codec research, and ultra low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on mo

External link: http://arxiv.org/abs/2409.13216

Report

Comparing Discrete and Continuous Space LLMs for Speech Recognition

Authors: Xu, Yaoxun, Zhang, Shi-Xiong, Yu, Jianwei, Wu, Zhiyong, Yu, Dong

This paper investigates discrete and continuous speech representations in Large Language Model (LLM)-based Automatic Speech Recognition (ASR), organizing them by feature continuity and training approach into four categories: supervised and unsupervis

External link: http://arxiv.org/abs/2409.00800

Report

Advancing Multi-talker ASR Performance with Large Language Models

Authors: Shi, Mohan, Jin, Zengrui, Xu, Yaoxun, Xu, Yong, Zhang, Shi-Xiong, Wei, Kun, Shao, Yiwen, Zhang, Chunlei, Yu, Dong

Recognizing overlapping speech from multiple speakers in conversational scenarios is one of the most challenging problems for automatic speech recognition (ASR). Serialized output training (SOT) is a classic method to address multi-talker ASR, with th

External link: http://arxiv.org/abs/2408.17431

Report

HydraFormer: One Encoder For All Subsampling Rates

Authors: Xu, Yaoxun, Song, Xingchen, Wu, Zhiyong, Wu, Di, Peng, Zhendong, Zhang, Binbin

In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequ

External link: http://arxiv.org/abs/2408.04325

Report

SECap: Speech Emotion Captioning with Large Language Model

Authors: Xu, Yaoxun, Chen, Hangting, Yu, Jianwei, Huang, Qiaochu, Wu, Zhiyong, Zhang, Shixiong, Li, Guangzhi, Luo, Yi, Gu, Rongzhi

Speech emotions are crucial in human communication and are extensively used in fields like speech synthesis and natural language understanding. Most prior studies, such as speech emotion recognition, have categorized speech emotions into a fixed set

External link: http://arxiv.org/abs/2312.10381

Report

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation

Authors: Zhu, Jiaxu, Tong, Weinan, Xu, Yaoxun, Song, Changhe, Wu, Zhiyong, You, Zhao, Su, Dan, Yu, Dong, Meng, Helen

Mapping two modalities, speech and text, into a shared representation space, is a research topic of using text-only data to improve end-to-end automatic speech recognition (ASR) performance in new domains. However, the length of speech representation

External link: http://arxiv.org/abs/2309.02459

Report

CB-Conformer: Contextual biasing Conformer for biased word recognition

Authors: Xu, Yaoxun, Liu, Baiji, Huang, Qiaochu, Song, Xingchen, Wu, Zhiyong, Kang, Shiyin, Meng, Helen

Because of the mismatch between the source and target domains, how to better use biased-word information to improve automatic speech recognition performance in the target domain has become a hot research topic. Previous approaches e

External link: http://arxiv.org/abs/2304.09607
