Showing 1 - 10 of 12,506 for search: '"Robust speech recognition"'
Attention-based encoder-decoder, e.g. transformer and its variants, generates the output sequence in an autoregressive (AR) manner. Despite its superior performance, AR model is computationally inefficient as its generation requires as many iteration
External link:
http://arxiv.org/abs/2409.17746
Authors:
Wang, Chien-Chun, Chen, Li-Wei, Chou, Cheng-Kang, Lee, Hung-Shin, Chen, Berlin, Wang, Hsin-Min
While pre-trained automatic speech recognition (ASR) systems demonstrate impressive performance on matched domains, their performance often degrades when confronted with channel mismatch stemming from unseen recording environments and conditions. To
External link:
http://arxiv.org/abs/2409.12386
Creating Automatic Speech Recognition (ASR) systems that are robust and resilient to classroom conditions is paramount to the development of AI tools to aid teachers and students. In this work, we study the efficacy of continued pretraining (CPT) in
External link:
http://arxiv.org/abs/2409.14494
Audio-LLM introduces audio modality into a large language model (LLM) to enable a powerful LLM to recognize, understand, and generate audio. However, during speech recognition in noisy environments, we observed the presence of illusions and repetitio
External link:
http://arxiv.org/abs/2408.09491
Authors:
Wang, Kuan-Chen, Li, You-Jin, Chen, Wei-Lun, Chen, Yu-Wen, Wang, Yi-Ching, Yeh, Ping-Cheng, Zhang, Chao, Tsao, Yu
Noise robustness is critical when applying automatic speech recognition (ASR) in real-world scenarios. One solution involves the use of speech enhancement (SE) models as the front end of ASR. However, neural network-based (NN-based) SE often introdu
External link:
http://arxiv.org/abs/2406.12699
Authors:
Hu, Yuchen, Chen, Chen, Yang, Chao-Han Huck, Li, Ruizhe, Zhang, Chao, Chen, Pin-Yu, Chng, EnSiong
Recent advances in large language models (LLMs) have promoted generative error correction (GER) for automatic speech recognition (ASR), which leverages the rich linguistic knowledge and powerful reasoning ability of LLMs to improve recognition result
External link:
http://arxiv.org/abs/2401.10446
Authors:
Feng, Tiantian, Lin, Ju, Huang, Yiteng, He, Weipeng, Kalgaonkar, Kaustubh, Moritz, Niko, Wan, Li, Lei, Xin, Sun, Ming, Seide, Frank
Modern smart glasses leverage advanced audio sensing and machine learning technologies to offer real-time transcribing and captioning services, considerably enriching human experiences in daily communications. However, such systems frequently encount
External link:
http://arxiv.org/abs/2309.10993