Showing 1 - 10 of 94 for search: '"Povey, Daniel"'
Author:
Shao, Yiwen, Zhang, Shi-Xiong, Xu, Yong, Yu, Meng, Yu, Dong, Povey, Daniel, Khudanpur, Sanjeev
In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge. Traditional approaches often rely on…
External link:
http://arxiv.org/abs/2406.09589
Author:
Wang, Quandong, Yuan, Yuxuan, Yang, Xiaoyu, Zhang, Ruike, Zhao, Kang, Liu, Wei, Luan, Jian, Povey, Daniel, Wang, Bin
While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model…
External link:
http://arxiv.org/abs/2406.06571
Author:
Huang, Ruizhe, Zhang, Xiaohui, Ni, Zhaoheng, Sun, Li, Hira, Moto, Hwang, Jeff, Manohar, Vimal, Pratap, Vineel, Wiesner, Matthew, Watanabe, Shinji, Povey, Daniel, Khudanpur, Sanjeev
Connectionist temporal classification (CTC) models are known to have peaky output distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it can cause inaccurate forced alignments (FA), especially at finer granularity…
External link:
http://arxiv.org/abs/2406.02560
Author:
Raj, Desh, Wiesner, Matthew, Maciejewski, Matthew, Garcia-Perera, Leibny Paola, Povey, Daniel, Khudanpur, Sanjeev
The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker automatic speech recognition (ASR). With advances in architecture, objectives, and mixture simulation methods, it was demonstrated…
External link:
http://arxiv.org/abs/2401.15676
Author:
Yao, Zengwei, Guo, Liyong, Yang, Xiaoyu, Kang, Wei, Kuang, Fangjun, Yang, Yifan, Jin, Zengrui, Lin, Long, Povey, Daniel
The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better…
External link:
http://arxiv.org/abs/2310.11230
Author:
Gao, Dongji, Xu, Hainan, Raj, Desh, Perera, Leibny Paola Garcia, Povey, Daniel, Khudanpur, Sanjeev
Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-…
External link:
http://arxiv.org/abs/2309.15796
Author:
Kang, Wei, Yang, Xiaoyu, Yao, Zengwei, Kuang, Fangjun, Yang, Yifan, Guo, Liyong, Lin, Long, Povey, Daniel
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely-available corpus of speech with supervisions. Dif…
External link:
http://arxiv.org/abs/2309.08105
Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language…
External link:
http://arxiv.org/abs/2309.07377
Author:
Yang, Xiaoyu, Kang, Wei, Yao, Zengwei, Yang, Yifan, Guo, Liyong, Kuang, Fangjun, Lin, Long, Povey, Daniel
Prompts are crucial to large language models as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts in end-to-end automatic speech recognition (E2E ASR)…
External link:
http://arxiv.org/abs/2309.07414
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Ta…
External link:
http://arxiv.org/abs/2308.06547