Showing 1 - 10 of 55
for search: '"Guo, Liyong"'
Author:
Yao, Zengwei, Kang, Wei, Yang, Xiaoyu, Kuang, Fangjun, Guo, Liyong, Zhu, Han, Jin, Zengrui, Li, Zhaoqing, Lin, Long, Povey, Daniel
Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance compared to transducer or…
External link:
http://arxiv.org/abs/2410.05101
Author:
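The CTC criterion that recurs throughout these abstracts can be illustrated with a minimal forward-algorithm sketch. This is a pure-Python toy, not any of the authors' implementations: it sums the probabilities of all frame-level alignments of a label sequence, using the standard blank-extended state sequence.

```python
import math

def ctc_forward_log_prob(log_probs, target, blank=0):
    """Log-probability of `target` under CTC given per-frame log-probs.

    log_probs: length-T list; log_probs[t][k] = log p(symbol k | frame t)
    target: list of label ids (no blanks)
    Runs the CTC forward (alpha) recursion over the blank-extended
    sequence [blank, y1, blank, y2, ..., blank].
    """
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S, T = len(ext), len(log_probs)

    NEG_INF = float("-inf")

    def logadd(a, b):
        # log(exp(a) + exp(b)), guarding against -inf
        if a == NEG_INF:
            return b
        if b == NEG_INF:
            return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    # alpha[s] = log-prob of all alignments in state s at the current frame
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            a = alpha[s]                      # stay in the same state
            if s > 0:
                a = logadd(a, alpha[s - 1])   # advance by one state
            # skip the blank between two *different* labels
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a = logadd(a, alpha[s - 2])
            new[s] = a + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end in the final blank or the final label.
    return logadd(alpha[S - 1], alpha[S - 2] if S > 1 else NEG_INF)
```

For example, with 3 frames, a 2-symbol vocabulary (blank plus "a") under a uniform 0.5/0.5 distribution, and target ["a"], exactly 6 of the 8 length-3 alignment strings collapse to "a", so the total probability is 6 × 0.5³ = 0.75.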
Jin, Zengrui, Yang, Yifan, Shi, Mohan, Kang, Wei, Yang, Xiaoyu, Yao, Zengwei, Kuang, Fangjun, Guo, Liyong, Meng, Lingwei, Lin, Long, Xu, Yong, Zhang, Shi-Xiong, Povey, Daniel
The evolving speech processing landscape is increasingly focused on complex scenarios like meetings or cocktail parties with multiple simultaneous speakers and far-field conditions. Existing methodologies for addressing these challenges fall into two…
External link:
http://arxiv.org/abs/2409.00819
Author:
Yao, Zengwei, Guo, Liyong, Yang, Xiaoyu, Kang, Wei, Kuang, Fangjun, Yang, Yifan, Jin, Zengrui, Lin, Long, Povey, Daniel
The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and…
External link:
http://arxiv.org/abs/2310.11230
Author:
Kang, Wei, Yang, Xiaoyu, Yao, Zengwei, Kuang, Fangjun, Yang, Yifan, Guo, Liyong, Lin, Long, Povey, Daniel
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely-available corpus of speech with supervisions…
External link:
http://arxiv.org/abs/2309.08105
Author:
Yang, Xiaoyu, Kang, Wei, Yao, Zengwei, Yang, Yifan, Guo, Liyong, Kuang, Fangjun, Lin, Long, Povey, Daniel
Prompts are crucial to large language models as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts in end-to-end automatic speech recognition (E2E ASR)…
External link:
http://arxiv.org/abs/2309.07414
Author:
Yang, Yifan, Yang, Xiaoyu, Guo, Liyong, Yao, Zengwei, Kang, Wei, Kuang, Fangjun, Lin, Long, Chen, Xie, Povey, Daniel
Neural Transducer and connectionist temporal classification (CTC) are popular end-to-end automatic speech recognition systems. Due to their frame-synchronous design, blank symbols are introduced to address the length mismatch between acoustic frames…
External link:
http://arxiv.org/abs/2305.11558
Author:
Yao, Zengwei, Kang, Wei, Kuang, Fangjun, Guo, Liyong, Yang, Xiaoyu, Yang, Yifan, Lin, Long, Povey, Daniel
Connectionist Temporal Classification (CTC) suffers from the latency problem when applied to streaming models. We argue that in the CTC lattice, the alignments that can access more future context are preferred during training, thereby leading to higher…
External link:
http://arxiv.org/abs/2305.11539
In this paper, we investigate representation learning for low-resource keyword spotting (KWS). The main challenges of KWS are limited labeled data and limited available device resources. To address those challenges, we explore representation learning…
External link:
http://arxiv.org/abs/2303.10912
Electroencephalography (EEG) plays a vital role in detecting how the brain responds to different stimuli. In this paper, we propose a novel Shallow-Deep Attention-based Network (SDANet) to classify the correct auditory stimulus evoking the EEG signal.
External link:
http://arxiv.org/abs/2303.10897
Author:
Guo, Liyong, Yang, Xiaoyu, Wang, Quandong, Kong, Yuxiang, Yao, Zengwei, Cui, Fan, Kuang, Fangjun, Kang, Wei, Lin, Long, Luo, Mingshuang, Zelasko, Piotr, Povey, Daniel
Knowledge distillation (KD) is a common approach to improve model performance in automatic speech recognition (ASR), where a student model is trained to imitate the output behaviour of a teacher model. However, traditional KD methods suffer from…
External link:
http://arxiv.org/abs/2211.00508
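The knowledge-distillation setup described in the last abstract can be sketched with the classic soft-label objective. This is a generic, minimal illustration of KD in pure Python, assuming temperature-softened teacher targets; it is not the frame-level distillation method of the paper above.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions:
    the soft-label term the student minimizes in standard KD."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0
    )
```

The loss is zero when the student reproduces the teacher's logits exactly and strictly positive otherwise; in practice it is mixed with the hard-label training loss.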