Showing 1 - 10 of 94 for search: '"Povey, Daniel"'
Author:
Shao, Yiwen, Zhang, Shi-Xiong, Xu, Yong, Yu, Meng, Yu, Dong, Povey, Daniel, Khudanpur, Sanjeev
In the field of multi-channel, multi-speaker Automatic Speech Recognition (ASR), the task of discerning and accurately transcribing a target speaker's speech within background noise remains a formidable challenge. Traditional approaches often rely on…
External link:
http://arxiv.org/abs/2406.09589
Author:
Wang, Quandong, Yuan, Yuxuan, Yang, Xiaoyu, Zhang, Ruike, Zhao, Kang, Liu, Wei, Luan, Jian, Povey, Daniel, Wang, Bin
While Large Language Models (LLMs) have achieved remarkable success in various fields, the efficiency of training and inference remains a major challenge. To address this issue, we propose SUBLLM, short for Subsampling-Upsampling-Bypass Large Language Model…
External link:
http://arxiv.org/abs/2406.06571
Author:
Huang, Ruizhe, Zhang, Xiaohui, Ni, Zhaoheng, Sun, Li, Hira, Moto, Hwang, Jeff, Manohar, Vimal, Pratap, Vineel, Wiesner, Matthew, Watanabe, Shinji, Povey, Daniel, Khudanpur, Sanjeev
Connectionist temporal classification (CTC) models are known to have peaky output distributions. Such behavior is not a problem for automatic speech recognition (ASR), but it can cause inaccurate forced alignments (FA), especially at finer granularity…
External link:
http://arxiv.org/abs/2406.02560
Author:
Raj, Desh, Wiesner, Matthew, Maciejewski, Matthew, Garcia-Perera, Leibny Paola, Povey, Daniel, Khudanpur, Sanjeev
The Streaming Unmixing and Recognition Transducer (SURT) has recently become a popular framework for continuous, streaming, multi-talker automatic speech recognition (ASR). With advances in architecture, objectives, and mixture simulation methods, it was demonstrated…
External link:
http://arxiv.org/abs/2401.15676
Author:
Yao, Zengwei, Guo, Liyong, Yang, Xiaoyu, Kang, Wei, Kuang, Fangjun, Yang, Yifan, Jin, Zengrui, Lin, Long, Povey, Daniel
The Conformer has become the most popular encoder model for automatic speech recognition (ASR). It adds convolution modules to a transformer to learn both local and global dependencies. In this work we describe a faster, more memory-efficient, and better…
External link:
http://arxiv.org/abs/2310.11230
Author:
Gao, Dongji, Xu, Hainan, Raj, Desh, Perera, Leibny Paola Garcia, Povey, Daniel, Khudanpur, Sanjeev
Training automatic speech recognition (ASR) systems requires large amounts of well-curated paired data. However, human annotators usually perform "non-verbatim" transcription, which can result in poorly trained models. In this paper, we propose Omni-…
External link:
http://arxiv.org/abs/2309.15796
Author:
Kang, Wei, Yang, Xiaoyu, Yao, Zengwei, Kuang, Fangjun, Yang, Yifan, Guo, Liyong, Lin, Long, Povey, Daniel
In this paper, we introduce Libriheavy, a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest freely-available corpus of speech with supervisions. Dif…
External link:
http://arxiv.org/abs/2309.08105
Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language…
External link:
http://arxiv.org/abs/2309.07377
Author:
Yang, Xiaoyu, Kang, Wei, Yao, Zengwei, Yang, Yifan, Guo, Liyong, Kuang, Fangjun, Lin, Long, Povey, Daniel
Prompts are crucial to large language models as they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts in end-to-end automatic speech recognition (E2E ASR)…
External link:
http://arxiv.org/abs/2309.07414
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. However, pseudo-labels are often noisy, containing numerous incorrect tokens. Ta…
External link:
http://arxiv.org/abs/2308.06547