Search results

Report

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

Authors: Ao, Junyi; Zhang, Ziqiang; Zhou, Long; Liu, Shujie; Li, Haizhou; Ko, Tom; Dai, Lirong; Li, Jinyu; Qian, Yao; Wei, Furu

This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder n

External link: http://arxiv.org/abs/2203.17113

Report

Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings

Authors: Kanda, Naoyuki; Wu, Jian; Wu, Yu; Xiao, Xiong; Meng, Zhong; Wang, Xiaofei; Gaur, Yashesh; Chen, Zhuo; Li, Jinyu; Yoshioka, Takuya

This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR) model that can recognize "who spoke what" with low latency even when multiple people are speaking simultaneously. Our model is based on token-level serialized

External link: http://arxiv.org/abs/2203.16685

Report

Towards Contextual Spelling Correction for Customization of End-to-end Speech Recognition Systems

Authors: Wang, Xiaoqiang; Liu, Yanqing; Li, Jinyu; Miljanic, Veljko; Zhao, Sheng; Khalil, Hosam

Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person names, musi

External link: http://arxiv.org/abs/2203.00888

Report

Streaming Multi-Talker ASR with Token-Level Serialized Output Training

Authors: Kanda, Naoyuki; Wu, Jian; Wu, Yu; Xiao, Xiong; Meng, Zhong; Wang, Xiaofei; Gaur, Yashesh; Chen, Zhuo; Li, Jinyu; Yoshioka, Takuya

This paper proposes a token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi-talker ASR models using multiple output branches, the t-SOT model h

External link: http://arxiv.org/abs/2202.00842

Report

Endpoint Detection for Streaming End-to-End Multi-talker ASR

Authors: Lu, Liang; Li, Jinyu; Gong, Yifan

Streaming end-to-end multi-talker speech recognition aims at transcribing the overlapped speech from conversations or meetings with an all-neural model in a streaming fashion, which is fundamentally different from a modular-based approach that usuall

External link: http://arxiv.org/abs/2201.09979

Academic article

Modeling and influence on effective thermal conductivity of woven fabrics based on structure parameters

Authors: Yang, Yunchu; Wang, Hengyu; Yan, Hangyu; Ni, Yunfeng; Li, Jinyu

Published in: International Journal of Clothing Science and Technology, 2023, Vol. 35, Issue 6, pp. 938-951.

External link: http://www.emeraldinsight.com/doi/10.1108/IJCST-12-2021-0180

Report

Self-Supervised Learning for speech recognition with Intermediate layer supervision

Authors: Wang, Chengyi; Wu, Yu; Chen, Sanyuan; Liu, Shujie; Li, Jinyu; Qian, Yao; Yang, Zhenglu

Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information. Since the ne

External link: http://arxiv.org/abs/2112.08778

Report

Sequence-level self-learning with multiple hypotheses

Authors: Kumatani, Kenichi; Dimitriadis, Dimitrios; Gaur, Yashesh; Gmyr, Robert; Eskimez, Sefik Emre; Li, Jinyu; Zeng, Michael

Published in: Proc. Interspeech 2020, pp. 3775-3779

In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis from an ASR system must be used as a label. Howev

External link: http://arxiv.org/abs/2112.05826

Report

Recent Advances in End-to-End Automatic Speech Recognition

Author: Li, Jinyu

Recently, the speech community is seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve the state-of-the-art results in most b

External link: http://arxiv.org/abs/2111.01690
