Showing 61 - 70 of 2,053 results for search: '"Li, Jinyu"'
Author:
Ao, Junyi, Zhang, Ziqiang, Zhou, Long, Liu, Shujie, Li, Haizhou, Ko, Tom, Dai, Lirong, Li, Jinyu, Qian, Yao, Wei, Furu
This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder n
External link:
http://arxiv.org/abs/2203.17113
Author:
Kanda, Naoyuki, Wu, Jian, Wu, Yu, Xiao, Xiong, Meng, Zhong, Wang, Xiaofei, Gaur, Yashesh, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya
This paper presents a streaming speaker-attributed automatic speech recognition (SA-ASR) model that can recognize "who spoke what" with low latency even when multiple people are speaking simultaneously. Our model is based on token-level serialized
External link:
http://arxiv.org/abs/2203.16685
Contextual biasing is an important and challenging task for end-to-end automatic speech recognition (ASR) systems, which aims to achieve better recognition performance by biasing the ASR system to particular context phrases such as person names, musi
External link:
http://arxiv.org/abs/2203.00888
Author:
Kanda, Naoyuki, Wu, Jian, Wu, Yu, Xiao, Xiong, Meng, Zhong, Wang, Xiaofei, Gaur, Yashesh, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya
This paper proposes token-level serialized output training (t-SOT), a novel framework for streaming multi-talker automatic speech recognition (ASR). Unlike existing streaming multi-talker ASR models using multiple output branches, the t-SOT model h
External link:
http://arxiv.org/abs/2202.00842
Streaming end-to-end multi-talker speech recognition aims at transcribing the overlapped speech from conversations or meetings with an all-neural model in a streaming fashion, which is fundamentally different from a modular-based approach that usuall
External link:
http://arxiv.org/abs/2201.09979
Published in:
International Journal of Clothing Science and Technology, 2023, Vol. 35, Issue 6, pp. 938-951.
External link:
http://www.emeraldinsight.com/doi/10.1108/IJCST-12-2021-0180
Recently, pioneer work finds that speech pre-trained models can solve full-stack speech processing tasks, because the model utilizes bottom layers to learn speaker-related information and top layers to encode content-related information. Since the ne
External link:
http://arxiv.org/abs/2112.08778
Author:
Kumatani, Kenichi, Dimitriadis, Dimitrios, Gaur, Yashesh, Gmyr, Robert, Eskimez, Sefik Emre, Li, Jinyu, Zeng, Michael
Published in:
Proc. Interspeech 2020, pp. 3775-3779
In this work, we develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR). For untranscribed speech data, the hypothesis from an ASR system must be used as a label. Howev
External link:
http://arxiv.org/abs/2112.05826
Author:
Li, Jinyu
Recently, the speech community is seeing a significant trend of moving from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve the state-of-the-art results in most b
External link:
http://arxiv.org/abs/2111.01690
Author:
Li, Rui, Penmathsa, Akhil, Sun, Tai, Gallandat, Noris, Li, Jinyu, Park, Jihye, Kim, Han-Jin, Kim, Pyungsoon, Yoon, Narae, Jang, Ji-Hoon, Züttel, Andreas
Published in:
International Journal of Hydrogen Energy, 27 June 2024, 72:687-693