Showing 21 - 30
of 2,053
for search: '"Li, Jinyu"'
Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR). T-SOT effectively handles overlapped speech by representing multi-talker transcriptions as a s
External link:
http://arxiv.org/abs/2309.08131
Authors:
Yang, Mu, Kanda, Naoyuki, Wang, Xiaofei, Chen, Junkun, Wang, Peidong, Xue, Jian, Li, Jinyu, Yoshioka, Takuya
End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion. In this work, we p
External link:
http://arxiv.org/abs/2309.08007
Authors:
Wang, Xiaofei, Thakker, Manthan, Chen, Zhuo, Kanda, Naoyuki, Eskimez, Sefik Emre, Chen, Sanyuan, Tang, Min, Liu, Shujie, Li, Jinyu, Yoshioka, Takuya
Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generati
External link:
http://arxiv.org/abs/2308.06873
Metaverse has attracted great attention from industry and academia in recent years. Metaverse for the ocean (Meta-ocean) is the implementation of the Metaverse technologies in virtual emersion of the ocean which is beneficial for people yearning for
External link:
http://arxiv.org/abs/2308.05901
In end-to-end automatic speech recognition systems, one of the difficulties for language expansion is the limited paired speech and text training data. In this paper, we propose a novel method to generate augmented samples with unpaired speech feature
External link:
http://arxiv.org/abs/2307.16332
Authors:
Wu, Jian, Gaur, Yashesh, Chen, Zhuo, Zhou, Long, Zhu, Yimeng, Wang, Tianrui, Li, Jinyu, Liu, Shujie, Ren, Bo, Liu, Linquan, Wu, Yu
Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been e
External link:
http://arxiv.org/abs/2307.03917
In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transforme
External link:
http://arxiv.org/abs/2307.03354
Recent end-to-end automatic speech recognition (ASR) systems often utilize a Transformer-based acoustic encoder that generates embedding at a high frame rate. However, this design is inefficient, particularly for long speech signals due to the quadra
External link:
http://arxiv.org/abs/2306.16009
The integration of Language Models (LMs) has proven to be an effective way to address domain shifts in speech recognition. However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different fr
External link:
http://arxiv.org/abs/2306.16007
Authors:
Jiang, Huiqiang, Zhang, Li Lyna, Li, Yuang, Wu, Yu, Cao, Shijie, Cao, Ting, Yang, Yuqing, Li, Jinyu, Yang, Mao, Qiu, Lili
Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy on resourc
External link:
http://arxiv.org/abs/2305.19549