Search results

Report

t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability

Authors: Wu, Jian; Kanda, Naoyuki; Yoshioka, Takuya; Zhao, Rui; Chen, Zhuo; Li, Jinyu

Token-level serialized output training (t-SOT) was recently proposed to address the challenge of streaming multi-talker automatic speech recognition (ASR). T-SOT effectively handles overlapped speech by representing multi-talker transcriptions as a s

External link: http://arxiv.org/abs/2309.08131

Report

DiariST: Streaming Speech Translation with Speaker Diarization

Authors: Yang, Mu; Kanda, Naoyuki; Wang, Xiaofei; Chen, Junkun; Wang, Peidong; Xue, Jian; Li, Jinyu; Yoshioka, Takuya

End-to-end speech translation (ST) for conversation recordings involves several under-explored challenges such as speaker diarization (SD) without accurate word time stamps and handling of overlapping speech in a streaming fashion. In this work, we p

External link: http://arxiv.org/abs/2309.08007

Report

SpeechX: Neural Codec Language Model as a Versatile Speech Transformer

Authors: Wang, Xiaofei; Thakker, Manthan; Chen, Zhuo; Kanda, Naoyuki; Eskimez, Sefik Emre; Chen, Sanyuan; Tang, Min; Liu, Shujie; Li, Jinyu; Yoshioka, Takuya

Recent advancements in generative speech models based on audio-text prompts have enabled remarkable innovations like high-quality zero-shot text-to-speech. However, existing models still face limitations in handling diverse audio-text speech generati

External link: http://arxiv.org/abs/2308.06873

Report

Deepsea: A Meta-ocean Prototype for Undersea Exploration

Authors: Li, Jinyu; Hu, Ping; Cui, Weicheng; Huang, Tianyi; Cheng, Shenghui

Metaverse has attracted great attention from industry and academia in recent years. Metaverse for the ocean (Meta-ocean) is the implementation of the Metaverse technologies in virtual immersion of the ocean which is beneficial for people yearning for

External link: http://arxiv.org/abs/2308.05901

Report

Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

Authors: Sun, Eric; Li, Jinyu; Xue, Jian; Gong, Yifan

In end-to-end automatic speech recognition systems, one of the difficulties for language expansion is the limited paired speech and text training data. In this paper, we propose a novel method to generate augmented samples with unpaired speech feature

External link: http://arxiv.org/abs/2307.16332

Report

On decoder-only architecture for speech-to-text and large language model integration

Authors: Wu, Jian; Gaur, Yashesh; Chen, Zhuo; Zhou, Long; Zhu, Yimeng; Wang, Tianrui; Li, Jinyu; Liu, Shujie; Ren, Bo; Liu, Linquan; Wu, Yu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been e

External link: http://arxiv.org/abs/2307.03917

Report

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Authors: Papi, Sara; Wang, Peidong; Chen, Junkun; Xue, Jian; Li, Jinyu; Gaur, Yashesh

In real-world applications, users often require both translations and transcriptions of speech to enhance their comprehension, particularly in streaming scenarios where incremental generation is necessary. This paper introduces a streaming Transforme

External link: http://arxiv.org/abs/2307.03354

Report

Accelerating Transducers through Adjacent Token Merging

Authors: Li, Yuang; Wu, Yu; Li, Jinyu; Liu, Shujie

Recent end-to-end automatic speech recognition (ASR) systems often utilize a Transformer-based acoustic encoder that generates embedding at a high frame rate. However, this design is inefficient, particularly for long speech signals due to the quadra

External link: http://arxiv.org/abs/2306.16009

Report

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Authors: Li, Yuang; Wu, Yu; Li, Jinyu; Liu, Shujie

The integration of Language Models (LMs) has proven to be an effective way to address domain shifts in speech recognition. However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different fr

External link: http://arxiv.org/abs/2306.16007

Report

Accurate and Structured Pruning for Efficient Automatic Speech Recognition

Authors: Jiang, Huiqiang; Zhang, Li Lyna; Li, Yuang; Wu, Yu; Cao, Shijie; Cao, Ting; Yang, Yuqing; Li, Jinyu; Yang, Mao; Qiu, Lili

Automatic Speech Recognition (ASR) has seen remarkable advancements with deep neural networks, such as Transformer and Conformer. However, these models typically have large model sizes and high inference costs, posing a challenge to deploy on resourc

External link: http://arxiv.org/abs/2305.19549
