Výsledky vyhledávání - "Wu, Chunyang"

Report

Speech ReaLLM -- Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time

Autor: Seide, Frank, Doulaty, Morrie, Shi, Yangyang, Gaur, Yashesh, Jia, Junteng, Wu, Chunyang

We introduce Speech ReaLLM, a new ASR architecture that marries "decoder-only" ASR with the RNN-T to make multimodal LLM architectures capable of real-time streaming. This is the first "decoder-only" ASR architecture designed to handle continuous aud

Externí odkaz: http://arxiv.org/abs/2406.09569

Zobrazit plný text záznamu

Report

Effective internal language model training and fusion for factorized transducer model

Autor: Guo, Jinxi, Moritz, Niko, Ma, Yingyi, Seide, Frank, Wu, Chunyang, Mahadeokar, Jay, Kalinli, Ozlem, Fuegen, Christian, Seltzer, Mike

The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external la

Externí odkaz: http://arxiv.org/abs/2404.01716

Zobrazit plný text záznamu

Report

AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs

Autor: Fathullah, Yassir, Wu, Chunyang, Lakomkin, Egor, Li, Ke, Jia, Junteng, Shangguan, Yuan, Mahadeokar, Jay, Kalinli, Ozlem, Fuegen, Christian, Seltzer, Mike

In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data. The

Externí odkaz: http://arxiv.org/abs/2311.06753

Zobrazit plný text záznamu

Report

Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model

Autor: Xie, Jiamin, Li, Ke, Guo, Jinxi, Tjandra, Andros, Shangguan, Yuan, Sari, Leda, Wu, Chunyang, Jia, Junteng, Mahadeokar, Jay, Kalinli, Ozlem

Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training needed to be run for each language.

Externí odkaz: http://arxiv.org/abs/2309.13018

Zobrazit plný text záznamu

Report

End-to-End Speech Recognition Contextualization with Large Language Models

Autor: Lakomkin, Egor, Wu, Chunyang, Fathullah, Yassir, Kalinli, Ozlem, Seltzer, Michael L., Fuegen, Christian

In recent years, Large Language Models (LLMs) have garnered significant attention from the research community due to their exceptional performance and generalization capabilities. In this paper, we introduce a novel method for contextualizing speech

Externí odkaz: http://arxiv.org/abs/2309.10917

Zobrazit plný text záznamu

Report

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

Autor: Shangguan, Yuan, Yang, Haichuan, Li, Danni, Wu, Chunyang, Fathullah, Yassir, Wang, Dilin, Dalmia, Ayushi, Krishnamoorthi, Raghuraman, Kalinli, Ozlem, Jia, Junteng, Mahadeokar, Jay, Lei, Xin, Seltzer, Mike, Chandra, Vikas

Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-valida

Externí odkaz: http://arxiv.org/abs/2309.01947

Zobrazit plný text záznamu

Report

Prompting Large Language Models with Speech Recognition Abilities

Autor: Fathullah, Yassir, Wu, Chunyang, Lakomkin, Egor, Jia, Junteng, Shangguan, Yuan, Li, Ke, Guo, Jinxi, Xiong, Wenhan, Mahadeokar, Jay, Kalinli, Ozlem, Fuegen, Christian, Seltzer, Mike

Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching

Externí odkaz: http://arxiv.org/abs/2307.11795

Zobrazit plný text záznamu

Report

Towards Selection of Text-to-speech Data to Augment ASR Training

Autor: Liu, Shuo, Sarı, Leda, Wu, Chunyang, Keren, Gil, Shangguan, Yuan, Mahadeokar, Jay, Kalinli, Ozlem

This paper presents a method for selecting appropriate synthetic speech samples from a given large text-to-speech (TTS) dataset as supplementary training data for an automatic speech recognition (ASR) model. We trained a neural network, which can be

Externí odkaz: http://arxiv.org/abs/2306.00998

Zobrazit plný text záznamu

Report

Multi-Head State Space Model for Speech Recognition

Autor: Fathullah, Yassir, Wu, Chunyang, Shangguan, Yuan, Jia, Junteng, Xiong, Wenhan, Mahadeokar, Jay, Liu, Chunxi, Shi, Yangyang, Kalinli, Ozlem, Seltzer, Mike, Gales, Mark J. F.

State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architectur

Externí odkaz: http://arxiv.org/abs/2305.12498

Zobrazit plný text záznamu

Report

Anchored Speech Recognition with Neural Transducers

Autor: Raj, Desh, Jia, Junteng, Mahadeokar, Jay, Wu, Chunyang, Moritz, Niko, Zhang, Xiaohui, Kalinli, Ozlem

Neural transducers have achieved human level performance on standard speech recognition benchmarks. However, their performance significantly degrades in the presence of cross-talk, especially when the primary speaker has a low signal-to-noise ratio.

Externí odkaz: http://arxiv.org/abs/2210.11588

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání