Showing 1 - 10 of 20 for search: '"Seltzer, Mike"'
Author:
Shen, Maohao, Zhang, Shun, Wu, Jilong, Xiu, Zhiping, AlBadawy, Ehab, Lu, Yiting, Seltzer, Mike, He, Qing
Large language models (LLMs) have revolutionized natural language processing (NLP) with impressive performance across various text-based tasks. However, the extension of text-dominant LLMs to speech generation tasks remains under-explored. In th
External link:
http://arxiv.org/abs/2410.20336
Author:
Guo, Jinxi, Moritz, Niko, Ma, Yingyi, Seide, Frank, Wu, Chunyang, Mahadeokar, Jay, Kalinli, Ozlem, Fuegen, Christian, Seltzer, Mike
The internal language model (ILM) of the neural transducer has been widely studied. In most prior work, it is mainly used for estimating the ILM score and is subsequently subtracted during inference to facilitate improved integration with external la
External link:
http://arxiv.org/abs/2404.01716
Author:
Fathullah, Yassir, Wu, Chunyang, Lakomkin, Egor, Li, Ke, Jia, Junteng, Shangguan, Yuan, Mahadeokar, Jay, Kalinli, Ozlem, Fuegen, Christian, Seltzer, Mike
In this work, we extend the instruction-tuned Llama-2 model with end-to-end general-purpose speech processing and reasoning abilities while maintaining the wide range of original LLM capabilities, without using any carefully curated paired data. The
External link:
http://arxiv.org/abs/2311.06753
Author:
Shangguan, Yuan, Yang, Haichuan, Li, Danni, Wu, Chunyang, Fathullah, Yassir, Wang, Dilin, Dalmia, Ayushi, Krishnamoorthi, Raghuraman, Kalinli, Ozlem, Jia, Junteng, Mahadeokar, Jay, Lei, Xin, Seltzer, Mike, Chandra, Vikas
Automatic Speech Recognition (ASR) models need to be optimized for specific hardware before they can be deployed on devices. This can be done by tuning the model's hyperparameters or exploring variations in its architecture. Re-training and re-valida
External link:
http://arxiv.org/abs/2309.01947
Author:
Fathullah, Yassir, Wu, Chunyang, Lakomkin, Egor, Jia, Junteng, Shangguan, Yuan, Li, Ke, Guo, Jinxi, Xiong, Wenhan, Mahadeokar, Jay, Kalinli, Ozlem, Fuegen, Christian, Seltzer, Mike
Large language models have proven themselves highly flexible, able to solve a wide range of generative tasks, such as abstractive summarization and open-ended question answering. In this paper we extend the capabilities of LLMs by directly attaching
External link:
http://arxiv.org/abs/2307.11795
Author:
Fathullah, Yassir, Wu, Chunyang, Shangguan, Yuan, Jia, Junteng, Xiong, Wenhan, Mahadeokar, Jay, Liu, Chunxi, Shi, Yangyang, Kalinli, Ozlem, Seltzer, Mike, Gales, Mark J. F.
State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architectur
External link:
http://arxiv.org/abs/2305.12498
Author:
Liang, Dawei, Su, Hang, Singh, Tarun, Mahadeokar, Jay, Puri, Shanil, Zhu, Jiedan, Thomaz, Edison, Seltzer, Mike
Interactive voice assistants have been widely used as input interfaces in various scenarios, e.g. on smart home devices, wearables, and AR devices. Detecting the end of a speech query, i.e. speech end-pointing, is an important task for voice assis
External link:
http://arxiv.org/abs/2210.14252
Author:
Shi, Yangyang, Wu, Chunyang, Wang, Dilin, Xiao, Alex, Mahadeokar, Jay, Zhang, Xiaohui, Liu, Chunxi, Li, Ke, Shangguan, Yuan, Nagaraja, Varun, Kalinli, Ozlem, Seltzer, Mike
This paper improves the streaming transformer transducer for speech recognition by using non-causal convolution. Many works apply causal convolution to improve the streaming transformer, ignoring the lookahead context. We propose to use non-causal con
External link:
http://arxiv.org/abs/2110.05241
Author:
Liang, Dawei, Shi, Yangyang, Wang, Yun, Singhal, Nayan, Xiao, Alex, Shaw, Jonathan, Thomaz, Edison, Kalinli, Ozlem, Seltzer, Mike
Detection of common events and scenes from audio is useful for extracting and understanding human contexts in daily life. Prior studies have shown that leveraging knowledge from a relevant domain is beneficial for a target acoustic event detection (A
External link:
http://arxiv.org/abs/2110.03174
Author:
Zhang, Xiaohui, Manohar, Vimal, Zhang, David, Zhang, Frank, Shi, Yangyang, Singhal, Nayan, Chan, Julian, Peng, Fuchun, Saraf, Yatharth, Seltzer, Mike
Hybrid automatic speech recognition (ASR) models are typically sequentially trained with CTC or LF-MMI criteria. However, they have vastly different legacies and are usually implemented in different frameworks. In this paper, by decoupling the concep
External link:
http://arxiv.org/abs/2107.04154