Showing 1 - 10 of 2,515 for search: '"A. Balam"'
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
Author:
Lin, Yen-Ting, Yang, Chao-Han Huck, Chen, Zhehuai, Zelasko, Piotr, Yang, Xuesong, Chen, Zih-Ching, Puvvada, Krishna C, Fu, Szu-Wei, Hu, Ke, Chiu, Jun Wei, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank
Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer would lie in learning dataset-specific features and digesting the…
External link:
http://arxiv.org/abs/2411.05945
Author:
Ouyang, Siqi, Hrinchuk, Oleksii, Chen, Zhehuai, Lavrukhin, Vitaly, Balam, Jagadeesh, Li, Lei, Ginsburg, Boris
Simultaneous machine translation (SMT) takes streaming input utterances and incrementally produces target text. Existing SMT methods only use the partial utterance that has already arrived at the input and the generated hypothesis. Motivated by human…
External link:
http://arxiv.org/abs/2410.22499
Author:
Peng, Yifan, Puvvada, Krishna C., Chen, Zhehuai, Zelasko, Piotr, Huang, He, Dhawan, Kunal, Hu, Ke, Watanabe, Shinji, Balam, Jagadeesh, Ginsburg, Boris
Recent studies have augmented large language models (LLMs) with speech capabilities, leading to the development of speech language models (SpeechLMs). Earlier SpeechLMs focused on single-turn speech-based question answering (QA), where user input com…
External link:
http://arxiv.org/abs/2410.17485
Author:
Lu, Ke-Han, Chen, Zhehuai, Fu, Szu-Wei, Yang, Chao-Han Huck, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank, Lee, Hung-yi
Recent end-to-end speech language models (SLMs) have expanded upon the capabilities of large language models (LLMs) by incorporating pre-trained speech models. However, these SLMs often undergo extensive speech instruction-tuning to bridge the gap between…
External link:
http://arxiv.org/abs/2409.20007
Author:
Żelasko, Piotr, Chen, Zhehuai, Wang, Mengru, Galvez, Daniel, Hrinchuk, Oleksii, Ding, Shuoyang, Hu, Ke, Balam, Jagadeesh, Lavrukhin, Vitaly, Ginsburg, Boris
A rising interest in the modality extension of foundation language models warrants discussion on the most effective and efficient multimodal training approach. This work focuses on neural machine translation (NMT) and proposes a joint multimodal training…
External link:
http://arxiv.org/abs/2409.13523
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
Author:
Wang, Jinhan, Wang, Weiqing, Dhawan, Kunal, Park, Taejin, Kim, Myungjong, Medennikov, Ivan, Huang, He, Koluguri, Nithin, Balam, Jagadeesh, Ginsburg, Boris
We propose a novel end-to-end multi-talker automatic speech recognition (ASR) framework that enables both multi-speaker (MS) ASR and target-speaker (TS) ASR. Our proposed model is trained in a fully end-to-end manner, incorporating speaker supervision…
External link:
http://arxiv.org/abs/2409.12352
Author:
Hu, Ke, Chen, Zhehuai, Yang, Chao-Han Huck, Żelasko, Piotr, Hrinchuk, Oleksii, Lavrukhin, Vitaly, Balam, Jagadeesh, Ginsburg, Boris
Large language models (LLMs) have demonstrated remarkable advancements in language understanding and generation. Building on the success of text-based LLMs, recent research has adapted these models to use speech embeddings for prompting, resulting in…
External link:
http://arxiv.org/abs/2409.11538
Author:
Yang, Chao-Han Huck, Park, Taejin, Gong, Yuan, Li, Yuanchao, Chen, Zhehuai, Lin, Yen-Ting, Chen, Chen, Hu, Yuchen, Dhawan, Kunal, Żelasko, Piotr, Zhang, Chao, Chen, Yun-Nung, Tsao, Yu, Balam, Jagadeesh, Ginsburg, Boris, Siniscalchi, Sabato Marco, Chng, Eng Siong, Bell, Peter, Lai, Catherine, Watanabe, Shinji, Stolcke, Andreas
Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new c…
External link:
http://arxiv.org/abs/2409.09785
Author:
Park, Taejin, Medennikov, Ivan, Dhawan, Kunal, Wang, Weiqing, Huang, He, Koluguri, Nithin Rao, Puvvada, Krishna C., Balam, Jagadeesh, Ginsburg, Boris
We propose Sortformer, a novel neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. The permutation problem in speaker diarization has long been regarded as a critical challenge…
External link:
http://arxiv.org/abs/2409.06656
Author:
Koluguri, Nithin Rao, Bartley, Travis, Xu, Hainan, Hrinchuk, Oleksii, Balam, Jagadeesh, Ginsburg, Boris, Kucsko, Georg
This paper presents a new method for training sequence-to-sequence models for speech recognition and translation tasks. Instead of the traditional approach of training models on short segments containing only lowercase or partial punctuation and capitalization…
External link:
http://arxiv.org/abs/2409.05601