Showing 1 - 10 of 592 for search: '"Puvvada P"'
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
Author:
Lin, Yen-Ting, Yang, Chao-Han Huck, Chen, Zhehuai, Zelasko, Piotr, Yang, Xuesong, Chen, Zih-Ching, Puvvada, Krishna C, Fu, Szu-Wei, Hu, Ke, Chiu, Jun Wei, Balam, Jagadeesh, Ginsburg, Boris, Wang, Yu-Chiang Frank
Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer would lie in learning dataset-specific features and digesting the
External link:
http://arxiv.org/abs/2411.05945
Author:
Peng, Yifan, Puvvada, Krishna C., Chen, Zhehuai, Zelasko, Piotr, Huang, He, Dhawan, Kunal, Hu, Ke, Watanabe, Shinji, Balam, Jagadeesh, Ginsburg, Boris
Recent studies have augmented large language models (LLMs) with speech capabilities, leading to the development of speech language models (SpeechLMs). Earlier SpeechLMs focused on single-turn speech-based question answering (QA), where user input com
External link:
http://arxiv.org/abs/2410.17485
Author:
Zhou, Fang, Huang, Yaning, Liang, Dong, Li, Dai, Zhang, Zhongke, Wang, Kai, Xin, Xiao, Aboelela, Abdallah, Jiang, Zheliang, Wang, Yang, Song, Jeff, Zhang, Wei, Liang, Chen, Li, Huayu, Sun, ChongLin, Yang, Hang, Qu, Lei, Shu, Zhan, Yuan, Mindi, Maccherani, Emanuele, Hayat, Taha, Guo, John, Puvvada, Varna, Pashkevich, Uladzimir
The increasing complexity of deep learning models used for calculating user representations presents significant challenges, particularly with limited computational resources and strict service-level agreements (SLAs). Previous research efforts have
External link:
http://arxiv.org/abs/2410.06497
Author:
Park, Taejin, Medennikov, Ivan, Dhawan, Kunal, Wang, Weiqing, Huang, He, Koluguri, Nithin Rao, Puvvada, Krishna C., Balam, Jagadeesh, Ginsburg, Boris
We propose Sortformer, a novel neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. The permutation problem in speaker diarization has long been regarded as a critical challe
External link:
http://arxiv.org/abs/2409.06656
Author:
Wang, Weiqing, Dhawan, Kunal, Park, Taejin, Puvvada, Krishna C., Medennikov, Ivan, Majumdar, Somshubra, Huang, He, Balam, Jagadeesh, Ginsburg, Boris
Speech foundation models have achieved state-of-the-art (SoTA) performance across various tasks, such as automatic speech recognition (ASR) in hundreds of languages. However, multi-speaker ASR remains a challenging task for these models due to data s
External link:
http://arxiv.org/abs/2409.01438
Author:
Huang, He, Park, Taejin, Dhawan, Kunal, Medennikov, Ivan, Puvvada, Krishna C., Koluguri, Nithin Rao, Wang, Weiqing, Balam, Jagadeesh, Ginsburg, Boris
Self-supervised learning has been proved to benefit a wide range of speech processing tasks, such as speech recognition/translation, speaker verification and diarization, etc. However, most of current approaches are computationally expensive. In this
External link:
http://arxiv.org/abs/2408.13106
Author:
Chen, Zhehuai, Huang, He, Hrinchuk, Oleksii, Puvvada, Krishna C., Koluguri, Nithin Rao, Żelasko, Piotr, Balam, Jagadeesh, Ginsburg, Boris
Incorporating speech understanding capabilities into pretrained large-language models has become a vital research direction (SpeechLLM). The previous architectures can be categorized as: i) GPT-style, prepend speech prompts to the text prompts as a s
External link:
http://arxiv.org/abs/2406.19954
Author:
Puvvada, Krishna C., Żelasko, Piotr, Huang, He, Hrinchuk, Oleksii, Koluguri, Nithin Rao, Dhawan, Kunal, Majumdar, Somshubra, Rastorgueva, Elena, Chen, Zhehuai, Lavrukhin, Vitaly, Balam, Jagadeesh, Ginsburg, Boris
Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary - multilingual ASR and speech trans
External link:
http://arxiv.org/abs/2406.19674
Author:
Trudeau, A., Gonzalez, Anthony H., Thongkham, K., Lee, Kyoung-Soo, Alberts, Stacey, Brodwin, M., Connor, Thomas, Eisenhardt, Peter R. M., Moravec, Emily, Puvvada, Eshwar, Stanford, S. A.
The evolution of galaxies depends on their masses and local environments; understanding when and how environmental quenching starts to operate remains a challenge. Furthermore, studies of the high-redshift regime have been limited to massive cluster
External link:
http://arxiv.org/abs/2406.03633
Humans are adept at leveraging visual cues from lip movements for recognizing speech in adverse listening conditions. Audio-Visual Speech Recognition (AVSR) models follow a similar approach to achieve robust speech recognition in noisy conditions. In t
External link:
http://arxiv.org/abs/2405.12983