Showing 1 - 10 of 21 for search: '"Koluguri, Nithin Rao"'
Author:
Park, Taejin, Medennikov, Ivan, Dhawan, Kunal, Wang, Weiqing, Huang, He, Koluguri, Nithin Rao, Puvvada, Krishna C., Balam, Jagadeesh, Ginsburg, Boris
We propose Sortformer, a novel neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. The permutation problem in speaker diarization has long been regarded as a critical challenge…
External link:
http://arxiv.org/abs/2409.06656
Author:
Koluguri, Nithin Rao, Bartley, Travis, Xu, Hainan, Hrinchuk, Oleksii, Balam, Jagadeesh, Ginsburg, Boris, Kucsko, Georg
This paper presents a new method for training sequence-to-sequence models for speech recognition and translation tasks. Instead of the traditional approach of training models on short segments containing only lowercase or partial punctuation and capitalization…
External link:
http://arxiv.org/abs/2409.05601
Author:
Huang, He, Park, Taejin, Dhawan, Kunal, Medennikov, Ivan, Puvvada, Krishna C., Koluguri, Nithin Rao, Wang, Weiqing, Balam, Jagadeesh, Ginsburg, Boris
Self-supervised learning has been shown to benefit a wide range of speech processing tasks, such as speech recognition/translation, speaker verification, and diarization. However, most current approaches are computationally expensive. In this…
External link:
http://arxiv.org/abs/2408.13106
Author:
Dhawan, Kunal, Koluguri, Nithin Rao, Jukić, Ante, Langman, Ryan, Balam, Jagadeesh, Ginsburg, Boris
Published in:
Proceedings of Interspeech 2024
Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-text…
External link:
http://arxiv.org/abs/2407.03495
Author:
Chen, Zhehuai, Huang, He, Hrinchuk, Oleksii, Puvvada, Krishna C., Koluguri, Nithin Rao, Żelasko, Piotr, Balam, Jagadeesh, Ginsburg, Boris
Incorporating speech understanding capabilities into pretrained large language models (SpeechLLM) has become a vital research direction. Previous architectures can be categorized as: i) GPT-style, which prepends speech prompts to the text prompts as a sequence…
External link:
http://arxiv.org/abs/2406.19954
Author:
Puvvada, Krishna C., Żelasko, Piotr, Huang, He, Hrinchuk, Oleksii, Koluguri, Nithin Rao, Dhawan, Kunal, Majumdar, Somshubra, Rastorgueva, Elena, Chen, Zhehuai, Lavrukhin, Vitaly, Balam, Jagadeesh, Ginsburg, Boris
Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary - multilingual ASR and speech translation…
External link:
http://arxiv.org/abs/2406.19674
Historically, most speech models in machine learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternative speech representation for speech synthesis…
External link:
http://arxiv.org/abs/2406.05298
Discrete audio representation, a.k.a. audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language-modeling approaches in the audio domain. To this end, various compression and representation-learning…
External link:
http://arxiv.org/abs/2309.10922
Author:
Koluguri, Nithin Rao, Kriman, Samuel, Zelenfroind, Georgy, Majumdar, Somshubra, Rekesh, Dima, Noroozi, Vahid, Balam, Jagadeesh, Ginsburg, Boris
This paper presents an overview and evaluation of several end-to-end ASR models on long-form audio. We study three categories of Automatic Speech Recognition (ASR) models based on their core architecture: (1) convolutional, (2) convolutional with…
External link:
http://arxiv.org/abs/2309.09950
Author:
Rekesh, Dima, Koluguri, Nithin Rao, Kriman, Samuel, Majumdar, Somshubra, Noroozi, Vahid, Huang, He, Hrinchuk, Oleksii, Puvvada, Krishna, Kumar, Ankur, Balam, Jagadeesh, Ginsburg, Boris
Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the Conformer architecture for efficient training and inference, we carefully redesigned Conformer with a novel downsampling…
External link:
http://arxiv.org/abs/2305.05084