Showing 1 - 10 of 21 for search: '"Koluguri, Nithin Rao"'
Author:
Park, Taejin, Medennikov, Ivan, Dhawan, Kunal, Wang, Weiqing, Huang, He, Koluguri, Nithin Rao, Puvvada, Krishna C., Balam, Jagadeesh, Ginsburg, Boris
We propose Sortformer, a novel neural model for speaker diarization, trained with unconventional objectives compared to existing end-to-end diarization models. The permutation problem in speaker diarization has long been regarded as a critical challenge…
External link:
http://arxiv.org/abs/2409.06656
Author:
Koluguri, Nithin Rao, Bartley, Travis, Xu, Hainan, Hrinchuk, Oleksii, Balam, Jagadeesh, Ginsburg, Boris, Kucsko, Georg
This paper presents a new method for training sequence-to-sequence models for speech recognition and translation tasks. Instead of the traditional approach of training models on short segments containing only lowercase or partial punctuation and capitalization…
External link:
http://arxiv.org/abs/2409.05601
Author:
Huang, He, Park, Taejin, Dhawan, Kunal, Medennikov, Ivan, Puvvada, Krishna C., Koluguri, Nithin Rao, Wang, Weiqing, Balam, Jagadeesh, Ginsburg, Boris
Self-supervised learning has been shown to benefit a wide range of speech processing tasks, such as speech recognition/translation, speaker verification, and diarization. However, most current approaches are computationally expensive. In this…
External link:
http://arxiv.org/abs/2408.13106
Author:
Dhawan, Kunal, Koluguri, Nithin Rao, Jukić, Ante, Langman, Ryan, Balam, Jagadeesh, Ginsburg, Boris
Published in:
Proceedings of Interspeech 2024
Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-text…
External link:
http://arxiv.org/abs/2407.03495
Author:
Chen, Zhehuai, Huang, He, Hrinchuk, Oleksii, Puvvada, Krishna C., Koluguri, Nithin Rao, Żelasko, Piotr, Balam, Jagadeesh, Ginsburg, Boris
Incorporating speech understanding capabilities into pretrained large language models (SpeechLLM) has become a vital research direction. Previous architectures can be categorized as: i) GPT-style, which prepends speech prompts to the text prompts as a sequence…
External link:
http://arxiv.org/abs/2406.19954
Author:
Puvvada, Krishna C., Żelasko, Piotr, Huang, He, Hrinchuk, Oleksii, Koluguri, Nithin Rao, Dhawan, Kunal, Majumdar, Somshubra, Rastorgueva, Elena, Chen, Zhehuai, Lavrukhin, Vitaly, Balam, Jagadeesh, Ginsburg, Boris
Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary - multilingual ASR and speech translation…
External link:
http://arxiv.org/abs/2406.19674
Historically, most speech models in machine learning have used the mel-spectrogram as a speech representation. Recently, discrete audio tokens produced by neural audio codecs have become a popular alternative speech representation for speech synthesis…
External link:
http://arxiv.org/abs/2406.05298
Discrete audio representation, a.k.a. audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language-modeling approaches in the audio domain. To this end, various compression and representation-learning…
External link:
http://arxiv.org/abs/2309.10922
Author:
Koluguri, Nithin Rao, Kriman, Samuel, Zelenfroind, Georgy, Majumdar, Somshubra, Rekesh, Dima, Noroozi, Vahid, Balam, Jagadeesh, Ginsburg, Boris
This paper presents an overview and evaluation of several end-to-end ASR models on long-form audio. We study three categories of Automatic Speech Recognition (ASR) models based on their core architecture: (1) convolutional, (2) convolutional with…
External link:
http://arxiv.org/abs/2309.09950
Author:
Rekesh, Dima, Koluguri, Nithin Rao, Kriman, Samuel, Majumdar, Somshubra, Noroozi, Vahid, Huang, He, Hrinchuk, Oleksii, Puvvada, Krishna, Kumar, Ankur, Balam, Jagadeesh, Ginsburg, Boris
Conformer-based models have become the dominant end-to-end architecture for speech processing tasks. With the objective of enhancing the Conformer architecture for efficient training and inference, we carefully redesigned Conformer with a novel downsampling…
External link:
http://arxiv.org/abs/2305.05084