Showing 1 - 10 of 28 results for the search: '"Ganapathiraju, Aravind"'
Author:
Kumar, Shashi, Thorbecke, Iuliia, Burdisso, Sergio, Villatoro-Tello, Esaú, E, Manjunath K, Hacioğlu, Kadri, Rangappa, Pradeep, Motlicek, Petr, Ganapathiraju, Aravind, Stolcke, Andreas
Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether …
External link:
http://arxiv.org/abs/2411.03866
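The entry above hinges on a single linear connector that maps frozen speech-encoder frames into the LLM's embedding space. The sketch below illustrates that idea only; the dimensions, the frame-stacking factor, and the module name are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: a linear projection maps frozen speech-encoder frames into
# the LLM embedding dimension so they can be prepended to text embeddings.
import torch
import torch.nn as nn


class LinearConnector(nn.Module):
    def __init__(self, enc_dim: int = 1024, llm_dim: int = 4096, stack: int = 4):
        super().__init__()
        self.stack = stack                     # stack adjacent frames to shorten the sequence
        self.proj = nn.Linear(enc_dim * stack, llm_dim)

    def forward(self, enc_out: torch.Tensor) -> torch.Tensor:
        # enc_out: (batch, frames, enc_dim) from a frozen speech encoder
        b, t, d = enc_out.shape
        t = (t // self.stack) * self.stack     # drop ragged tail frames
        stacked = enc_out[:, :t].reshape(b, t // self.stack, d * self.stack)
        return self.proj(stacked)              # (batch, t/stack, llm_dim)


if __name__ == "__main__":
    connector = LinearConnector()
    speech_frames = torch.randn(2, 99, 1024)   # dummy encoder output
    prefix = connector(speech_frames)          # would precede the LLM's text embeddings
    print(prefix.shape)
```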
Author:
Thorbecke, Iuliia, Zuluaga-Gomez, Juan, Villatoro-Tello, Esaú, Carofilis, Andres, Kumar, Shashi, Motlicek, Petr, Pandia, Karthik, Ganapathiraju, Aravind
Despite the recent success of end-to-end models for automatic speech recognition, recognizing special rare and out-of-vocabulary words, as well as fast domain adaptation with text, are still challenging. It often happens that biasing to the special entities …
External link:
http://arxiv.org/abs/2409.13514
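The entry above deals with biasing ASR toward rare and out-of-vocabulary words. As a generic illustration of contextual biasing (not the paper's method), the sketch below re-ranks an n-best list by adding a fixed bonus for every user-supplied bias phrase a hypothesis contains; the phrase list, bonus weight, and scores are made up.

```python
# Toy n-best rescoring with a per-phrase bias bonus.
from typing import List, Tuple


def bias_rescore(nbest: List[Tuple[str, float]],
                 bias_phrases: List[str],
                 bonus: float = 2.0) -> List[Tuple[str, float]]:
    """Return hypotheses re-ranked by score plus a bonus per matched bias phrase."""
    rescored = []
    for text, score in nbest:
        hits = sum(phrase.lower() in text.lower() for phrase in bias_phrases)
        rescored.append((text, score + bonus * hits))
    return sorted(rescored, key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    nbest = [("call doctor stolke", -1.2), ("call doctor stolcke", -1.5)]
    print(bias_rescore(nbest, ["Stolcke"]))   # the rare name wins after biasing
```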
Author:
Thorbecke, Iuliia, Zuluaga-Gomez, Juan, Villatoro-Tello, Esaú, Kumar, Shashi, Rangappa, Pradeep, Burdisso, Sergio, Motlicek, Petr, Pandia, Karthik, Ganapathiraju, Aravind
The training of automatic speech recognition (ASR) with little to no supervised data remains an open question. In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained from scratch on consumer and accessible GPUs …
External link:
http://arxiv.org/abs/2409.13499
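The entry above trains streaming Transformer-Transducer models from scratch. The sketch below shows only the transducer objective, assuming torchaudio is installed: toy joiner logits of shape (batch, T, U+1, vocab) are scored with torchaudio's RNN-T loss. All sizes are toy values, and nothing here reflects the paper's actual training recipe.

```python
# Transducer (RNN-T) loss on random joiner outputs, toy dimensions only.
import torch
import torchaudio.functional as F_audio

batch, T, U, vocab = 2, 50, 10, 32
logits = torch.randn(batch, T, U + 1, vocab, requires_grad=True)
targets = torch.randint(1, vocab, (batch, U), dtype=torch.int32)
logit_lengths = torch.full((batch,), T, dtype=torch.int32)
target_lengths = torch.full((batch,), U, dtype=torch.int32)

loss = F_audio.rnnt_loss(logits, targets, logit_lengths, target_lengths, blank=0)
loss.backward()                 # gradients flow back to the joiner logits
print(float(loss))
```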
Author:
Kumar, Shashi, Madikeri, Srikanth, Zuluaga-Gomez, Juan, Thorbecke, Iuliia, Villatoro-Tello, Esaú, Burdisso, Sergio, Motlicek, Petr, Pandia, Karthik, Ganapathiraju, Aravind
In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing …
External link:
http://arxiv.org/abs/2407.04444
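The entry above contrasts a cascaded conversational-intelligence pipeline with joint modelling. The sketch below is a purely schematic cascade (VAD, then diarization, then transcription, then downstream NLP); every component is a stub standing in for a real model.

```python
# Schematic cascade: VAD -> diarization -> ASR -> downstream NLP, all stubbed.
from typing import List, Tuple

Segment = Tuple[float, float]                        # (start_s, end_s)


def vad(audio: List[float]) -> List[Segment]:
    return [(0.0, 3.2), (4.1, 7.8)]                  # stub speech regions


def diarize(segments: List[Segment]) -> List[Tuple[Segment, str]]:
    return [(seg, f"spk{i % 2}") for i, seg in enumerate(segments)]


def transcribe(audio: List[float], seg: Segment) -> str:
    return "<asr hypothesis>"                        # stub transcript


def nlp_tasks(transcript: str) -> dict:
    return {"intent": "<intent>", "endpoint": False}  # stub NLP outputs


def cascade(audio: List[float]) -> List[dict]:
    results = []
    for seg, speaker in diarize(vad(audio)):
        text = transcribe(audio, seg)
        results.append({"speaker": speaker, "segment": seg,
                        "text": text, **nlp_tasks(text)})
    return results


if __name__ == "__main__":
    print(cascade(audio=[0.0] * 16000))
```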
Author:
Kumar, Shashi, Madikeri, Srikanth, Zuluaga-Gomez, Juan, Villatoro-Tello, Esaú, Thorbecke, Iuliia, Motlicek, Petr, E, Manjunath K, Ganapathiraju, Aravind
Self-supervised pretrained models exhibit competitive performance in automatic speech recognition when fine-tuned, even with limited in-domain supervised data. However, popular pretrained models are not suitable for streaming ASR because they are trained …
External link:
http://arxiv.org/abs/2407.04439
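The entry above points out that popular self-supervised encoders see the whole utterance during pretraining, which conflicts with streaming. A common workaround, shown as an assumption below rather than the paper's exact method, is a chunk-wise causal attention mask that lets each frame attend only to its own chunk and a limited number of past chunks.

```python
# Chunk-wise causal attention mask for streaming-style self-attention.
import torch


def chunk_causal_mask(num_frames: int, chunk: int, left_chunks: int) -> torch.Tensor:
    """True = attention allowed: each frame sees its chunk plus `left_chunks` past chunks."""
    idx = torch.arange(num_frames)
    q_chunk = idx.unsqueeze(1) // chunk              # chunk id of each query frame
    k_chunk = idx.unsqueeze(0) // chunk              # chunk id of each key frame
    return (k_chunk <= q_chunk) & (k_chunk >= q_chunk - left_chunks)


if __name__ == "__main__":
    mask = chunk_causal_mask(num_frames=8, chunk=2, left_chunks=1)
    print(mask.int())    # e.g. frame 5 (chunk 2) attends to chunks 1 and 2 only
```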
Author:
Nigmatulina, Iuliia, Madikeri, Srikanth, Villatoro-Tello, Esaú, Motliček, Petr, Zuluaga-Gomez, Juan, Pandia, Karthik, Ganapathiraju, Aravind
GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not yet been properly investigated. Rescoring with available contextual information …
External link:
http://arxiv.org/abs/2306.15685
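The entry above is about moving ASR rescoring and post-processing onto the GPU. The toy sketch below combines acoustic and external language-model scores for a whole batch of n-best hypotheses in a single tensor operation, the kind of work that parallelises well on a GPU; the scores and the interpolation weight are invented.

```python
# Batched score interpolation for n-best rescoring, runnable on GPU or CPU.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

am_scores = torch.tensor([[-12.3, -12.9, -13.1],     # (utterances, n-best) acoustic log-scores
                          [-20.4, -19.8, -21.0]], device=device)
lm_scores = torch.tensor([[-30.1, -28.7, -29.9],     # external LM log-scores
                          [-41.2, -42.5, -40.9]], device=device)

lm_weight = 0.3                                      # assumed interpolation weight
total = am_scores + lm_weight * lm_scores            # one fused op over all hypotheses
best = total.argmax(dim=1)                           # index of the re-ranked 1-best
print(best.tolist())
```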
Author:
Bansal, Lokesh, Dubagunta, S. Pavankumar, Chetlur, Malolan, Jagtap, Pushpak, Ganapathiraju, Aravind
New-age conversational agent systems perform both speech emotion recognition (SER) and automatic speech recognition (ASR) using two separate and often independent approaches for real-world application in noisy environments. In this paper, we investigate …
External link:
http://arxiv.org/abs/2305.12540
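The entry above investigates handling speech emotion recognition and ASR jointly rather than with two independent systems. The sketch below is an illustrative multi-task layout, not the paper's architecture: one shared encoder feeding a per-frame head for ASR and a pooled utterance-level head for emotion.

```python
# Multi-task sketch: shared encoder, per-frame ASR head, pooled emotion head.
import torch
import torch.nn as nn


class JointSerAsr(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=30, emotions=4):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.asr_head = nn.Linear(hidden, vocab)        # per-frame ASR logits
        self.ser_head = nn.Linear(hidden, emotions)     # utterance-level emotion logits

    def forward(self, feats):
        enc, _ = self.encoder(feats)                    # (B, T, hidden)
        asr_logits = self.asr_head(enc)                 # (B, T, vocab)
        ser_logits = self.ser_head(enc.mean(dim=1))     # (B, emotions)
        return asr_logits, ser_logits


if __name__ == "__main__":
    model = JointSerAsr()
    asr, ser = model(torch.randn(2, 120, 80))           # dummy filterbank features
    print(asr.shape, ser.shape)
```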
Author:
Villatoro-Tello, Esaú, Madikeri, Srikanth, Zuluaga-Gomez, Juan, Sharma, Bidisha, Sarfjoo, Seyyed Saeed, Nigmatulina, Iuliia, Motlicek, Petr, Ivanov, Alexei V., Ganapathiraju, Aravind
Published in:
ICASSP 2023
In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: …
External link:
http://arxiv.org/abs/2212.08489
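The entry above benchmarks several representations for SLU intent detection. The sketch below shows just one generic text-based baseline, a bag-of-words classifier over an ASR transcript; the vocabulary, intent labels, and sizes are illustrative assumptions and are not the systems compared in the paper.

```python
# Toy bag-of-words intent classifier over a transcript (untrained, shapes only).
import torch
import torch.nn as nn

VOCAB = {"book": 0, "flight": 1, "play": 2, "music": 3, "<unk>": 4}
INTENTS = ["book_flight", "play_music"]


def bow(text: str) -> torch.Tensor:
    vec = torch.zeros(len(VOCAB))
    for tok in text.lower().split():
        vec[VOCAB.get(tok, VOCAB["<unk>"])] += 1.0
    return vec


classifier = nn.Linear(len(VOCAB), len(INTENTS))      # untrained linear intent head

logits = classifier(bow("book a flight to Prague"))
print(INTENTS[int(logits.argmax())])                  # arbitrary before training
```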
The mechanism proposed here is for real-time speaker change detection in conversations; it first trains a text-independent neural-network speaker classifier using in-domain speaker data. Through the network, features of conversational speech from …
External link:
http://arxiv.org/abs/1702.02285
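The entry above trains a text-independent speaker classifier and then uses its internal features to spot speaker changes. The sketch below shows only a generic downstream step, assumed rather than taken from the paper: consecutive window embeddings are compared with cosine similarity and a change is flagged when the similarity drops below a threshold.

```python
# Speaker-change detection from window embeddings via cosine similarity.
import torch
import torch.nn.functional as F


def change_points(embeddings: torch.Tensor, threshold: float = 0.6) -> list:
    """embeddings: (num_windows, dim) produced by a pretrained speaker network."""
    sims = F.cosine_similarity(embeddings[:-1], embeddings[1:], dim=1)
    return [i + 1 for i, s in enumerate(sims.tolist()) if s < threshold]


if __name__ == "__main__":
    spk_a = torch.randn(1, 128)                       # dummy speaker-A embedding
    spk_b = torch.randn(1, 128)                       # dummy speaker-B embedding
    windows = torch.cat([spk_a.repeat(3, 1), spk_b.repeat(3, 1)]) + 0.05 * torch.randn(6, 128)
    print(change_points(windows))                     # likely reports a change at window 3
```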
This work presents a novel framework based on a feed-forward neural network for text-independent speaker classification and verification, two related systems of speaker recognition. With optimized features and model training, it achieves 100% classification …
External link:
http://arxiv.org/abs/1702.02289
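The final entry describes a feed-forward network for text-independent speaker classification and verification. The sketch below covers only the classification branch, with made-up layer sizes and speaker count.

```python
# Minimal feed-forward speaker classifier over utterance-level features.
import torch
import torch.nn as nn

num_speakers, feat_dim = 10, 40

classifier = nn.Sequential(          # simple feed-forward stack
    nn.Linear(feat_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, num_speakers),
)

utterance_feats = torch.randn(4, feat_dim)           # dummy averaged acoustic features
print(classifier(utterance_feats).argmax(dim=1))     # predicted speaker ids (untrained)
```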