Showing 1 - 10 of 1,014 results for search: '"Zuluaga-Gómez A"'
Author:
Thorbecke, Iuliia, Zuluaga-Gomez, Juan, Villatoro-Tello, Esaú, Carofilis, Andres, Kumar, Shashi, Motlicek, Petr, Pandia, Karthik, Ganapathiraju, Aravind
Despite the recent success of end-to-end models for automatic speech recognition, recognizing special rare and out-of-vocabulary words, as well as fast domain adaptation with text, are still challenging. It often happens that biasing to the special e
External link:
http://arxiv.org/abs/2409.13514
Author:
Thorbecke, Iuliia, Zuluaga-Gomez, Juan, Villatoro-Tello, Esaú, Kumar, Shashi, Rangappa, Pradeep, Burdisso, Sergio, Motlicek, Petr, Pandia, Karthik, Ganapathiraju, Aravind
The training of automatic speech recognition (ASR) with little to no supervised data remains an open question. In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained from scratch in consumer and accessible GPUs
External link:
http://arxiv.org/abs/2409.13499
Author:
Kumar, Shashi, Madikeri, Srikanth, Zuluaga-Gomez, Juan, Thorbecke, Iuliia, Villatoro-Tello, Esaú, Burdisso, Sergio, Motlicek, Petr, Pandia, Karthik, Ganapathiraju, Aravind
In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing
External link:
http://arxiv.org/abs/2407.04444
Author:
Kumar, Shashi, Madikeri, Srikanth, Zuluaga-Gomez, Juan, Villatoro-Tello, Esaú, Thorbecke, Iuliia, Motlicek, Petr, E, Manjunath K, Ganapathiraju, Aravind
Self-supervised pretrained models exhibit competitive performance in automatic speech recognition on finetuning, even with limited in-domain supervised data. However, popular pretrained models are not suitable for streaming ASR because they are train
External link:
http://arxiv.org/abs/2407.04439
Author:
Ravanelli, Mirco, Parcollet, Titouan, Moumen, Adel, de Langen, Sylvain, Subakan, Cem, Plantinga, Peter, Wang, Yingzhi, Mousavi, Pooneh, Della Libera, Luca, Ploujnikov, Artem, Paissan, Francesco, Borra, Davide, Zaiem, Salah, Zhao, Zeyu, Zhang, Shucong, Karakasidis, Georgios, Yeh, Sung-Lin, Champion, Pierre, Rouhe, Aku, Braun, Rudolf, Mai, Florian, Zuluaga-Gomez, Juan, Mousavi, Seyed Mahed, Nautsch, Andreas, Nguyen, Ha, Liu, Xuechen, Sagar, Sangeet, Duret, Jarod, Mdhaffar, Salima, Laperriere, Gaelle, Rouvier, Mickael, De Mori, Renato, Esteve, Yannick
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and
External link:
http://arxiv.org/abs/2407.00463
Author:
Zuluaga-Gomez, Juan, Huang, Zhaocheng, Niu, Xing, Paturi, Rohit, Srinivasan, Sundararajan, Mathur, Prashant, Thompson, Brian, Federico, Marcello
Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel mul
External link:
http://arxiv.org/abs/2311.00697
Author:
Nigmatulina, Iuliia, Madikeri, Srikanth, Villatoro-Tello, Esaú, Motliček, Petr, Zuluaga-Gomez, Juan, Pandia, Karthik, Ganapathiraju, Aravind
GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not been properly investigated yet. Rescoring with available contextual inform
External link:
http://arxiv.org/abs/2306.15685
Despite the recent advancements in Automatic Speech Recognition (ASR), the recognition of accented speech still remains a dominant problem. In order to create more inclusive ASR systems, research has shown that the integration of accent information,
External link:
http://arxiv.org/abs/2305.18283
State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are expensive for
External link:
http://arxiv.org/abs/2305.18281
Author:
Zuluaga-Gomez, Juan
Breast cancer is one of the most threatening diseases in women's life; thus, the early and accurate diagnosis plays a key role in reducing the risk of death in a patient's life. Mammography stands as the reference technique for breast cancer screenin
External link:
http://arxiv.org/abs/2305.02482