Showing 1 - 10 of 957 for the search: '"Zaiem A"'
Despite being trained on massive and diverse datasets, speech self-supervised encoders are generally used for downstream purposes as mere frozen feature extractors or model initializers before fine-tuning. The former severely limits the exploitation … (see the sketch below)
External link: http://arxiv.org/abs/2407.00756
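The snippet above contrasts two ways of reusing a speech self-supervised encoder downstream: as a frozen feature extractor or as an initializer that is then fine-tuned. Below is a minimal sketch of that distinction, assuming a HuggingFace wav2vec 2.0 checkpoint as a stand-in encoder and an arbitrary 10-class downstream head; both are illustrative choices, not the paper's setup.

```python
# Sketch: two common ways downstream systems consume a speech SSL encoder.
# The checkpoint name and the downstream head are illustrative assumptions.
import torch
from transformers import Wav2Vec2Model

encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")

# (a) Frozen feature extractor: gradients never reach the encoder.
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False

downstream_head = torch.nn.Linear(encoder.config.hidden_size, 10)  # e.g. 10 classes

waveform = torch.randn(1, 16000)  # one second of 16 kHz audio (dummy input)
with torch.no_grad():
    feats = encoder(waveform).last_hidden_state      # (batch, frames, hidden)
logits = downstream_head(feats.mean(dim=1))          # pool over time, then classify

# (b) Model initializer: the same encoder, but kept trainable and fine-tuned
# jointly with the head (only the optimizer setup differs).
trainable = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
optimizer = torch.optim.AdamW(
    list(trainable.parameters()) + list(downstream_head.parameters()), lr=1e-5
)
```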
Authors: Ravanelli, Mirco; Parcollet, Titouan; Moumen, Adel; de Langen, Sylvain; Subakan, Cem; Plantinga, Peter; Wang, Yingzhi; Mousavi, Pooneh; Della Libera, Luca; Ploujnikov, Artem; Paissan, Francesco; Borra, Davide; Zaiem, Salah; Zhao, Zeyu; Zhang, Shucong; Karakasidis, Georgios; Yeh, Sung-Lin; Champion, Pierre; Rouhe, Aku; Braun, Rudolf; Mai, Florian; Zuluaga-Gomez, Juan; Mousavi, Seyed Mahed; Nautsch, Andreas; Nguyen, Ha; Liu, Xuechen; Sagar, Sangeet; Duret, Jarod; Mdhaffar, Salima; Laperriere, Gaelle; Rouvier, Mickael; De Mori, Renato; Esteve, Yannick
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and … (see the sketch below)
External link: http://arxiv.org/abs/2407.00463
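Since this record describes the SpeechBrain toolkit itself, a minimal usage sketch follows, assuming the public "speechbrain/asr-crdnn-rnnlm-librispeech" checkpoint and a placeholder audio path. Note that recent releases expose the same class under speechbrain.inference rather than speechbrain.pretrained.

```python
# Minimal sketch of loading a pretrained SpeechBrain recognizer and
# transcribing a file. Checkpoint and paths are placeholders.
from speechbrain.pretrained import EncoderDecoderASR

asr = EncoderDecoderASR.from_hparams(
    source="speechbrain/asr-crdnn-rnnlm-librispeech",          # public checkpoint
    savedir="pretrained_models/asr-crdnn-rnnlm-librispeech",   # local cache dir
)
print(asr.transcribe_file("example.wav"))  # path is a placeholder
```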
Authors: Mousavi, Pooneh; Duret, Jarod; Zaiem, Salah; Della Libera, Luca; Ploujnikov, Artem; Subakan, Cem; Ravanelli, Mirco
Discrete audio tokens have recently gained attention for their potential to bridge the gap between audio and language processing. Ideal audio tokens must preserve content, paralinguistic elements, speaker identity, and many other audio details. … (see the sketch below)
External link: http://arxiv.org/abs/2406.10735
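As a rough illustration of what "discrete audio tokens" means in practice, the sketch below shows one common recipe: k-means clustering of frame-level SSL features into integer units (a HuBERT-style approach). This is an assumed example for illustration, not the specific tokenizers the paper studies.

```python
# Sketch: turning continuous SSL frame features into discrete tokens
# via k-means. Encoder, corpus, and codebook size are placeholders.
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import Wav2Vec2Model

encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

def frame_features(wav: torch.Tensor) -> np.ndarray:
    """Return (frames, hidden) continuous features for one waveform."""
    with torch.no_grad():
        return encoder(wav.unsqueeze(0)).last_hidden_state.squeeze(0).numpy()

# "Train" the codebook on features pooled from a tiny dummy corpus.
corpus = [torch.randn(16000) for _ in range(8)]           # placeholder audio
feats = np.concatenate([frame_features(w) for w in corpus], axis=0)
codebook = KMeans(n_clusters=100, n_init=10).fit(feats)   # 100 discrete units

# Tokenize a new utterance: each frame becomes one integer token.
tokens = codebook.predict(frame_features(torch.randn(16000)))
print(tokens[:20])
```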
Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it possible to transcribe audio in multiple languages with a single model. However, current state-of-the-art ASR models are typically evaluated on individual languages … (see the sketch below)
External link: http://arxiv.org/abs/2310.16931
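A minimal sketch of the single-model multilingual transcription the snippet refers to, assuming the openai-whisper package, the "small" checkpoint, and placeholder audio paths and language codes.

```python
# Sketch of multilingual transcription with one Whisper checkpoint.
# File paths and the language code are placeholders.
import whisper

model = whisper.load_model("small")  # one model covers many languages

# Let Whisper detect the language automatically ...
print(model.transcribe("clip_unknown_language.wav")["text"])

# ... or pin the language explicitly when evaluating per-language behaviour.
print(model.transcribe("clip_french.wav", language="fr")["text"])
```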
Recent progress in Automatic Speech Recognition (ASR) has been coupled with a substantial increase in the model sizes, which may now contain billions of parameters, leading to slow inferences even with adapted hardware. In this context, several ASR … (see the sketch below)
External link: http://arxiv.org/abs/2309.12712
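As one generic illustration of the compression-for-faster-inference theme raised above, the sketch below applies post-training dynamic quantization to the Linear layers of a stand-in encoder. This is an assumed example only; it is not among the strategies the paper itself compares.

```python
# Sketch: shrinking inference cost of a large model by quantizing its
# Linear layers to int8 with PyTorch dynamic quantization (CPU inference).
import torch
from transformers import Wav2Vec2Model  # stand-in for a large speech encoder

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base").eval()

quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8  # int8 weights for Linear layers
)

wav = torch.randn(1, 16000)  # dummy one-second waveform
with torch.no_grad():
    out = quantized(wav).last_hidden_state
print(out.shape)
```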
Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address the data scarcity issue but also navigate the intricacies of linguistic diversity. In this paper, we address the aforementioned …
External link: http://arxiv.org/abs/2309.11327
Authors: Wright, George August; Cappellazzo, Umberto; Zaiem, Salah; Raj, Desh; Yang, Lucas Ondel; Falavigna, Daniele; Ali, Mohamed Nabih; Brutti, Alessio
The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exit … (see the sketch below)
External link: http://arxiv.org/abs/2309.09546
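A toy sketch of the early-exit idea mentioned above: intermediate classifiers attached to a layer stack, with inference stopping as soon as a prediction is confident enough. The architecture, layer count, and confidence threshold are illustrative assumptions, not the paper's configuration.

```python
# Sketch: early-exit inference with per-layer classification heads and a
# simple confidence threshold. All dimensions are placeholders.
import torch
import torch.nn as nn

class EarlyExitStack(nn.Module):
    def __init__(self, dim=256, n_layers=6, n_classes=50, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
             for _ in range(n_layers)]
        )
        self.exits = nn.ModuleList(
            [nn.Linear(dim, n_classes) for _ in range(n_layers)]
        )
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        for depth, (layer, exit_head) in enumerate(zip(self.layers, self.exits), start=1):
            x = layer(x)
            probs = exit_head(x.mean(dim=1)).softmax(dim=-1)
            if probs.max() >= self.threshold:   # confident enough: stop early
                return probs, depth
        return probs, depth                      # fell through: used every layer

model = EarlyExitStack().eval()
probs, layers_used = model(torch.randn(1, 100, 256))  # (batch, frames, dim)
print(f"exited after {layers_used} layers")
```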
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The high number of proposed approaches fostered the emergence of comprehensive benchmarks that evaluate …
External link: http://arxiv.org/abs/2308.14456
Automatic Data Augmentation for Domain Adapted Fine-Tuning of Self-Supervised Speech Representations
Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models even with small annotated datasets. Despite this, speech SSL representations may fail while facing an … (see the sketch below)
External link: http://arxiv.org/abs/2306.00481
Published in: INTERSPEECH 2023
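For context, the sketch below shows the kind of waveform-level augmentations such pipelines typically draw from (noise addition at a target SNR and speed perturbation via resampling, using torchaudio). The paper's actual contribution, automatically selecting augmentations for a target domain, is not reproduced here.

```python
# Sketch: two simple waveform augmentations of the sort an augmentation
# pipeline might apply. Sample rate, SNR, and speed factor are placeholders.
import torch
import torchaudio

def add_noise(wav: torch.Tensor, snr_db: float) -> torch.Tensor:
    """Mix in white noise at a given signal-to-noise ratio (in dB)."""
    noise = torch.randn_like(wav)
    signal_power = wav.pow(2).mean()
    noise_power = noise.pow(2).mean()
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return wav + scale * noise

def speed_perturb(wav: torch.Tensor, sample_rate: int, factor: float) -> torch.Tensor:
    """Change speed (and pitch) by resampling, a common cheap perturbation."""
    resampler = torchaudio.transforms.Resample(
        orig_freq=sample_rate, new_freq=int(sample_rate / factor)
    )
    return resampler(wav)

wav = torch.randn(1, 16000)  # placeholder utterance at 16 kHz
augmented = speed_perturb(add_noise(wav, snr_db=15.0), 16000, factor=1.1)
print(augmented.shape)
```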
Self-supervised learning (SSL) has recently allowed leveraging large datasets of unlabeled speech signals to reach impressive performance on speech tasks using only small amounts of annotated data. The high number of proposed approaches fostered the … (see the sketch below)
External link: http://arxiv.org/abs/2306.00452
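To make the benchmarking setup this line of work examines more concrete, here is a toy sketch of frozen SSL features being evaluated with downstream probes of different capacity (a linear head versus a small MLP). Dimensions, task, data, and training schedule are placeholders, not any benchmark's actual protocol.

```python
# Sketch: comparing downstream probes of different capacity on top of
# frozen (precomputed) SSL features. Everything here is dummy data.
import torch
import torch.nn as nn

hidden, n_classes = 768, 10
frozen_feats = torch.randn(32, 120, hidden)   # (batch, frames, hidden), no gradients
labels = torch.randint(0, n_classes, (32,))

probes = {
    "linear": nn.Linear(hidden, n_classes),
    "mlp": nn.Sequential(nn.Linear(hidden, 512), nn.ReLU(), nn.Linear(512, n_classes)),
}

for name, probe in probes.items():
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(5):                         # a few toy steps; real probes train longer
        logits = probe(frozen_feats.mean(dim=1))   # mean-pool frames, then classify
        loss = nn.functional.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(name, float(loss))
```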