Showing 1 - 10 of 217 for search: '"Estève, Yannick"'
Self-Supervised Learning (SSL) has proven to be effective in various domains, including speech processing. However, SSL is computationally and memory expensive. This is in part due to the quadratic complexity of multi-head self-attention (MHSA). Alterna
External link:
http://arxiv.org/abs/2409.02596
Published in:
Speaker and Language Recognition Workshop - Odyssey, Jun 2024, Québec, Canada
Speech resynthesis is a generic task for which we want to synthesize audio with another audio as input, which finds applications for media monitors and journalists. Among different tasks addressed by speech resynthesis, voice conversion preserves the
External link:
http://arxiv.org/abs/2408.02712
Recent advancements in textless speech-to-speech translation systems have been driven by the adoption of self-supervised learning techniques. Although most state-of-the-art systems adopt a similar architecture to transform source language speech into
External link:
http://arxiv.org/abs/2407.18332
Published in:
Odyssey 2024, Jun 2024, Quebec, Canada
In this work, we detail our submission to the 2024 edition of the MSP-Podcast Speech Emotion Recognition (SER) Challenge. This challenge is divided into two distinct tasks: Categorical Emotion Recognition and Emotional Attribute Prediction. We concen
External link:
http://arxiv.org/abs/2407.05746
Speech encoders pretrained through self-supervised learning (SSL) have demonstrated remarkable performance in various downstream tasks, including Spoken Language Understanding (SLU) and Automatic Speech Recognition (ASR). For instance, fine-tuning SS
External link:
http://arxiv.org/abs/2407.04533
Author:
Ravanelli, Mirco, Parcollet, Titouan, Moumen, Adel, de Langen, Sylvain, Subakan, Cem, Plantinga, Peter, Wang, Yingzhi, Mousavi, Pooneh, Della Libera, Luca, Ploujnikov, Artem, Paissan, Francesco, Borra, Davide, Zaiem, Salah, Zhao, Zeyu, Zhang, Shucong, Karakasidis, Georgios, Yeh, Sung-Lin, Champion, Pierre, Rouhe, Aku, Braun, Rudolf, Mai, Florian, Zuluaga-Gomez, Juan, Mousavi, Seyed Mahed, Nautsch, Andreas, Nguyen, Ha, Liu, Xuechen, Sagar, Sangeet, Duret, Jarod, Mdhaffar, Salima, Laperriere, Gaelle, Rouvier, Mickael, De Mori, Renato, Esteve, Yannick
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and
External link:
http://arxiv.org/abs/2407.00463
Published in:
27th International Conference on Text, Speech and Dialogue, Sep 2024, Brno, Czech Republic
In spoken Task-Oriented Dialogue (TOD) systems, the choice of the semantic representation describing the users' requests is key to a smooth interaction. Indeed, the system uses this representation to reason over a database and its domain knowledge to
External link:
http://arxiv.org/abs/2406.13269
Self-Supervised Learning is widely used to efficiently represent speech for Spoken Language Understanding, gradually replacing conventional approaches. Meanwhile, textual SSL models are proposed to encode language-agnostic semantics. SAMU-XLSR framew
External link:
http://arxiv.org/abs/2406.12141
Author:
Sekkat, Chloé, Leroy, Fanny, Mdhaffar, Salima, Smith, Blake Perry, Estève, Yannick, Dureau, Joseph, Coucke, Alice
Recent works demonstrate that voice assistants do not perform equally well for everyone, but research on demographic robustness of speech technologies is still scarce. This is mainly due to the rarity of large datasets with controlled demographic tag
External link:
http://arxiv.org/abs/2405.19342
Self-Supervised Learning (SSL) has proven to be useful in various speech tasks. However, these methods are generally very demanding in terms of data, memory, and computational resources. BERT-based Speech pre-Training with Random-projection Quantizer
External link:
http://arxiv.org/abs/2405.04296