Author:
Della Libera, Luca
Recent advances in deep reinforcement learning have achieved impressive results in a wide range of complex tasks, but poor sample efficiency remains a major obstacle to real-world deployment. Soft actor-critic (SAC) mitigates this problem by combinin
External link:
http://arxiv.org/abs/2409.04971
Author:
Bongratz, Fabian, Golkov, Vladimir, Mautner, Lukas, Della Libera, Luca, Heetmeyer, Frederik, Czaja, Felix, Rodemann, Julian, Cremers, Daniel
The field of reinforcement learning offers a large variety of concepts and methods to tackle sequential decision-making problems. This variety has become so large that choosing an algorithm for a task at hand can be challenging. In this work, we stre
External link:
http://arxiv.org/abs/2407.20917
Author:
Ravanelli, Mirco, Parcollet, Titouan, Moumen, Adel, de Langen, Sylvain, Subakan, Cem, Plantinga, Peter, Wang, Yingzhi, Mousavi, Pooneh, Della Libera, Luca, Ploujnikov, Artem, Paissan, Francesco, Borra, Davide, Zaiem, Salah, Zhao, Zeyu, Zhang, Shucong, Karakasidis, Georgios, Yeh, Sung-Lin, Champion, Pierre, Rouhe, Aku, Braun, Rudolf, Mai, Florian, Zuluaga-Gomez, Juan, Mousavi, Seyed Mahed, Nautsch, Andreas, Nguyen, Ha, Liu, Xuechen, Sagar, Sangeet, Duret, Jarod, Mdhaffar, Salima, Laperriere, Gaelle, Rouvier, Mickael, De Mori, Renato, Esteve, Yannick
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and
External link:
http://arxiv.org/abs/2407.00463
Author:
Mousavi, Pooneh, Della Libera, Luca, Duret, Jarod, Ploujnikov, Artem, Subakan, Cem, Ravanelli, Mirco
Discrete audio tokens have recently gained considerable attention for their potential to connect audio and language processing, enabling the creation of modern multimodal large language models. Ideal audio tokens must effectively preserve phonetic an
External link:
http://arxiv.org/abs/2406.14294
Author:
Mousavi, Pooneh, Duret, Jarod, Zaiem, Salah, Della Libera, Luca, Ploujnikov, Artem, Subakan, Cem, Ravanelli, Mirco
Discrete audio tokens have recently gained attention for their potential to bridge the gap between audio and language processing. Ideal audio tokens must preserve content, paralinguistic elements, speaker identity, and many other audio details. Curre
External link:
http://arxiv.org/abs/2406.10735
Interpreting the decisions of deep learning models, including audio classifiers, is crucial for ensuring the transparency and trustworthiness of this technology. In this paper, we introduce LMAC-ZS (Listenable Maps for Audio Classifiers in the Zero-S
External link:
http://arxiv.org/abs/2405.17615
The increasing success of deep neural networks has raised concerns about their inherent black-box nature, posing challenges related to interpretability and trust. While there has been extensive exploration of interpretation techniques in vision and l
External link:
http://arxiv.org/abs/2402.02754
Author:
Della Libera, Luca, Andreoli, Jacopo, Pezze, Davide Dalle, Ravanelli, Mirco, Susto, Gian Antonio
A crucial task in predictive maintenance is estimating the remaining useful life of physical systems. In the last decade, deep learning has improved considerably upon traditional model-based and statistical approaches in terms of predictive performan
External link:
http://arxiv.org/abs/2402.01098
Modern multilingual automatic speech recognition (ASR) systems like Whisper have made it possible to transcribe audio in multiple languages with a single model. However, current state-of-the-art ASR models are typically evaluated on individual langua
External link:
http://arxiv.org/abs/2310.16931
Author:
Paissan, Francesco, Della Libera, Luca, Wang, Zhepei, Ravanelli, Mirco, Smaragdis, Paris, Subakan, Cem
In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in
External link:
http://arxiv.org/abs/2310.12858