Showing 1 - 10 of 95 for the search: '"Brutti, Alessio"'
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Author:
Gaido, Marco, Papi, Sara, Bentivogli, Luisa, Brutti, Alessio, Cettolo, Mauro, Gretter, Roberto, Matassoni, Marco, Nabih, Mohamed, Negri, Matteo
The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models. However, existing speech FMs (SFMs) fall short of full compliance with the open-source pr…
External link:
http://arxiv.org/abs/2410.01036
Author:
Cappellazzo, Umberto, Kim, Minsu, Chen, Honglie, Ma, Pingchuan, Petridis, Stavros, Falavigna, Daniele, Brutti, Alessio, Pantic, Maja
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. For example, in the audio and speech domains, an LLM can be equipped with (automatic) speech recogn…
External link:
http://arxiv.org/abs/2409.12319
Automatic speech recognition models require large amounts of speech recordings for training. However, the collection of such data is often cumbersome and raises privacy concerns. Federated learning has been widely used as an effective decentralized…
External link:
http://arxiv.org/abs/2405.17376
Mixture of Experts (MoE) architectures have recently gained traction due to their ability to scale a model's capacity while keeping the computational cost affordable. Furthermore, they can be applied to both Transformers and State Space Models,…
External link:
http://arxiv.org/abs/2402.00828
Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach. They only train a few extra parameters for each downstream task, without sacrificing performance and dispensing with t…
External link:
http://arxiv.org/abs/2312.03694
Author:
Cappellazzo, Umberto, Fini, Enrico, Yang, Muqiao, Falavigna, Daniele, Brutti, Alessio, Raj, Bhiksha
Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing res…
External link:
http://arxiv.org/abs/2310.02699
Author:
Wright, George August, Cappellazzo, Umberto, Zaiem, Salah, Raj, Desh, Yang, Lucas Ondel, Falavigna, Daniele, Ali, Mohamed Nabih, Brutti, Alessio
The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exi…
External link:
http://arxiv.org/abs/2309.09546
Author:
Serafini, Luca, Cornell, Samuele, Morrone, Giovanni, Zovato, Enrico, Brutti, Alessio, Squartini, Stefano
We performed an experimental review of current diarization systems for the conversational telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms belonging to clustering-based, end-to-end neural diarization (EEND…
External link:
http://arxiv.org/abs/2305.18074
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments. Their propensity to fit the current data distribution to the detriment of the past acquired knowled…
External link:
http://arxiv.org/abs/2305.13899
Author:
Morrone, Giovanni, Cornell, Samuele, Serafini, Luca, Zovato, Enrico, Brutti, Alessio, Squartini, Stefano
Published in:
Speech Communication 161 (2024) 103081
Recent works show that speech separation guided diarization (SSGD) is an increasingly promising direction, mainly thanks to the recent progress in speech separation. It performs diarization by first separating the speakers and then applying voice act…
External link:
http://arxiv.org/abs/2303.12002