Showing 1 - 10 of 95 for the search: '"Brutti, Alessio"'
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages
Author:
Gaido, Marco, Papi, Sara, Bentivogli, Luisa, Brutti, Alessio, Cettolo, Mauro, Gretter, Roberto, Matassoni, Marco, Nabih, Mohamed, Negri, Matteo
The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models. However, existing speech FMs (SFMs) fall short of full compliance with the open-source pr…
External link:
http://arxiv.org/abs/2410.01036
Author:
Cappellazzo, Umberto, Kim, Minsu, Chen, Honglie, Ma, Pingchuan, Petridis, Stavros, Falavigna, Daniele, Brutti, Alessio, Pantic, Maja
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. For example, in the audio and speech domains, an LLM can be equipped with (automatic) speech recogn…
External link:
http://arxiv.org/abs/2409.12319
Automatic speech recognition models require large amounts of speech recordings for training. However, the collection of such data is often cumbersome and raises privacy concerns. Federated learning has been widely used as an effective decentralized…
External link:
http://arxiv.org/abs/2405.17376
Mixture of Experts (MoE) architectures have recently gained traction due to their ability to scale a model's capacity while keeping the computational cost affordable. Furthermore, they can be applied to both Transformers and State Space Models,…
External link:
http://arxiv.org/abs/2402.00828
Parameter-efficient transfer learning (PETL) methods have emerged as a solid alternative to the standard full fine-tuning approach. They only train a few extra parameters for each downstream task, without sacrificing performance and dispensing with t…
External link:
http://arxiv.org/abs/2312.03694
Author:
Cappellazzo, Umberto, Fini, Enrico, Yang, Muqiao, Falavigna, Daniele, Brutti, Alessio, Raj, Bhiksha
Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing res…
External link:
http://arxiv.org/abs/2310.02699
Author:
Wright, George August, Cappellazzo, Umberto, Zaiem, Salah, Raj, Desh, Yang, Lucas Ondel, Falavigna, Daniele, Ali, Mohamed Nabih, Brutti, Alessio
The ability to dynamically adjust the computational load of neural models during inference is crucial for on-device processing scenarios characterised by limited and time-varying computational resources. A promising solution is presented by early-exi…
External link:
http://arxiv.org/abs/2309.09546
Author:
Serafini, Luca, Cornell, Samuele, Morrone, Giovanni, Zovato, Enrico, Brutti, Alessio, Squartini, Stefano
We performed an experimental review of current diarization systems for the conversational telephone speech (CTS) domain. In detail, we considered a total of eight different algorithms belonging to clustering-based, end-to-end neural diarization (EEND…
External link:
http://arxiv.org/abs/2305.18074
Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding
The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments. Their propensity to fit the current data distribution to the detriment of the past acquired knowled…
External link:
http://arxiv.org/abs/2305.13899
Author:
Morrone, Giovanni, Cornell, Samuele, Serafini, Luca, Zovato, Enrico, Brutti, Alessio, Squartini, Stefano
Published in:
Speech Communication 161 (2024) 103081
Recent works show that speech separation guided diarization (SSGD) is an increasingly promising direction, mainly thanks to the recent progress in speech separation. It performs diarization by first separating the speakers and then applying voice act…
External link:
http://arxiv.org/abs/2303.12002