Showing 1 - 10 of 5,599 for search: '"SINISCALCHI A."'
In this work, we propose a novel consistency-preserving loss function for recovering the phase information in the context of phase reconstruction (PR) and speech enhancement (SE). Different from conventional techniques that directly estimate the phas
External link:
http://arxiv.org/abs/2409.16282
This work investigates two strategies for zero-shot non-intrusive speech assessment leveraging large language models. First, we explore the audio analysis capabilities of GPT-4o. Second, we propose GPT-Whisper, which uses Whisper as an audio-to-text
External link:
http://arxiv.org/abs/2409.09914
Author:
Yang, Chao-Han Huck, Park, Taejin, Gong, Yuan, Li, Yuanchao, Chen, Zhehuai, Lin, Yen-Ting, Chen, Chen, Hu, Yuchen, Dhawan, Kunal, Żelasko, Piotr, Zhang, Chao, Chen, Yun-Nung, Tsao, Yu, Balam, Jagadeesh, Ginsburg, Boris, Siniscalchi, Sabato Marco, Chng, Eng Siong, Bell, Peter, Lai, Catherine, Watanabe, Shinji, Stolcke, Andreas
Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new c
External link:
http://arxiv.org/abs/2409.09785
Author:
Khan, Muhammad Salman, La Quatra, Moreno, Hung, Kuo-Hsuan, Fu, Szu-Wei, Siniscalchi, Sabato Marco, Tsao, Yu
Self-supervised representation learning (SSL) has attained SOTA results on several downstream speech tasks, but SSL-based speech enhancement (SE) solutions still lag behind. To address this issue, we exploit three main ideas: (i) Transformer-based ma
External link:
http://arxiv.org/abs/2408.04773
Author:
La Quatra, Moreno, Turco, Maria Francesca, Svendsen, Torbjørn, Salvi, Giampiero, Orozco-Arroyave, Juan Rafael, Siniscalchi, Sabato Marco
This work is concerned with devising a robust Parkinson's disease (PD) detector from speech in real-world operating conditions using (i) foundational models, and (ii) speech enhancement (SE) methods. To this end, we first fine-tune several foundation
External link:
http://arxiv.org/abs/2406.16128
Italy exhibits rich linguistic diversity across its territory due to the distinct regional languages spoken in different areas. Recent advances in self-supervised learning provide new opportunities to analyze Italy's linguistic varieties using speech
External link:
http://arxiv.org/abs/2406.15862
We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation). Specifi
External link:
http://arxiv.org/abs/2406.02488
Author:
Chao, Rong, Cheng, Wen-Huang, La Quatra, Moreno, Siniscalchi, Sabato Marco, Yang, Chao-Han Huck, Fu, Szu-Wei, Tsao, Yu
This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the proper
External link:
http://arxiv.org/abs/2405.06573
Author:
La Quatra, Moreno, Koudounas, Alkis, Vaiani, Lorenzo, Baralis, Elena, Cagliero, Luca, Garza, Paolo, Siniscalchi, Sabato Marco
Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on divers
External link:
http://arxiv.org/abs/2405.00934
Author:
Chen, Chen, Li, Ruizhe, Hu, Yuchen, Siniscalchi, Sabato Marco, Chen, Pin-Yu, Chng, Ensiong, Yang, Chao-Han Huck
Recent studies have shown that large language models (LLMs) can be successfully used for generative error correction (GER) on top of the automatic speech recognition (ASR) output. Specifically, an LLM is utilized to carry out a direct ma
External link:
http://arxiv.org/abs/2402.05457