Showing 1 - 10
of 5,607
for search: '"A, Siniscalchi"'
Author:
Liu, Yu-Tung, Wang, Kuan-Chen, Chao, Rong, Siniscalchi, Sabato Marco, Yeh, Ping-Cheng, Tsao, Yu
Surface electromyography (sEMG) recordings can be contaminated by electrocardiogram (ECG) signals when the monitored muscle is close to the heart. Traditional signal-processing-based approaches, such as high-pass filtering and template subtraction,
External link:
http://arxiv.org/abs/2411.18902
In this work, we propose a novel consistency-preserving loss function for recovering the phase information in the context of phase reconstruction (PR) and speech enhancement (SE). Different from conventional techniques that directly estimate the phas
External link:
http://arxiv.org/abs/2409.16282
This work investigates two strategies for zero-shot non-intrusive speech assessment leveraging large language models. First, we explore the audio analysis capabilities of GPT-4o. Second, we propose GPT-Whisper, which uses Whisper as an audio-to-text
External link:
http://arxiv.org/abs/2409.09914
Author:
Yang, Chao-Han Huck, Park, Taejin, Gong, Yuan, Li, Yuanchao, Chen, Zhehuai, Lin, Yen-Ting, Chen, Chen, Hu, Yuchen, Dhawan, Kunal, Żelasko, Piotr, Zhang, Chao, Chen, Yun-Nung, Tsao, Yu, Balam, Jagadeesh, Ginsburg, Boris, Siniscalchi, Sabato Marco, Chng, Eng Siong, Bell, Peter, Lai, Catherine, Watanabe, Shinji, Stolcke, Andreas
Given recent advances in generative AI technology, a key question is how large language models (LLMs) can enhance acoustic modeling tasks using text decoding results from a frozen, pretrained automatic speech recognition (ASR) model. To explore new c
External link:
http://arxiv.org/abs/2409.09785
Author:
Khan, Muhammad Salman, La Quatra, Moreno, Hung, Kuo-Hsuan, Fu, Szu-Wei, Siniscalchi, Sabato Marco, Tsao, Yu
Self-supervised representation learning (SSL) has attained state-of-the-art results on several downstream speech tasks, but SSL-based speech enhancement (SE) solutions still lag behind. To address this issue, we exploit three main ideas: (i) Transformer-based ma
External link:
http://arxiv.org/abs/2408.04773
Author:
La Quatra, Moreno, Turco, Maria Francesca, Svendsen, Torbjørn, Salvi, Giampiero, Orozco-Arroyave, Juan Rafael, Siniscalchi, Sabato Marco
This work is concerned with devising a robust Parkinson's disease (PD) detector from speech in real-world operating conditions using (i) foundational models, and (ii) speech enhancement (SE) methods. To this end, we first fine-tune several foundation
External link:
http://arxiv.org/abs/2406.16128
Italy exhibits rich linguistic diversity across its territory due to the distinct regional languages spoken in different areas. Recent advances in self-supervised learning provide new opportunities to analyze Italy's linguistic varieties using speech
External link:
http://arxiv.org/abs/2406.15862
We propose a novel language-universal approach to end-to-end automatic spoken keyword recognition (SKR) leveraging (i) a self-supervised pre-trained model, and (ii) a set of universal speech attributes (manner and place of articulation). Specifi
External link:
http://arxiv.org/abs/2406.02488
Author:
Chao, Rong, Cheng, Wen-Huang, La Quatra, Moreno, Siniscalchi, Sabato Marco, Yang, Chao-Han Huck, Fu, Szu-Wei, Tsao, Yu
This work aims to study a scalable state-space model (SSM), Mamba, for the speech enhancement (SE) task. We exploit a Mamba-based regression model to characterize speech signals and build an SE system upon Mamba, termed SEMamba. We explore the proper
External link:
http://arxiv.org/abs/2405.06573
Author:
La Quatra, Moreno, Koudounas, Alkis, Vaiani, Lorenzo, Baralis, Elena, Cagliero, Luca, Garza, Paolo, Siniscalchi, Sabato Marco
Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on divers
External link:
http://arxiv.org/abs/2405.00934