Showing 1 - 10 of 196 for search: '"Smaragdis, Paris"'
Deep Learning techniques have excelled at generating embedding spaces that capture semantic similarities between items. Often these representations are paired, enabling experiments with analogies (pairs within the same domain) and cross-modality (pai
External link:
http://arxiv.org/abs/2411.08687
Recent advances in audio-text cross-modal contrastive learning have shown its potential towards zero-shot learning. One possibility for this is by projecting item embeddings from pre-trained backbone neural networks into a cross-modal space in which
External link:
http://arxiv.org/abs/2408.13068
Non-negative Matrix Factorization (NMF) is a powerful technique for analyzing regularly-sampled data, i.e., data that can be stored in a matrix. For audio, this has led to numerous applications using time-frequency (TF) representations like the Short
External link:
http://arxiv.org/abs/2404.04439
We introduce a new online adaptive filtering method called supervised multi-step adaptive filters (SMS-AF). Our method uses neural networks to control or optimize linear multi-delay or multi-channel frequency-domain filters and can flexibly scale-up
External link:
http://arxiv.org/abs/2403.00977
While neural network approaches have made significant strides in resolving classical signal processing problems, it is often the case that hybrid approaches that draw insight from both signal processing and neural networks produce more complete solut
External link:
http://arxiv.org/abs/2402.06683
Adaptive filters (AFs) are vital for enhancing the performance of downstream tasks, such as speech recognition, sound event detection, and keyword spotting. However, traditional AF design prioritizes isolated signal-level objectives, often overlookin
External link:
http://arxiv.org/abs/2312.10605
Authors:
Paissan, Francesco, Della Libera, Luca, Wang, Zhepei, Ravanelli, Mirco, Smaragdis, Paris, Subakan, Cem
In this paper, we explore audio-editing with non-rigid text edits. We show that the proposed editing pipeline is able to create audio edits that remain faithful to the input audio. We explore text prompts that perform addition, style transfer, and in
External link:
http://arxiv.org/abs/2310.12858
Authors:
Lu, Austin, Moore, Ethaniel, Nallanthighall, Arya, Sarkar, Kanad, Mittal, Manan, Corey, Ryan M., Smaragdis, Paris, Singer, Andrew
We address the challenge of making spatial audio datasets by proposing a shared mechanized recording space that can run custom acoustic experiments: a Mechatronic Acoustic Research System (MARS). To accommodate a wide variety of experiments, we imple
External link:
http://arxiv.org/abs/2310.00587
Pitch estimation is an essential step of many speech processing algorithms, including speech coding, synthesis, and enhancement. Recently, pitch estimators based on deep neural networks (DNNs) have been outperforming well-established DSP-based t
External link:
http://arxiv.org/abs/2309.14507
Published in:
2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Recent approaches in source separation leverage semantic information about their input mixtures and constituent sources that when used in conditional separation models can achieve impressive performance. Most approaches along these lines have focused
External link:
http://arxiv.org/abs/2307.14609