Showing 1 - 10
of 2,720
for search: '"FRANCOIS, G."'
Several attempts have been made to handle multiple source separation tasks such as speech enhancement, speech separation, sound event separation, music source separation (MSS), or cinematic audio source separation (CASS) with a single model. These mo
External link:
http://arxiv.org/abs/2410.23987
Authors:
Saijo, Kohei, Ebbers, Janek, Germain, François G., Khurana, Sameer, Wichern, Gordon, Roux, Jonathan Le
The goal of text-queried target sound extraction (TSE) is to extract from a mixture a sound source specified with a natural-language caption. While it is preferable to have access to large-scale text-audio pairs to address a variety of text prompts,
External link:
http://arxiv.org/abs/2409.13152
Author:
Meyer, François G
The notion of Fréchet mean (also known as "barycenter") network is the workhorse of most machine learning algorithms that require the estimation of a "location" parameter to analyse network-valued data. In this context, it is critical that the netw
External link:
http://arxiv.org/abs/2408.03461
Reverberation as supervision (RAS) is a framework that allows for training monaural speech separation models from multi-channel mixtures in an unsupervised manner. In RAS, models are trained so that sources predicted from a mixture at an input channe
External link:
http://arxiv.org/abs/2408.03438
Time-frequency (TF) domain dual-path models achieve high-fidelity speech separation. While some previous state-of-the-art (SoTA) models rely on RNNs, this reliance means they lack the parallelizability, scalability, and versatility of Transformer blo
External link:
http://arxiv.org/abs/2408.03440
Sound event detection is the task of recognizing sounds and determining their extent (onset/offset times) within an audio clip. Existing systems commonly predict sound presence confidence in short time frames. Then, thresholding produces binary frame
External link:
http://arxiv.org/abs/2406.04212
We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention
External link:
http://arxiv.org/abs/2404.02252
In music source separation, a standard training data augmentation procedure is to create new training samples by randomly combining instrument stems from different songs. These random mixes have mismatched characteristics compared to real music, e.g.
External link:
http://arxiv.org/abs/2402.18407
Authors:
Masuyama, Yoshiki, Wichern, Gordon, Germain, François G., Pan, Zexu, Khurana, Sameer, Hori, Chiori, Roux, Jonathan Le
Head-related transfer functions (HRTFs) are important for immersive audio, and their spatial interpolation has been studied to upsample finite measurements. Recently, neural fields (NFs) which map from sound source direction to HRTF have gained atten
External link:
http://arxiv.org/abs/2402.17907
Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity. This activity is usually recorded using electroencephalograp
External link:
http://arxiv.org/abs/2312.07513