Showing 1 - 10 of 90 for search: '"Wichern, Gordon"'
Author:
Saijo, Kohei, Ebbers, Janek, Germain, François G., Khurana, Sameer, Wichern, Gordon, Le Roux, Jonathan
The goal of text-queried target sound extraction (TSE) is to extract from a mixture a sound source specified with a natural-language caption. While it is preferable to have access to large-scale text-audio pairs to address a variety of text prompts,
External link:
http://arxiv.org/abs/2409.13152
Reverberation as supervision (RAS) is a framework that allows for training monaural speech separation models from multi-channel mixtures in an unsupervised manner. In RAS, models are trained so that sources predicted from a mixture at an input channe
External link:
http://arxiv.org/abs/2408.03438
Time-frequency (TF) domain dual-path models achieve high-fidelity speech separation. While some previous state-of-the-art (SoTA) models rely on RNNs, this reliance means they lack the parallelizability, scalability, and versatility of Transformer blo
External link:
http://arxiv.org/abs/2408.03440
Sound event detection is the task of recognizing sounds and determining their extent (onset/offset times) within an audio clip. Existing systems commonly predict sound presence confidence in short time frames. Then, thresholding produces binary frame
External link:
http://arxiv.org/abs/2406.04212
We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention
External link:
http://arxiv.org/abs/2404.02252
In music source separation, a standard training data augmentation procedure is to create new training samples by randomly combining instrument stems from different songs. These random mixes have mismatched characteristics compared to real music, e.g.
External link:
http://arxiv.org/abs/2402.18407
Author:
Masuyama, Yoshiki, Wichern, Gordon, Germain, François G., Pan, Zexu, Khurana, Sameer, Hori, Chiori, Le Roux, Jonathan
Head-related transfer functions (HRTFs) are important for immersive audio, and their spatial interpolation has been studied to upsample finite measurements. Recently, neural fields (NFs) which map from sound source direction to HRTF have gained atten
External link:
http://arxiv.org/abs/2402.17907
Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity. This activity is usually recorded using electroencephalograp
External link:
http://arxiv.org/abs/2312.07513
Author:
Pan, Zexu, Wichern, Gordon, Masuyama, Yoshiki, Germain, François G., Khurana, Sameer, Hori, Chiori, Le Roux, Jonathan
Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers. Building upon the achievements of the state-of-the-art (SOTA) time-freq
External link:
http://arxiv.org/abs/2310.19644
Author:
Bralios, Dimitrios, Wichern, Gordon, Germain, François G., Pan, Zexu, Khurana, Sameer, Hori, Chiori, Le Roux, Jonathan
The introduction of audio latent diffusion models possessing the ability to generate realistic sound clips on demand from a text description has the potential to revolutionize how we work with audio. In this work, we make an initial attempt at unders
External link:
http://arxiv.org/abs/2310.10604