Search results - "Wichern, Gordon"

Report

Leveraging Audio-Only Data for Text-Queried Target Sound Extraction

Authors: Saijo, Kohei; Ebbers, Janek; Germain, François G.; Khurana, Sameer; Wichern, Gordon; Le Roux, Jonathan

The goal of text-queried target sound extraction (TSE) is to extract from a mixture a sound source specified with a natural-language caption. While it is preferable to have access to large-scale text-audio pairs to address a variety of text prompts,

External link: http://arxiv.org/abs/2409.13152

Report

Enhanced Reverberation as Supervision for Unsupervised Speech Separation

Authors: Saijo, Kohei; Wichern, Gordon; Germain, François G.; Pan, Zexu; Le Roux, Jonathan

Reverberation as supervision (RAS) is a framework that allows for training monaural speech separation models from multi-channel mixtures in an unsupervised manner. In RAS, models are trained so that sources predicted from a mixture at an input channe

External link: http://arxiv.org/abs/2408.03438

Report

TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement

Authors: Saijo, Kohei; Wichern, Gordon; Germain, François G.; Pan, Zexu; Le Roux, Jonathan

Time-frequency (TF) domain dual-path models achieve high-fidelity speech separation. While some previous state-of-the-art (SoTA) models rely on RNNs, this reliance means they lack the parallelizability, scalability, and versatility of Transformer blo

External link: http://arxiv.org/abs/2408.03440

Report

Sound Event Bounding Boxes

Authors: Ebbers, Janek; Germain, François G.; Wichern, Gordon; Le Roux, Jonathan

Sound event detection is the task of recognizing sounds and determining their extent (onset/offset times) within an audio clip. Existing systems commonly predict sound presence confidence in short time frames. Then, thresholding produces binary frame

External link: http://arxiv.org/abs/2406.04212

Report

SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers

Authors: Koo, Junghyun; Wichern, Gordon; Germain, François G.; Khurana, Sameer; Le Roux, Jonathan

We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention

External link: http://arxiv.org/abs/2404.02252

Report

Why does music source separation benefit from cacophony?

Authors: Jeon, Chang-Bin; Wichern, Gordon; Germain, François G.; Le Roux, Jonathan

In music source separation, a standard training data augmentation procedure is to create new training samples by randomly combining instrument stems from different songs. These random mixes have mismatched characteristics compared to real music, e.g.

External link: http://arxiv.org/abs/2402.18407

Report

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

Authors: Masuyama, Yoshiki; Wichern, Gordon; Germain, François G.; Pan, Zexu; Khurana, Sameer; Hori, Chiori; Le Roux, Jonathan

Head-related transfer functions (HRTFs) are important for immersive audio, and their spatial interpolation has been studied to upsample finite measurements. Recently, neural fields (NFs) which map from sound source direction to HRTF have gained atten

External link: http://arxiv.org/abs/2402.17907

Report

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

Authors: Pan, Zexu; Wichern, Gordon; Germain, François G.; Khurana, Sameer; Le Roux, Jonathan

Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal from a multi-talker speech signal, in which the attention is derived from the cortical activity. This activity is usually recorded using electroencephalograp

External link: http://arxiv.org/abs/2312.07513

Report

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Authors: Pan, Zexu; Wichern, Gordon; Masuyama, Yoshiki; Germain, François G.; Khurana, Sameer; Hori, Chiori; Le Roux, Jonathan

Target speech extraction aims to extract, based on a given conditioning cue, a target speech signal that is corrupted by interfering sources, such as noise or competing speakers. Building upon the achievements of the state-of-the-art (SOTA) time-freq

External link: http://arxiv.org/abs/2310.19644

Report

Generation or Replication: Auscultating Audio Latent Diffusion Models

Authors: Bralios, Dimitrios; Wichern, Gordon; Germain, François G.; Pan, Zexu; Khurana, Sameer; Hori, Chiori; Le Roux, Jonathan

The introduction of audio latent diffusion models possessing the ability to generate realistic sound clips on demand from a text description has the potential to revolutionize how we work with audio. In this work, we make an initial attempt at unders

External link: http://arxiv.org/abs/2310.10604
