Showing 1 - 10 of 105 for search: '"Fontaine, Mathieu"'
Published in:
INTERSPEECH, Sep 2024, Kos Island, Greece
Single-channel speech dereverberation aims at extracting a dry speech signal from a recording affected by the acoustic reflections in a room. However, most current deep learning-based approaches for speech dereverberation are not interpretable for ro
External link:
http://arxiv.org/abs/2407.08657
Author:
Letzelter, Victor, Perera, David, Rommel, Cédric, Fontaine, Mathieu, Essid, Slim, Richard, Gael, Pérez, Patrick
Winner-takes-all training is a simple learning paradigm, which handles ambiguous tasks by predicting a set of plausible hypotheses. Recently, a connection was established between Winner-takes-all training and centroidal Voronoi tessellations, showing
External link:
http://arxiv.org/abs/2406.04706
Published in:
ICASSP, Apr 2024, Seoul (Korea), South Korea
Isolating the desired speaker's voice amidst multiple speakers in a noisy acoustic context is a challenging task. Personalized speech enhancement (PSE) endeavours to achieve this by leveraging prior knowledge of the speaker's voice. Recent research eff
External link:
http://arxiv.org/abs/2404.08022
Published in:
IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul (Korea), South Korea
Diffusion models are receiving a growing interest for a variety of signal generation tasks such as speech or music synthesis. WaveGrad, for example, is a successful diffusion model that conditionally uses the mel spectrogram to guide a diffusion proc
External link:
http://arxiv.org/abs/2402.15516
Published in:
IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2024, Seoul (Korea), South Korea
Generative adversarial network (GAN) models can synthesize high-quality audio signals while ensuring fast sample generation. However, they are difficult to train and are prone to several issues including mode collapse and divergence. In this paper, we
External link:
http://arxiv.org/abs/2402.01753
Published in:
IEEE International Conference on Acoustics, Speech, and Signal Processing, Apr 2024, Seoul (Korea), South Korea
Overlapped speech is notoriously problematic for speaker diarization systems. Consequently, the use of speech separation has recently been proposed to improve their performance. Although promising, speech separation models struggle with realistic dat
External link:
http://arxiv.org/abs/2402.00067
Author:
Letzelter, Victor, Fontaine, Mathieu, Chen, Mickaël, Pérez, Patrick, Essid, Slim, Richard, Gaël
Published in:
Advances in neural information processing systems, Dec 2023, New Orleans, United States
We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simpl
External link:
http://arxiv.org/abs/2311.01052
We address the problem of accurately interpolating measured anechoic steering vectors with a deep learning framework called the neural field. This task plays a pivotal role in reducing the resource-intensive measurements required for precise sound so
External link:
http://arxiv.org/abs/2305.04447
Author:
Nugraha, Aditya Arie, Sekiguchi, Kouhei, Fontaine, Mathieu, Bando, Yoshiaki, Yoshii, Kazuyoshi
This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum variance distorti
External link:
http://arxiv.org/abs/2207.10934
Author:
Sekiguchi, Kouhei, Nugraha, Aditya Arie, Du, Yicheng, Bando, Yoshiaki, Fontaine, Mathieu, Yoshii, Kazuyoshi
This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations made in real noisy echoic environments (e.g., cocktail party)
External link:
http://arxiv.org/abs/2207.07296