Showing 1 - 10 of 38 for search: '"Welker, Simon"'
Diffusion models have found great success in generating high quality, natural samples of speech, but their potential for density estimation for speech has so far remained largely unexplored. In this work, we leverage an unconditional diffusion model
External link:
http://arxiv.org/abs/2410.17834
This paper presents an unsupervised method for single-channel blind dereverberation and room impulse response (RIR) estimation, called BUDDy. The algorithm is rooted in Bayesian posterior sampling: it combines a likelihood model enforcing fidelity to
External link:
http://arxiv.org/abs/2408.07472
Authors:
Richter, Julius, Wu, Yi-Chiao, Krenn, Steven, Welker, Simon, Lay, Bunlong, Watanabe, Shinji, Richard, Alexander, Gerkmann, Timo
We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a large range of differen
External link:
http://arxiv.org/abs/2406.06185
To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is used in a loss function to optimize models, it may be detrimental to aspects
External link:
http://arxiv.org/abs/2406.03460
In this paper, we present an unsupervised single-channel method for joint blind dereverberation and room impulse response estimation, based on posterior sampling with diffusion models. We parameterize the reverberation operator using a filter with ex
External link:
http://arxiv.org/abs/2405.04272
Authors:
Lemercier, Jean-Marie, Richter, Julius, Welker, Simon, Moliner, Eloi, Välimäki, Vesa, Gerkmann, Timo
Published in:
IEEE Signal Processing Magazine, Jan 2025
With the development of audio playback devices and fast data transmission, the demand for high sound quality is rising for both entertainment and communications. In this quest for better sound quality, challenges emerge from distortions and interfere
External link:
http://arxiv.org/abs/2402.09821
In this work, we demonstrate that the ptychographic phase problem can be solved in a live fashion during scanning, while data is still being collected. We propose a generally applicable modification of the widespread projection-based algorithms such
External link:
http://arxiv.org/abs/2309.08639
Speech emotion conversion is the task of converting the expressed emotion of a spoken utterance to a target emotion while preserving the lexical content and speaker identity. While most existing works in speech emotion conversion rely on acted-out da
External link:
http://arxiv.org/abs/2309.07828
Several recent contributions in the field of iterative STFT phase retrieval have demonstrated that the performance of the classical Griffin-Lim method can be considerably improved upon. By using the same projection operators as Griffin-Lim, but combi
External link:
http://arxiv.org/abs/2309.07043
We present in this paper an informed single-channel dereverberation method based on conditional generation with diffusion models. With knowledge of the room impulse response, the anechoic utterance is generated via reverse diffusion using a measureme
External link:
http://arxiv.org/abs/2306.12286