Showing 1 - 10 of 63 results for search: "Edler, Bernd"
This paper introduces FlowMAC, a novel neural audio codec for high-quality general audio compression at low bit rates based on conditional flow matching (CFM). FlowMAC jointly learns a mel spectrogram encoder, quantizer and decoder. At inference time …
External link:
http://arxiv.org/abs/2409.17635
This paper proposes SEFGAN, a Deep Neural Network (DNN) combining maximum likelihood training and Generative Adversarial Networks (GANs) for efficient speech enhancement (SE). For this, a DNN is trained to synthesize the enhanced speech conditioned o…
External link:
http://arxiv.org/abs/2312.01744
Dialogue Enhancement (DE) enables the rebalancing of dialogue and background sounds to fit personal preferences and needs in the context of broadcast audio. When individual audio stems are unavailable from production, Dialogue Separation (DS) can be …
External link:
http://arxiv.org/abs/2305.19100
Deep generative models for Speech Enhancement (SE) have received increasing attention in recent years. The most prominent examples are Generative Adversarial Networks (GANs), while normalizing flows (NF) have received less attention despite their potential. Bui…
External link:
http://arxiv.org/abs/2210.11654
Frequency-domain processing, and in particular the use of the Modified Discrete Cosine Transform (MDCT), is the most widespread approach to audio coding. However, at low bitrates, audio quality, especially for speech, degrades drastically due to the lack …
External link:
http://arxiv.org/abs/2201.12039
Author:
Strauss, Martin; Edler, Bernd
Speech enhancement involves the distinction of a target speech signal from an intrusive background. Although generative approaches using Variational Autoencoders or Generative Adversarial Networks (GANs) have increasingly been used in recent years, n…
External link:
http://arxiv.org/abs/2106.09008
This paper describes a hands-on comparison of using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog…
External link:
http://arxiv.org/abs/2106.09093
The task of estimating the maximum number of concurrent speakers from single-channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene classification …
External link:
http://arxiv.org/abs/1712.04555
Author:
Guo, Ning; Edler, Bernd
Published in:
IEEE Signal Processing Letters, 2025, Vol. 32, Issue 1, pp. 31-35 (5 pp.)
Published in:
2022 IEEE Spoken Language Technology Workshop (SLT).
Deep generative models for Speech Enhancement (SE) have received increasing attention in recent years. The most prominent examples are Generative Adversarial Networks (GANs), while normalizing flows (NF) have received less attention despite their potential. Bui…