Showing 1 - 10 of 1,032 results for search: '"A, Cernák"'
Author:
Coldenhoff, Jozef, Cernak, Milos
Assessment of audio by humans possesses the unique ability to attend to specific sources in a mixture of signals. Mimicking this human ability, we propose a semi-intrusive assessment where we frame the audio assessment task as a text prediction task
External link:
http://arxiv.org/abs/2409.14069
Audio and speech coding lack unified evaluation and open-source testing. Many candidate systems were evaluated on proprietary, non-reproducible, or small data, and machine learning-based codecs are often tested on datasets with similar distributions
External link:
http://arxiv.org/abs/2409.08374
Recently, diffusion-based generative models have demonstrated remarkable performance in speech enhancement tasks. However, these methods still encounter challenges, including the lack of structural information and poor performance in low Signal-to-No
External link:
http://arxiv.org/abs/2409.05116
Author:
Meng, Lingjun, Coldenhoff, Jozef, Kendrick, Paul, Stojkovic, Tijana, Harper, Andrew, Ratmanski, Kiril, Cernak, Milos
Recently, multi-stage systems have stood out among deep learning-based speech enhancement methods. However, these systems are always high in complexity, requiring millions of parameters and powerful computational resources, which limits their applica
External link:
http://arxiv.org/abs/2312.12415
Previous methods for predicting room acoustic parameters and speech quality metrics have focused on the single-channel case, where room acoustics and Mean Opinion Score (MOS) are predicted for a single recording device. However, quality-based device
External link:
http://arxiv.org/abs/2309.11976
Deep learning models have become widely adopted in various domains, but their performance heavily relies on a vast amount of data. Datasets often contain a large number of irrelevant or redundant samples, which can lead to computational inefficiencie
External link:
http://arxiv.org/abs/2309.11922
The recent ubiquitous adoption of remote conferencing has been accompanied by omnipresent frustration with distorted or otherwise unclear voice communication. Audio enhancement can compensate for low-quality input signals from, for example, small tru
External link:
http://arxiv.org/abs/2309.02393
Since the mental states of the speaker modulate speech, stress introduced by cognitive or physical loads could be detected in the voice. The existing voice stress detection benchmark has shown that the audio embeddings extracted from the Hybrid BYOL-
External link:
http://arxiv.org/abs/2306.05915
This paper presents ALO-VC, a non-parallel low-latency one-shot phonetic posteriorgrams (PPGs) based voice conversion method. ALO-VC enables any-to-any voice conversion using only one utterance from the target speaker, with only 47.5 ms future look-a
External link:
http://arxiv.org/abs/2306.01100
Estimating the quality of remote speech communication is a complex task influenced by the speaker, transmission channel, and listener. For example, the degradation of transmission quality can increase listeners' cognitive load, which can influence th
External link:
http://arxiv.org/abs/2303.00630