Showing 1-10 of 29 results for search: '"van Niekerk, Benjamin"'
We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation. Here we pr
External link:
http://arxiv.org/abs/2409.14486
Discovering a lexicon from unlabeled audio is a longstanding challenge for zero-resource speech processing. One approach is to search for frequently occurring patterns in speech. We revisit this idea with DUSTED: Discrete Unit Spoken-TErm Discovery.
External link:
http://arxiv.org/abs/2408.14390
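One generic way to search for frequently occurring patterns in speech that has been discretised into unit sequences is local alignment between pairs of sequences. The sketch below uses plain Smith-Waterman alignment with assumed match, mismatch and gap scores. It illustrates the general idea of matching discrete units and is not claimed to be DUSTED's exact algorithm; the function name `local_align` and the toy unit IDs are hypothetical.

```python
from typing import List, Tuple


def local_align(
    a: List[int],
    b: List[int],
    match: float = 1.0,      # assumed reward for identical units
    mismatch: float = -1.0,  # assumed penalty for differing units
    gap: float = -1.0,       # assumed penalty for skipping a unit
) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Smith-Waterman local alignment of two discrete unit sequences.

    Returns the best score and the half-open spans (start, end) in a and b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0.0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until the local score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


if __name__ == "__main__":
    # Two toy utterances that share the unit pattern 1 2 3 4.
    print(local_align([5, 1, 2, 3, 4, 9], [7, 7, 1, 2, 3, 4, 8]))
    # prints (4.0, (1, 5), (2, 6))
```

A matched pair of spans like this is a candidate recurring term; clustering many such pairs is one way to build up a lexicon.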
Author:
Kamper, Herman, van Niekerk, Benjamin
We revisit a self-supervised method that segments unlabelled speech into word-like segments. We start from the two-stage duration-penalised dynamic programming method that performs zero-resource segmentation without learning an explicit lexicon. In t
External link:
http://arxiv.org/abs/2401.17902
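The idea of coupling a segment-scoring model with dynamic programming, plus a duration penalty, can be sketched as a shortest-path recursion over frame boundaries. This is a minimal sketch under stated assumptions: `segment_cost` stands in for whatever scoring model is used, and the `1 / length` penalty (which favours longer, word-like segments) is an illustrative choice, not necessarily the penalty used in the papers above. All names here are hypothetical.

```python
from typing import Callable, List, Tuple


def segment_dp(
    n_frames: int,
    segment_cost: Callable[[int, int], float],  # hypothetical scorer for frames[start:end]
    duration_penalty: Callable[[int], float],   # hypothetical penalty on segment length
    max_len: int,                               # longest segment considered
) -> Tuple[float, List[Tuple[int, int]]]:
    """Return the minimal total cost and the (start, end) segments achieving it."""
    INF = float("inf")
    best = [INF] * (n_frames + 1)  # best[t]: minimal cost of segmenting frames[:t]
    back = [0] * (n_frames + 1)    # back[t]: start of the last segment ending at t
    best[0] = 0.0
    for end in range(1, n_frames + 1):
        for start in range(max(0, end - max_len), end):
            cost = best[start] + segment_cost(start, end) + duration_penalty(end - start)
            if cost < best[end]:
                best[end], back[end] = cost, start
    # Follow the back-pointers from the last frame to recover the segmentation.
    segments, end = [], n_frames
    while end > 0:
        segments.append((back[end], end))
        end = back[end]
    segments.reverse()
    return best[n_frames], segments


if __name__ == "__main__":
    units = [3, 3, 3, 7, 7, 7, 7, 1, 1]  # toy discrete units, one per frame

    def cost(s: int, e: int) -> float:
        # Number of frames that disagree with the segment's majority unit.
        seg = units[s:e]
        return len(seg) - max(seg.count(u) for u in set(seg))

    total, segs = segment_dp(len(units), cost, lambda n: 1.0 / n, max_len=6)
    print(segs)  # prints [(0, 3), (3, 7), (7, 9)]
```

Without the duration penalty, any split into pure single-unit segments would cost zero, so the penalty is what pushes the search towards fewer, longer segments.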
Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce
External link:
http://arxiv.org/abs/2307.06040
Any-to-any voice conversion aims to transform source speech into a target voice with just a few examples of the target speaker as a reference. Recent methods produce convincing conversions, but at the cost of increased complexity -- making results di
External link:
http://arxiv.org/abs/2305.18975
We propose a visually grounded speech model that acquires new words and their visual depictions from just a few word-image example pairs. Given a set of test images and a spoken query, we ask the model which image depicts the query word. Previous wor
External link:
http://arxiv.org/abs/2305.15937
Author:
van Niekerk, Benjamin, Carbonneau, Marc-André, Zaïdi, Julian, Baas, Mathew, Seuté, Hugo, Kamper, Herman
The goal of voice conversion is to transform source speech into a target voice, keeping the content unchanged. In this paper, we focus on self-supervised representation learning for voice conversion. Specifically, we compare discrete and soft speech
External link:
http://arxiv.org/abs/2111.02392
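To illustrate the distinction between discrete and soft speech units in the abstract above: a discrete unit can be taken as the index of the nearest cluster centre, while a soft unit keeps a probability distribution over all centres, so ambiguous frames retain information. The sketch below computes the soft version as a softmax over negative squared distances with an assumed `temperature`; the centres and feature values are toy data, and the paper's own soft units may be computed differently (for example, predicted by a trained network).

```python
import math
from typing import List, Sequence


def sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between a feature vector and a centre."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def discrete_unit(x: Sequence[float], centres: Sequence[Sequence[float]]) -> int:
    """Hard assignment: index of the nearest centre (k-means-style quantisation)."""
    return min(range(len(centres)), key=lambda k: sq_dist(x, centres[k]))


def soft_unit(
    x: Sequence[float], centres: Sequence[Sequence[float]], temperature: float = 1.0
) -> List[float]:
    """Soft assignment: softmax over negative squared distances to every centre."""
    logits = [-sq_dist(x, c) / temperature for c in centres]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    centres = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    x = [0.45, 0.1]  # a frame lying between the first two centres
    print(discrete_unit(x, centres))  # prints 0
    print(soft_unit(x, centres))      # most mass split between centres 0 and 1
```

The hard assignment discards the fact that `x` is nearly as close to centre 1 as to centre 0; the soft distribution keeps it.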
Published in:
Proc. Interspeech (2022) 4591-4595
This paper presents Daft-Exprt, a multi-speaker acoustic model advancing the state-of-the-art for cross-speaker prosody transfer on any text. This is one of the most challenging, and rarely directly addressed, tasks in speech synthesis, especially for
External link:
http://arxiv.org/abs/2108.02271
Contrastive predictive coding (CPC) aims to learn representations of speech by distinguishing future observations from a set of negative examples. Previous work has shown that linear classifiers trained on CPC features can accurately predict speaker
External link:
http://arxiv.org/abs/2108.00917
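CPC's objective of distinguishing a future observation from negative examples is usually written as the InfoNCE loss: the cross-entropy of picking the true future among a set of candidates, scored by a dot product. The minimal sketch below assumes the prediction vector is already given; in CPC it comes from a learned transform of a context vector, which is omitted here, and the toy vectors are illustrative.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(
    prediction: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
) -> float:
    """Negative log-probability of the positive among {positive} + negatives."""
    scores = [dot(prediction, positive)] + [dot(prediction, n) for n in negatives]
    m = max(scores)  # log-sum-exp with max subtraction for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[0]


if __name__ == "__main__":
    # The prediction points towards the true future frame, so the loss is small.
    print(info_nce_loss([1.0, 0.0], [2.0, 0.0], [[0.0, 2.0], [-1.0, 0.0]]))
```

With an uninformative prediction every candidate scores the same and the loss equals the log of the number of candidates, which is the chance-level baseline.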
Author:
Van Niekerk, Benjamin
A research report submitted to the Faculty of Science, University of the Witwatersrand, in partial fulfillment of the requirements for the degree of Master of Science in the School of Computer Science and Applied Mathematics, 2019
Learning-based methods
External link:
https://hdl.handle.net/10539/29536