Showing 1 - 10 of 169 for search: '"Kamper, Herman"'
We look at the long-standing problem of segmenting unlabeled speech into word-like segments and clustering these into a lexicon. Several previous methods use a scoring model coupled with dynamic programming to find an optimal segmentation. Here we pr
External link:
http://arxiv.org/abs/2409.14486
Given an image query, visually prompted keyword localisation (VPKL) aims to find occurrences of the depicted word in a speech collection. This can be useful when transcriptions are not available for a low-resource language (e.g. if it is unwritten).
External link:
http://arxiv.org/abs/2409.06013
Discovering a lexicon from unlabeled audio is a longstanding challenge for zero-resource speech processing. One approach is to search for frequently occurring patterns in speech. We revisit this idea with DUSTED: Discrete Unit Spoken-TErm Discovery.
External link:
http://arxiv.org/abs/2408.14390
Authors:
Oneata, Dan; Kamper, Herman
Visually grounded speech models link speech to images. We extend this connection by linking images to text via an existing image captioning system, and as a result gain the ability to map speech audio directly to text. This approach can be used for s
External link:
http://arxiv.org/abs/2406.07133
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias: a novel word is mapped to a novel object rather than a familiar one. This bias has been studied computationally, but only in models that use discrete wor
External link:
http://arxiv.org/abs/2403.13922
Authors:
Kamper, Herman; van Niekerk, Benjamin
We revisit a self-supervised method that segments unlabelled speech into word-like segments. We start from the two-stage duration-penalised dynamic programming method that performs zero-resource segmentation without learning an explicit lexicon. In t
External link:
http://arxiv.org/abs/2401.17902
Authors:
Baas, Matthew; Kamper, Herman
Voice conversion aims to convert source speech into a target voice using recordings of the target speaker as a reference. Newer models are producing increasingly realistic output. But what happens when models are fed with non-standard data, such as s
External link:
http://arxiv.org/abs/2310.08104
Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce
External link:
http://arxiv.org/abs/2307.06040
Authors:
Jacobs, Christiaan; Kamper, Herman
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings. In this paper we explore semantic AWE modelling. The
External link:
http://arxiv.org/abs/2307.02083
Authors:
Baas, Matthew; Kamper, Herman
Can we develop a model that can synthesize realistic speech directly from a latent space, without explicit conditioning? Despite several efforts over the last decade, previous adversarial and diffusion-based approaches still struggle to achieve this,
External link:
http://arxiv.org/abs/2307.01673