Showing 1 - 10 of 161 for search: '"Kamper, Herman"'
Discovering a lexicon from unlabeled audio is a longstanding challenge for zero-resource speech processing. One approach is to search for frequently occurring patterns in speech. We revisit this idea with DUSTED: Discrete Unit Spoken-TErm Discovery.
External link:
http://arxiv.org/abs/2408.14390
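As a rough illustration of pattern search over discrete units (a toy sketch with a hypothetical helper, not the actual DUSTED method): assuming each utterance has already been encoded as a sequence of discrete unit IDs, e.g. by quantising self-supervised speech features, a shared run of units between two utterances is a crude candidate for a discovered spoken term.

```python
def longest_shared_pattern(units_a, units_b):
    """Return the longest contiguous run of unit IDs shared by both
    utterances, a crude stand-in for a discovered spoken-term candidate."""
    best_len, best_end = 0, 0
    # dp[i][j] = length of the common run ending at units_a[i-1], units_b[j-1]
    dp = [[0] * (len(units_b) + 1) for _ in range(len(units_a) + 1)]
    for i in range(1, len(units_a) + 1):
        for j in range(1, len(units_b) + 1):
            if units_a[i - 1] == units_b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return units_a[best_end - best_len:best_end]

# Two "utterances" sharing the unit sequence [7, 7, 3, 9]:
print(longest_shared_pattern([1, 7, 7, 3, 9, 2], [4, 7, 7, 3, 9, 5]))
# → [7, 7, 3, 9]
```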
Autor:
Oneata, Dan, Kamper, Herman
Visually grounded speech models link speech to images. We extend this connection by linking images to text via an existing image captioning system, and as a result gain the ability to map speech audio directly to text. This approach can be used for s…
External link:
http://arxiv.org/abs/2406.07133
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias: a novel word is mapped to a novel object rather than a familiar one. This bias has been studied computationally, but only in models that use discrete wor…
External link:
http://arxiv.org/abs/2403.13922
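The ME choice rule can be illustrated with a toy scoring function (hypothetical, not the paper's model): given similarity scores between a query word and candidate objects, ME maps a novel word to the object that is least accounted for by the familiar vocabulary.

```python
def me_choice(novel_word_scores, familiar_scores_per_object):
    """Pick an object index for a novel word: prefer high similarity to the
    novel word, penalised by how well familiar words already explain the
    object (its max similarity to any familiar word)."""
    best_idx, best_score = None, float("-inf")
    for idx, novel_sim in enumerate(novel_word_scores):
        penalty = max(familiar_scores_per_object[idx])
        score = novel_sim - penalty
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx

# Object 0 is a familiar "ball" (well explained by known words), object 1 is
# a novel gadget: the ME bias maps the novel word to object 1, even though
# its raw similarity to object 0 is slightly higher.
print(me_choice([0.6, 0.5], [[0.9, 0.8], [0.1, 0.2]]))  # → 1
```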
Author:
Kamper, Herman, van Niekerk, Benjamin
We revisit a self-supervised method that segments unlabelled speech into word-like segments. We start from the two-stage duration-penalised dynamic programming method that performs zero-resource segmentation without learning an explicit lexicon. In t…
External link:
http://arxiv.org/abs/2401.17902
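A minimal sketch of duration-penalised dynamic programming segmentation (heavily simplified: the paper operates on learned speech features, here a 1-D toy signal stands in). Each segment pays its within-segment squared deviation from the segment mean plus a fixed per-segment penalty, which discourages over-segmentation.

```python
def segment(signal, penalty):
    """Segment a 1-D signal by dynamic programming, returning the sorted
    list of segment end indices (exclusive)."""
    n = len(signal)
    # best[t] = minimal cost of segmenting signal[:t]; back[t] = last boundary
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            seg = signal[s:t]
            mean = sum(seg) / len(seg)
            cost = sum((x - mean) ** 2 for x in seg) + penalty
            if best[s] + cost < best[t]:
                best[t] = best[s] + cost
                back[t] = s
    # Recover boundaries by walking the back pointers from the end.
    bounds, t = [], n
    while t > 0:
        bounds.append(t)
        t = back[t]
    return sorted(bounds)

# Two flat "words" with a level change at index 3: one internal boundary.
print(segment([0.0, 0.0, 0.0, 5.0, 5.0, 5.0], penalty=1.0))  # → [3, 6]
```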
Author:
Baas, Matthew, Kamper, Herman
Voice conversion aims to convert source speech into a target voice using recordings of the target speaker as a reference. Newer models are producing increasingly realistic output. But what happens when models are fed with non-standard data, such as s…
External link:
http://arxiv.org/abs/2310.08104
Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce…
External link:
http://arxiv.org/abs/2307.06040
Author:
Jacobs, Christiaan, Kamper, Herman
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings. In this paper we explore semantic AWE modelling. The…
External link:
http://arxiv.org/abs/2307.02083
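The core AWE idea can be illustrated crudely: map variable-length frame sequences to fixed-dimensional vectors and compare them with cosine similarity. Mean-pooling frame vectors, as below, is only a naive baseline and not the learned models explored in the paper.

```python
import math

def mean_pool(frames):
    """Average a (num_frames x dim) list of frame vectors into one
    fixed-dimensional vector, regardless of segment length."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Two realisations of the "same word" (different durations, similar frames)
# versus a different word: the same-word pair should score higher.
word_a1 = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]
word_a2 = [[1.0, 0.2], [0.8, 0.1]]
word_b = [[0.1, 1.0], [0.0, 0.9]]
print(cosine(mean_pool(word_a1), mean_pool(word_a2)) >
      cosine(mean_pool(word_a1), mean_pool(word_b)))  # → True
```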
Author:
Baas, Matthew, Kamper, Herman
Can we develop a model that can synthesize realistic speech directly from a latent space, without explicit conditioning? Despite several efforts over the last decade, previous adversarial and diffusion-based approaches still struggle to achieve this…
External link:
http://arxiv.org/abs/2307.01673
We propose a visually grounded speech model that learns new words and their visual depictions from just a few word-image example pairs. Given a set of test images and a spoken query, we ask the model which image depicts the query word. Previous work…
External link:
http://arxiv.org/abs/2306.11371
Author:
Jacobs, Christiaan, Rakotonirina, Nathanaël Carraz, Chimoto, Everlyn Asiko, Bassett, Bruce A., Kamper, Herman
We consider hate speech detection through keyword spotting on radio broadcasts. One approach is to build an automatic speech recognition (ASR) system for the target low-resource language. We compare this to using acoustic word embedding (AWE) models…
External link:
http://arxiv.org/abs/2306.00410