Showing 1 - 10 of 161 for search: '"Kamper, Herman"'
Discovering a lexicon from unlabeled audio is a longstanding challenge for zero-resource speech processing. One approach is to search for frequently occurring patterns in speech. We revisit this idea with DUSTED: Discrete Unit Spoken-TErm Discovery.
External link:
http://arxiv.org/abs/2408.14390
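As a rough illustration of pattern search over discrete units (a toy sketch with a hypothetical helper, not the actual DUSTED method): assuming each utterance has already been encoded as a sequence of discrete unit IDs, e.g. by quantising self-supervised speech features, a shared run of units between two utterances is a crude candidate for a discovered spoken term.

```python
def longest_shared_pattern(units_a, units_b):
    """Return the longest contiguous run of unit IDs shared by both
    utterances, a crude stand-in for a discovered spoken-term candidate."""
    best_len, best_end = 0, 0
    # dp[i][j] = length of the common run ending at units_a[i-1], units_b[j-1]
    dp = [[0] * (len(units_b) + 1) for _ in range(len(units_a) + 1)]
    for i in range(1, len(units_a) + 1):
        for j in range(1, len(units_b) + 1):
            if units_a[i - 1] == units_b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return units_a[best_end - best_len:best_end]

# Two "utterances" sharing the unit sequence [7, 7, 3, 9]:
print(longest_shared_pattern([1, 7, 7, 3, 9, 2], [4, 7, 7, 3, 9, 5]))
# → [7, 7, 3, 9]
```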
Autor:
Oneata, Dan, Kamper, Herman
Visually grounded speech models link speech to images. We extend this connection by linking images to text via an existing image captioning system, and as a result gain the ability to map speech audio directly to text. This approach can be used for s…
External link:
http://arxiv.org/abs/2406.07133
When children learn new words, they employ constraints such as the mutual exclusivity (ME) bias: a novel word is mapped to a novel object rather than a familiar one. This bias has been studied computationally, but only in models that use discrete wor…
External link:
http://arxiv.org/abs/2403.13922
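The ME choice rule can be illustrated with a toy scoring function (hypothetical, not the paper's model): given similarity scores between a query word and candidate objects, ME maps a novel word to the object that is least accounted for by the familiar vocabulary.

```python
def me_choice(novel_word_scores, familiar_scores_per_object):
    """Pick an object index for a novel word: prefer high similarity to the
    novel word, penalised by how well familiar words already explain the
    object (its max similarity to any familiar word)."""
    best_idx, best_score = None, float("-inf")
    for idx, novel_sim in enumerate(novel_word_scores):
        penalty = max(familiar_scores_per_object[idx])
        score = novel_sim - penalty
        if score > best_score:
            best_idx, best_score = idx, score
    return best_idx

# Object 0 is a familiar "ball" (well explained by known words), object 1 is
# a novel gadget: the ME bias maps the novel word to object 1, even though
# its raw similarity to object 0 is slightly higher.
print(me_choice([0.6, 0.5], [[0.9, 0.8], [0.1, 0.2]]))  # → 1
```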
Author:
Kamper, Herman, van Niekerk, Benjamin
We revisit a self-supervised method that segments unlabelled speech into word-like segments. We start from the two-stage duration-penalised dynamic programming method that performs zero-resource segmentation without learning an explicit lexicon. In t…
External link:
http://arxiv.org/abs/2401.17902
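A minimal sketch of duration-penalised dynamic programming segmentation (heavily simplified: the paper operates on learned speech features, here a 1-D toy signal stands in). Each segment pays its within-segment squared deviation from the segment mean plus a fixed per-segment penalty, which discourages over-segmentation.

```python
def segment(signal, penalty):
    """Segment a 1-D signal by dynamic programming, returning the sorted
    list of segment end indices (exclusive)."""
    n = len(signal)
    # best[t] = minimal cost of segmenting signal[:t]; back[t] = last boundary
    best = [0.0] + [float("inf")] * n
    back = [0] * (n + 1)
    for t in range(1, n + 1):
        for s in range(t):
            seg = signal[s:t]
            mean = sum(seg) / len(seg)
            cost = sum((x - mean) ** 2 for x in seg) + penalty
            if best[s] + cost < best[t]:
                best[t] = best[s] + cost
                back[t] = s
    # Recover boundaries by walking the back pointers from the end.
    bounds, t = [], n
    while t > 0:
        bounds.append(t)
        t = back[t]
    return sorted(bounds)

# Two flat "words" with a level change at index 3: one internal boundary.
print(segment([0.0, 0.0, 0.0, 5.0, 5.0, 5.0], penalty=1.0))  # → [3, 6]
```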
Author:
Baas, Matthew, Kamper, Herman
Voice conversion aims to convert source speech into a target voice using recordings of the target speaker as a reference. Newer models are producing increasingly realistic output. But what happens when models are fed with non-standard data, such as s…
External link:
http://arxiv.org/abs/2310.08104
Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce…
External link:
http://arxiv.org/abs/2307.06040
Author:
Jacobs, Christiaan, Kamper, Herman
Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings. In this paper we explore semantic AWE modelling. The…
External link:
http://arxiv.org/abs/2307.02083
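The core AWE idea can be illustrated crudely: map variable-length frame sequences to fixed-dimensional vectors and compare them with cosine similarity. Mean-pooling frame vectors, as below, is only a naive baseline and not the learned models explored in the paper.

```python
import math

def mean_pool(frames):
    """Average a (num_frames x dim) list of frame vectors into one
    fixed-dimensional vector, regardless of segment length."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Two realisations of the "same word" (different durations, similar frames)
# versus a different word: the same-word pair should score higher.
word_a1 = [[1.0, 0.1], [0.9, 0.2], [1.1, 0.0]]
word_a2 = [[1.0, 0.2], [0.8, 0.1]]
word_b = [[0.1, 1.0], [0.0, 0.9]]
print(cosine(mean_pool(word_a1), mean_pool(word_a2)) >
      cosine(mean_pool(word_a1), mean_pool(word_b)))  # → True
```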
Author:
Baas, Matthew, Kamper, Herman
Can we develop a model that can synthesize realistic speech directly from a latent space, without explicit conditioning? Despite several efforts over the last decade, previous adversarial and diffusion-based approaches still struggle to achieve this…
External link:
http://arxiv.org/abs/2307.01673
We propose a visually grounded speech model that learns new words and their visual depictions from just a few word-image example pairs. Given a set of test images and a spoken query, we ask the model which image depicts the query word. Previous work…
External link:
http://arxiv.org/abs/2306.11371
Author:
Jacobs, Christiaan, Rakotonirina, Nathanaël Carraz, Chimoto, Everlyn Asiko, Bassett, Bruce A., Kamper, Herman
We consider hate speech detection through keyword spotting on radio broadcasts. One approach is to build an automatic speech recognition (ASR) system for the target low-resource language. We compare this to using acoustic word embedding (AWE) models…
External link:
http://arxiv.org/abs/2306.00410