Revisiting speech segmentation and lexicon learning with better features

Authors:	Kamper, Herman; van Niekerk, Benjamin
Year of publication:	2024
Subject:	Electrical Engineering and Systems Science - Audio and Speech Processing; Computer Science - Computation and Language; Computer Science - Sound
Document type:	Working Paper
Description:	We revisit a self-supervised method that segments unlabelled speech into word-like segments. We start from the two-stage duration-penalised dynamic programming method that performs zero-resource segmentation without learning an explicit lexicon. In the first acoustic unit discovery stage, we replace contrastive predictive coding features with HuBERT. After word segmentation in the second stage, we get an acoustic word embedding for each segment by averaging HuBERT features. These embeddings are clustered using K-means to get a lexicon. The result is good full-coverage segmentation with a lexicon that achieves state-of-the-art performance on the ZeroSpeech benchmarks. Comment: 2 pages
Database:	arXiv
External link:	http://arxiv.org/abs/2401.17902
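
The lexicon-building step in the description (average the frame-level features inside each segment, then cluster the resulting embeddings with K-means) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the frame features and segment boundaries are already available, uses small toy vectors in place of real HuBERT features, and picks a simple deterministic farthest-point initialisation for K-means. The function names `mean_pool`, `squared_distance`, and `kmeans` are hypothetical.

```python
def mean_pool(frames, boundaries):
    """Average the frame vectors inside each segment.

    frames: list of equal-length feature vectors, one per frame.
    boundaries: sorted exclusive end indices, one per segment
                (assumed given; the last one equals len(frames)).
    """
    embeddings = []
    start = 0
    for end in boundaries:
        segment = frames[start:end]
        # Column-wise mean over the frames of this segment.
        embeddings.append([sum(col) / len(segment) for col in zip(*segment)])
        start = end
    return embeddings


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm with farthest-point initialisation
    (an illustrative choice, not taken from the paper)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every embedding.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy stand-in for frame-level features: 7 frames, 2 dimensions.
toy_frames = [
    [0.0, 0.0], [0.2, 0.0],      # segment 0
    [10.0, 10.0], [10.2, 10.0],  # segment 1
    [0.1, 0.1], [0.1, -0.1],     # segment 2
    [9.9, 10.1],                 # segment 3
]
toy_boundaries = [2, 4, 6, 7]

toy_embeddings = mean_pool(toy_frames, toy_boundaries)
toy_labels, toy_centroids = kmeans(toy_embeddings, k=2)
```

In this toy setting, each cluster index in `toy_labels` plays the role of a lexicon entry: segments that share a label are treated as instances of the same word type. With real features, the embedding dimension and the number of clusters would be much larger.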