Using Speech Foundational Models in Loss Functions for Hearing Aid Speech Enhancement

Author:	Sutherland, Robert; Close, George; Hain, Thomas; Goetze, Stefan; Barker, Jon
Publication year:	2024
Subject:	Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Document type:	Working Paper
Description:	Machine learning techniques are an active area of research for speech enhancement for hearing aids, with one particular focus on improving the intelligibility of a noisy speech signal. Recent work has shown that feature encodings from self-supervised speech representation models can effectively capture speech intelligibility. In this work, it is shown that the distance between self-supervised speech representations of clean and noisy speech correlates more strongly with human intelligibility ratings than other signal-based metrics. Experiments show that training a speech enhancement model using this distance as part of a loss function improves performance over using an SNR-based loss function, as demonstrated by increases in HASPI, STOI, PESQ and SI-SNR scores. The method requires inference of the high-parameter-count representation model only at training time, so the speech enhancement model itself can remain small, as is required for hearing aids.
Comment:	Accepted for EUSIPCO 2024
Database:	arXiv
External link:	http://arxiv.org/abs/2407.13333
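
The loss described in the abstract combines an SNR-based term with a distance between representations of the enhanced and clean signals. The following is a minimal sketch of that idea, not the authors' implementation. Everything named here is an assumption for illustration: `toy_encoder` is a hypothetical stand-in for a frozen self-supervised speech representation model, the distance is taken as a mean squared error, the SNR-based term is taken as negative SI-SNR, and `weight` is an arbitrary mixing factor. The abstract does not specify any of these choices.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    n = len(target)
    # Remove the mean so the measure ignores any DC offset.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to separate signal from residual.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    proj = [(dot / tgt_energy) * t for t in tgt]
    residual = [e - p for e, p in zip(est, proj)]
    proj_energy = sum(p * p for p in proj)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((proj_energy + eps) / (residual_energy + eps))


def toy_encoder(signal, frame_len=4):
    """Hypothetical stand-in for a frozen speech representation model.

    Returns one feature vector per non-overlapping frame:
    [mean energy, mean absolute amplitude].
    """
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        magnitude = sum(abs(x) for x in frame) / frame_len
        features.append([energy, magnitude])
    return features


def representation_distance(enhanced, clean, encoder):
    """Mean squared distance between encoder features (assumed metric)."""
    feats_enh = encoder(enhanced)
    feats_cln = encoder(clean)
    total, count = 0.0, 0
    for frame_enh, frame_cln in zip(feats_enh, feats_cln):
        for a, b in zip(frame_enh, frame_cln):
            total += (a - b) ** 2
            count += 1
    return total / count


def combined_loss(enhanced, clean, encoder=toy_encoder, weight=0.5):
    """Negative SI-SNR plus a weighted representation distance."""
    snr_term = -si_snr(enhanced, clean)
    rep_term = representation_distance(enhanced, clean, encoder)
    return snr_term + weight * rep_term


# Synthetic signals standing in for clean speech and an imperfect enhancement.
clean = [math.sin(0.3 * i) for i in range(32)]
noisy = [c + 0.2 * math.cos(1.7 * i) for i, c in enumerate(clean)]
```

In this sketch the encoder is only called inside the loss, which mirrors the point made in the abstract: the large representation model is needed during training, while the enhancement model that produces `enhanced` is the only component deployed at inference time.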