Online Own Voice Detection for a Multi-Channel Multi-Sensor In-Ear Device

Author: Ville Myllyla, Anu Huttunen, Eemi Fagerlund, Pasi Pertilä
Contributors: Tampere University, Computing Sciences
Year of publication: 2021
Source: IEEE Sensors Journal. 21:27686-27697
ISSN: 2379-9153, 1530-437X
Description: Voice activity detection (VAD) aims to detect the presence of speech in a given input signal and is often the first step in voice-based applications such as speech communication systems. In the context of personal devices, own voice detection (OVD) is a sub-task of VAD: it targets detecting the speech of the person wearing the device while ignoring other speakers in the presence of interference signals. This article first summarizes recent single- and multi-microphone, multi-sensor, and hearing-aid-related VAD techniques. Then, a wearable in-ear device equipped with multiple microphones and an accelerometer is investigated for the OVD task using a neural network with input embedding and long short-term memory (LSTM) layers. The device picks up the user's speech signal through the air as well as vibrations through the body. However, besides external sounds, the device is sensitive to the user's own non-speech vocal noises (e.g., coughing, yawning) and to movement noise caused by physical activities. A signal mixing model is proposed to produce databases of noisy observations used for training and testing the frame-by-frame OVD method. The best model's performance is further studied in the presence of different recorded interference. An ablation study reports the model's performance on sub-sets of sensors. The results show that the OVD approach is robust towards both user motion and user-generated vocal non-speech sounds in the presence of loud external interference. The approach is suitable for real-time operation and achieves 90-96 % OVD accuracy in challenging use scenarios with a short 10 ms processing frame length.
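
The abstract outlines a frame-by-frame OVD classifier built from an input embedding followed by LSTM layers operating on multi-sensor features. Below is a minimal PyTorch sketch of that kind of model; the feature dimensionality, layer sizes, and per-frame feature extraction are illustrative assumptions, not the authors' published configuration.

```python
import torch
import torch.nn as nn

class OVDNet(nn.Module):
    """Sketch of a frame-wise own-voice detector: embedding + LSTM + sigmoid output."""
    def __init__(self, n_features=40, embed_dim=64, hidden_dim=128, num_layers=2):
        super().__init__()
        # Embed the per-frame multi-sensor features (microphones + accelerometer).
        self.embed = nn.Sequential(nn.Linear(n_features, embed_dim), nn.ReLU())
        # Recurrent layers model temporal context across short (e.g., 10 ms) frames.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        # One own-voice probability per frame.
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, x):                               # x: (batch, frames, n_features)
        h = self.embed(x)
        h, _ = self.lstm(h)
        return torch.sigmoid(self.out(h)).squeeze(-1)   # (batch, frames)

# Example: score 100 frames (about 1 s at a 10 ms frame length) of random features.
model = OVDNet()
frames = torch.randn(1, 100, 40)
own_voice_prob = model(frames)                          # frame-by-frame OVD probabilities
```

The frame-synchronous output is what makes such a model suitable for online, real-time operation: each incoming frame yields a decision without waiting for the end of an utterance.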
Database: OpenAIRE