Wavelet Scattering Transform and CNN for Closed Set Speaker Identification

Author:	Olivier Lezoray, Luc Brun, Wajdi Ghezaiel
Contributors:	Fédération Normande de Recherche en Sciences et Technologies de l’Information et de la Communication (NormaStic), Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Normandie Université (NU)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Université Le Havre Normandie (ULH), Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Equipe Image - Laboratoire GREYC - UMR6072, Groupe de Recherche en Informatique, Image et Instrumentation de Caen (GREYC), Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Normandie Université (NU)-Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU), Lezoray, Olivier
Language:	English
Publication year:	2020
Subject:	hybrid network; Closed set; Computer science; Speech recognition; Feature extraction; Initialization; convolutional neural network; 02 engineering and technology; Reduction (complexity); 030507 speech-language pathology & audiology; 03 medical and health sciences; Identification (information); Wavelet; [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV]; short utterances; wavelet scattering transform; 0202 electrical engineering, electronic engineering, information engineering; Waveform; 020201 artificial intelligence & image processing; Speaker identification; 0305 other medical science
Source:	International Workshop on Multimedia Signal Processing (MMSP), Sep 2020, Tampere (Virtual conference), Finland
Description:	International audience; In real-world applications, the performance of speaker identification systems degrades as both the amount and the quality of the available speech decrease. To address this, we propose a speaker identification system that identifies a person from short utterances with few training examples, using only a very small amount of data: a sentence of 2-4 seconds. To achieve this, we propose a novel raw-waveform end-to-end convolutional neural network (CNN) for text-independent speaker identification. We use the wavelet scattering transform as a fixed initialization of the first layers of the CNN and learn the remaining layers in a supervised manner. The conducted experiments show that our hybrid architecture, combining the wavelet scattering transform and a CNN, can perform efficient feature extraction for speaker identification, even with a small number of short-duration training samples.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5695b22f381699e9e83f21eb8ef47921 https://hal.archives-ouvertes.fr/hal-02955532/file/MMSP2020.pdf
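
Note on the method named in the abstract: the record says the wavelet scattering transform serves as a fixed, non-learned front end for a CNN. The following is a minimal sketch of that idea, not the authors' implementation. It computes only first-order scattering coefficients (band-pass filtering, modulus, low-pass averaging) in plain Python. The function names, the Gabor filters standing in for Morlet wavelets, and all parameter values (`width`, `half_len`, the centre frequencies) are assumptions made for illustration.

```python
import cmath
import math


def gabor_filter(center_freq, width, half_len):
    """Complex band-pass filter: Gaussian window times a complex exponential.

    Stand-in for a Morlet wavelet; center_freq is in radians per sample.
    """
    return [
        math.exp(-0.5 * (t / width) ** 2) * cmath.exp(1j * center_freq * t)
        for t in range(-half_len, half_len + 1)
    ]


def gaussian_low_pass(width, half_len):
    """Real averaging filter, normalized so its taps sum to 1."""
    taps = [math.exp(-0.5 * (t / width) ** 2) for t in range(-half_len, half_len + 1)]
    total = sum(taps)
    return [v / total for v in taps]


def convolve_same(signal, kernel):
    """Zero-padded convolution whose output has the same length as signal."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0
        for k, weight in enumerate(kernel):
            j = i + half - k  # input index paired with kernel tap k
            if 0 <= j < n:
                acc += signal[j] * weight
        out.append(acc)
    return out


def scattering_order1(signal, center_freqs, width=8.0, half_len=24):
    """First-order scattering: | x * psi | * phi for each band-pass filter psi.

    The filters are fixed (never trained), which is the role the abstract
    assigns to the scattering transform in the hybrid network.
    """
    low_pass = gaussian_low_pass(width, half_len)
    features = []
    for freq in center_freqs:
        band = convolve_same(signal, gabor_filter(freq, width, half_len))
        envelope = [abs(v) for v in band]  # modulus nonlinearity
        features.append(convolve_same(envelope, low_pass))  # local averaging
    return features


# Usage: a pure tone at 0.5 rad/sample excites the matching channel most.
tone = [math.cos(0.5 * n) for n in range(200)]
channels = scattering_order1(tone, center_freqs=[0.5, 2.0])
```

In a hybrid model of the kind described, feature maps like `channels` would feed trainable convolutional layers, and only those later layers would be updated during supervised training.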