Wavelet Scattering Transform and CNN for Closed Set Speaker Identification
Author: | Olivier Lezoray, Luc Brun, Wajdi Ghezaiel |
---|---|
Contributors: | Fédération Normande de Recherche en Sciences et Technologies de l’Information et de la Communication (NormaStic), Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Normandie Université (NU)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Université Le Havre Normandie (ULH), Normandie Université (NU)-Université de Rouen Normandie (UNIROUEN), Normandie Université (NU)-Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie), Institut National des Sciences Appliquées (INSA)-Normandie Université (NU)-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS), Equipe Image - Laboratoire GREYC - UMR6072, Groupe de Recherche en Informatique, Image et Instrumentation de Caen (GREYC), Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU)-Normandie Université (NU)-Université de Caen Normandie (UNICAEN), Normandie Université (NU)-Centre National de la Recherche Scientifique (CNRS)-École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN), Normandie Université (NU), Lezoray, Olivier |
Language: | English |
Publication year: | 2020 |
Subject: | hybrid network; closed set; computer science; speech recognition; feature extraction; initialization; convolutional neural network; 02 engineering and technology; reduction (complexity); 030507 speech-language pathology & audiology; 03 medical and health sciences; identification (information); wavelet; [INFO.INFO-TI] Computer Science [cs]/Image Processing [eess.IV]; short utterances; wavelet scattering transform; 0202 electrical engineering, electronic engineering, information engineering; waveform; 020201 artificial intelligence & image processing; speaker identification; 0305 other medical science |
Source: | International Workshop on Multimedia Signal Processing (MMSP), Sep 2020, Tampere (Virtual conference), Finland |
Description: | International audience; In real-world applications, the performance of speaker identification systems degrades as both the amount and the quality of the speech utterances are reduced. To address this, we propose a speaker identification system in which short utterances with few training examples are used for person identification. Only a very small amount of data, a sentence of 2-4 seconds, is therefore used. To achieve this, we propose a novel raw-waveform end-to-end convolutional neural network (CNN) for text-independent speaker identification. We use the wavelet scattering transform as a fixed initialization of the first layers of the CNN and learn the remaining layers in a supervised manner. The experiments show that our hybrid architecture, combining the wavelet scattering transform and a CNN, can perform efficient feature extraction for speaker identification, even with a small number of short-duration training samples. |
Database: | OpenAIRE |
External link: |