Speaker characterization by means of attention pooling

Authors:	Federico Costa, Miquel India, Javier Hernando
Contributors:	Universitat Politècnica de Catalunya. Doctorat en Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. IDEAI-UPC - Intelligent Data sciEnce and Artificial Intelligence Research Group
Language:	English
Year of publication:	2022
Subject:	Multi-head self-attention; Speaker characterization; Automatic speech recognition; Double attention; Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC]; Speaker verification; Deep learning; Reconeixement automàtic de la parla; Speech recognition; Aprenentatge profund
Description:	State-of-the-art deep learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures usually consist of a feature extractor front-end followed by a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. The authors have recently proposed Double Multi-Head Self-Attention pooling for speaker recognition, placed between a CNN-based front-end and a set of fully connected layers. This has been shown to be an excellent approach for efficiently selecting the most relevant features that the front-end captures from the speech signal. In this paper we show excellent experimental results obtained by adapting this architecture to other speaker characterization tasks, such as emotion recognition, sex classification and COVID-19 detection.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5b8b201cf2a658d62d21ca37362b74a1 https://hdl.handle.net/2117/384802
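
Illustrative note: the pooling step named in the abstract (variable-length frame sequence in, fixed-length vector out) can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the function name `double_attention_pool`, the parameters `head_queries` and `utterance_query`, the scaling by the square root of the head dimension, and the random (untrained) query vectors are all hypothetical choices made for illustration. It assumes the front-end output is split into equal-sized heads, each head is pooled over time with its own attention weights, and a second attention then pools across the head-level vectors.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def double_attention_pool(frames, head_queries, utterance_query):
    """Pool a (T, D) frame sequence into a single (D // K,) vector.

    frames          : (T, D) front-end features, T may vary per utterance
    head_queries    : (K, D // K) one query vector per head (hypothetical name)
    utterance_query : (D // K,) query for the second attention (hypothetical name)
    """
    num_frames, feat_dim = frames.shape
    num_heads, head_dim = head_queries.shape
    if num_heads * head_dim != feat_dim:
        raise ValueError("feature dimension must equal heads * head_dim")

    # Split every frame into K sub-vectors, one per head: (T, K, d).
    heads = frames.reshape(num_frames, num_heads, head_dim)

    # First attention: per-head weights over time, normalized along T.
    scores = np.einsum("tkd,kd->tk", heads, head_queries) / np.sqrt(head_dim)
    time_weights = softmax(scores, axis=0)                      # (T, K)
    contexts = np.einsum("tk,tkd->kd", time_weights, heads)     # (K, d)

    # Second attention: weights over the K head-level context vectors.
    head_scores = contexts @ utterance_query / np.sqrt(head_dim)  # (K,)
    head_weights = softmax(head_scores, axis=0)
    return head_weights @ contexts                              # (d,)


# Usage: utterances of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
feat_dim, num_heads = 64, 4
head_queries = rng.normal(size=(num_heads, feat_dim // num_heads))
utterance_query = rng.normal(size=feat_dim // num_heads)
for num_frames in (50, 120):
    frames = rng.normal(size=(num_frames, feat_dim))
    pooled = double_attention_pool(frames, head_queries, utterance_query)
    print(num_frames, pooled.shape)
```

In a trained system the query vectors would be learned jointly with the front-end and the fully connected layers; here they are random only so the sketch runs on its own.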