Analysis of ABC Submission to NIST SRE 2019 CMN and VAST Challenge

Authors: Lukas Burget, Mohamed Dahmane, Gilles Boulianne, Pierre-Luc St-Charles, Oldrich Plchot, Josef Slavícek, Cedric Noiseux, Jahangir Alam, Pavel Matejka, Ondrej Novotný, Marc Lalonde, Themos Stafylakis, Ondrej Glembek, Hossein Zeinali, Johan Rohdin, Mireia Diez Sánchez, Petr Mizera, Anna Silnova, Alicia Lozano-Diez, Shuai Wang, Joao M. Monteiro, Ladislav Mosner
Contributors: UAM. Departamento de Tecnología Electrónica y de las Comunicaciones
Publication year: 2020
Subject:
Source: Biblos-e Archivo. Repositorio Institucional de la UAM
Universidad Camilo José Cela (UCJC)
Odyssey 2020 The Speaker and Language Recognition Workshop
Odyssey
Description: We present a condensed description and analysis of the joint submission of the ABC team for NIST SRE 2019, by BUT, CRIM, Phonexia, Omilia and UAM. We concentrate on challenges that arose during development and we analyze the results obtained on the evaluation data and on our development sets. The conversational telephone speech (CMN2) condition is challenging for current state-of-the-art systems, mainly due to the language mismatch between training and test data. We show that a combination of adversarial domain adaptation, backend adaptation and score normalization can mitigate this mismatch. On the VAST condition, we demonstrate the importance of deploying diarization when dealing with multi-speaker utterances and the drastic improvements that can be obtained by combining audio and visual modalities.
BUT researchers were supported by Czech Ministry of Interior projects Nos. VI20152020025 “DRAPAK” and VI20192022169 “AI v TiV”, Czech National Science Foundation (GACR) project “NEUREM3” No. 19-26934X, European Union’s Marie Sklodowska-Curie grant agreement No. 843627, European Union’s Horizon 2020 grant agreement No. 833635 “ROXANNE” and by Czech Ministry of Education, Youth and Sports from the National Programme of Sustainability (NPU II) project “IT4Innovations excellence in science” - LQ1602. CRIM researchers wish to acknowledge funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) through grant RGPIN-2019-05381 and the Ministry of Economy and Innovation (MEI) of the Government of Quebec for its continued support.
Database: OpenAIRE