Speech Taskonomy: Which Speech Tasks are the most Predictive of fMRI Brain Activity?

Authors: Subba Reddy Oota, Veeral Agarwal, Mounika Marreddy, Manish Gupta, Raju Surampudi Bapi
Contributors: Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB), École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB), Centre National de la Recherche Scientifique (CNRS), Mnemonic Synergy (Mnemosyne), Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria), Institut des Maladies Neurodégénératives [Bordeaux] (IMN), International Institute of Information Technology, Hyderabad (IIIT-H), Microsoft Research (MSR)
Language: English
Year of publication: 2023
Subject:
Source: INTERSPEECH 2023 (24th INTERSPEECH Conference), Aug 2023, Dublin, Ireland
Description: Self-supervised speech-based models have been found to be successful in predicting brain recordings of subjects experiencing naturalistic story listening. Inspired by recent progress on deep learning models for various speech-processing tasks, existing literature has leveraged pretrained speech Transformer models for brain encoding. However, no prior work has explored the efficacy of task-specific finetuned Transformer representations for this task. Hence, in this paper, we explore transfer learning from representations finetuned on eight different tasks from the Speech processing Universal PERformance Benchmark (SUPERB) for predicting brain responses. Encoding models based on task features are used to predict activity in different regions across the whole brain, as well as in language and auditory brain regions. Our experiments on finetuning the Wav2Vec2.0 model for these eight tasks show that the model finetuned on automatic speech recognition (ASR) yields the best encoding performance for the whole brain, language, and auditory regions.
Database: OpenAIRE
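The abstract describes voxelwise encoding models that map speech-model representations to fMRI activity. A common approach in this literature is ridge regression from stimulus features to voxel responses, scored by per-voxel Pearson correlation on held-out time points. Below is a minimal, hedged sketch of that pipeline using synthetic data: the feature matrix `X` is a stand-in for time-aligned Wav2Vec2.0 layer activations, and `Y` for voxel responses; the actual paper's feature extraction, alignment, and regularization choices are not shown and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: in a real encoding study, X would hold speech-model
# activations aligned to fMRI TRs, and Y the measured voxel responses.
n_trs, n_feats, n_voxels = 300, 32, 50
X = rng.standard_normal((n_trs, n_feats))
true_w = rng.standard_normal((n_feats, n_voxels))
Y = X @ true_w + 0.1 * rng.standard_normal((n_trs, n_voxels))

# Split along time, as is typical for naturalistic-listening stimuli.
X_tr, X_te = X[:240], X[240:]
Y_tr, Y_te = Y[:240], Y[240:]

def ridge_fit(X, Y, lam=1.0):
    """Closed-form ridge regression: W = (X'X + lam*I)^-1 X'Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

W = ridge_fit(X_tr, Y_tr)
pred = X_te @ W

def voxel_corr(a, b):
    """Pearson correlation per voxel (column-wise), the usual encoding metric."""
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return (a * b).mean(axis=0)

r = voxel_corr(pred, Y_te)
print(r.shape, round(float(r.mean()), 3))
```

On this synthetic linear data the mean held-out correlation is high; in real fMRI encoding, per-voxel correlations are far lower and the regularization strength is typically tuned per voxel via cross-validation.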