Regression versus classification for neural network based audio source localization

Autor:	Alexandre Défossez, Alexandre Guerin, Emmanuel Vincent, Romain Serizel, Laureline Perotin
Přispěvatelé:	Perotin, Lauréline, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Laboratoire d'informatique de l'école normale supérieure (LIENS), École normale supérieure - Paris (ENS Paris), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Centre National de la Recherche Scientifique (CNRS), Orange Labs [Cesson-Sévigné], Orange Labs, IEEE, Département d'informatique - ENS Paris (DI-ENS), École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS-PSL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Université Paris sciences et lettres (PSL)-Université Paris sciences et lettres (PSL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-École normale supérieure - Paris (ENS Paris)
Jazyk:	angličtina
Rok vydání:	2019
Předmět:	Mean squared error soft target Computer science [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing [INFO.INFO-NE] Computer Science [cs]/Neural and Evolutionary Computing [cs.NE] 02 engineering and technology [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE] law.invention 030507 speech-language pathology & audiology 03 medical and health sciences [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing law training criterion 0202 electrical engineering electronic engineering information engineering Cartesian coordinate system angular loss Cost-sensitive classification Artificial neural network Angular distance Probabilistic logic Spherical coordinate system 020206 networking & telecommunications Grid Regression Direction-of-arrival 0305 other medical science Algorithm
Zdroj:	WASPAA WASPAA 2019-IEEE Workshop on Applications of Signal Processing to Audio and Acoustics WASPAA 2019-IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, IEEE, Oct 2019, New Paltz, United States
Popis:	International audience; We compare the performance of regression and classification neural networks for single-source direction-of-arrival estimation. Since the output space is continuous and structured, regression seems more appropriate. However, classification on a discrete spherical grid is widely believed to perform better and is predominantly used in the literature. For regression, we propose two ways to account for the spherical geometry of the output space based either on the angular distance between spherical coordinates or on the mean squared error between Cartesian coordinates. For classification, we propose two alternatives to the classical one-hot encoding framework: we derive a Gibbs distribution from the squared angular distance between grid points and use the corresponding probabilities either as soft targets or as cross-entropy weights that retain a clear probabilis-tic interpretation. We show that regression on Cartesian coordinates is generally more accurate, except when localized interference is present, in which case classification appears to be more robust.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e53e7a2f36dfe70e0f0c28479789e1f0 https://hal.inria.fr/hal-02125985v2 Zobrazit plný text záznamu