Creating a new research community on detection and classification of acoustic scenes and events: Lessons from the first ten years of DCASE challenges and workshops

Authors: Mark Plumbley, Tuomas Virtanen
Publication year: 2023
Source: INTER-NOISE and NOISE-CON Congress and Conference Proceedings. 265:4472-4479
ISSN: 0736-2935
DOI: 10.3397/in_2022_0643
Description: Research work on automatic speech recognition and automatic music transcription has been around for several decades, supported by dedicated conferences or conference sessions. However, while individual researchers have been working on recognition of more general environmental sounds, until ten years ago there were no regular workshops or conference sessions where this research, or its researchers, could be found. There was also little data available for researchers to work on or to benchmark their work. In this talk, we will outline how a new research community working on Detection and Classification of Acoustic Scenes and Events (DCASE) has grown over the last ten years, from two challenges on acoustic scene classification and sound event detection with a small workshop poster session, to an annual data challenge with six tasks and a dedicated annual workshop, attracting hundreds of delegates and strong industry interest. We will also describe how the analysis methods have evolved, from mel-frequency cepstral coefficients (MFCCs) or cochleograms classified by support vector machines (SVMs) or hidden Markov models (HMMs), to deep learning methods such as transfer learning, transformers, and self-supervised learning. We will finish by suggesting some potential future directions for automatic sound recognition and the DCASE community.
Database: OpenAIRE