Audiovisual event detection towards scene understanding

Autor: Xavier Giró, Javier Hernando, Taras Butko, Climent Nadeu, Josep R. Casas, Carlos Segura, Cristian Canton-Ferrer
Přispěvatelé: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions, Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo, Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla
Rok vydání: 2009
Předmět:
Zdroj: CVPR Workshops
Recercat. Dipósit de la Recerca de Catalunya
instname
UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
DOI: 10.1109/cvprw.2009.5204264
Popis: Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. In this paper, a system to detect and recognize these events from a multimodal perspective is presented combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multiperson tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. A multimodal data fusion at score level is carried out using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded including manual annotations of the data. A set of metrics allow assessing the performance of the presented algorithms. This dataset is made publicly available for research purposes.
Databáze: OpenAIRE