Audiovisual event detection towards scene understanding
Author: | Xavier Giró, Javier Hernando, Taras Butko, Climent Nadeu, Josep R. Casas, Carlos Segura, Cristian Canton-Ferrer |
---|---|
Contributors: | Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions; Universitat Politècnica de Catalunya. GPI - Grup de Processament d'Imatge i Vídeo; Universitat Politècnica de Catalunya. VEU - Grup de Tractament de la Parla |
Publication year: | 2009 |
Subject: | Motion analysis; Object detection; Computer science; Feature extraction; Motion estimation; Facial recognition system; Human face recognition (Computer science); Informàtica [Àrees temàtiques de la UPC]; Computer vision; Face recognition; Audio signal processing; Sensor fusion; Reconeixement facial (Informàtica); Event (computing); Pattern recognition; Transforms; Video signal processing; Enginyeria de la telecomunicació::Processament del senyal::Processament de la parla i del senyal acústic [Àrees temàtiques de la UPC]; Artificial intelligence |
Source: | CVPR Workshops; Recercat. Dipósit de la Recerca de Catalunya; UPCommons. Portal del coneixement obert de la UPC; Universitat Politècnica de Catalunya (UPC) |
DOI: | 10.1109/cvprw.2009.5204264 |
Description: | Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. This paper presents a system that detects and recognizes these events from a multimodal perspective, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multiperson tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion is carried out at score level using two approaches: a weighted arithmetic mean and a fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data. A set of metrics allows the performance of the presented algorithms to be assessed. The dataset is made publicly available for research purposes. |
Database: | OpenAIRE |
External link: |
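
The score-level fusion named in the description (weighted mean and fuzzy integral) can be illustrated with a minimal sketch. This record does not say which form of fuzzy integral the authors use, nor does it give weights or a fuzzy measure, so the choice of the discrete Choquet integral, the modality names (`audio`, `video`, `localization`), and every numeric value below are assumptions made for illustration only, not the authors' implementation.

```python
from itertools import combinations


def weighted_mean_fusion(scores, weights):
    """Fuse per-modality scores with a weighted arithmetic mean."""
    total = sum(weights[m] for m in scores)
    return sum(weights[m] * scores[m] for m in scores) / total


def additive_measure(weights):
    """Build an additive fuzzy measure from per-modality weights.

    With an additive measure the Choquet integral reduces to the
    weighted mean, which makes this a handy sanity check.
    """
    names = list(weights)
    total = sum(weights.values())
    measure = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            measure[frozenset(combo)] = sum(weights[m] for m in combo) / total
    return measure


def choquet_integral(scores, measure):
    """Discrete Choquet integral of scores w.r.t. a fuzzy measure.

    `measure` maps each non-empty frozenset of modality names to a value
    in [0, 1]; it must be monotone and equal 1.0 on the full set.
    """
    # Visit modalities from the highest score to the lowest.
    order = sorted(scores, key=scores.get, reverse=True)
    fused, previous, subset = 0.0, 0.0, set()
    for name in order:
        subset.add(name)
        current = measure[frozenset(subset)]
        # Each score is weighted by the gain in measure it contributes.
        fused += scores[name] * (current - previous)
        previous = current
    return fused


if __name__ == "__main__":
    # Hypothetical classifier scores for one event class.
    scores = {"audio": 0.8, "video": 0.4, "localization": 0.6}
    weights = {"audio": 0.5, "video": 0.3, "localization": 0.2}
    # Hypothetical non-additive measure: audio and video reinforce each other.
    measure = {
        frozenset({"audio"}): 0.6,
        frozenset({"video"}): 0.3,
        frozenset({"localization"}): 0.2,
        frozenset({"audio", "video"}): 0.9,
        frozenset({"audio", "localization"}): 0.7,
        frozenset({"video", "localization"}): 0.4,
        frozenset({"audio", "video", "localization"}): 1.0,
    }
    print(weighted_mean_fusion(scores, weights))
    print(choquet_integral(scores, measure))
```

Unlike the weighted mean, which assigns each modality a fixed importance, the fuzzy measure assigns a value to every subset of modalities, so the integral can model interaction between sources, for example two cues that are more informative together than their individual weights suggest.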