Search results - "Berghi, Davide"

Report

Text-Queried Target Sound Event Localization

Authors: Zhao, Jinzheng; Qian, Xinyuan; Xu, Yong; Liu, Haohe; Cao, Yin; Berghi, Davide; Wang, Wenwu

Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classe

External link: http://arxiv.org/abs/2406.16058

Report

Audio-Visual Talker Localization in Video for Spatial Sound Reproduction

Authors: Berghi, Davide; Jackson, Philip J. B.

Object-based audio production requires the positional metadata to be defined for each point-source object, including the key elements in the foreground of the sound scene. In many media production use cases, both cameras and microphones are employed

External link: http://arxiv.org/abs/2406.00495

Report

Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization

Authors: Berghi, Davide; Jackson, Philip J. B.

Conventional audio-visual approaches for active speaker detection (ASD) typically rely on visually pre-extracted face tracks and the corresponding single-channel audio to find the speaker in a video. Therefore, they tend to fail every time the face o

External link: http://arxiv.org/abs/2312.14021

Report

Fusion of Audio and Visual Embeddings for Sound Event Localization and Detection

Authors: Berghi, Davide; Wu, Peipei; Zhao, Jinzheng; Wang, Wenwu; Jackson, Philip J. B.

Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation. SELD is usually tackled as an audio-only problem, but visual information has been recently included. Few audio

External link: http://arxiv.org/abs/2312.09034

Report

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

Authors: Zhao, Jinzheng; Xu, Yong; Qian, Xinyuan; Berghi, Davide; Wu, Peipei; Cui, Meng; Sun, Jianyuan; Jackson, Philip J. B.; Wang, Wenwu

Audio-visual speaker tracking has drawn increasing attention over the past few years due to its academic values and wide application. Audio and visual modalities can provide complementary information for localization and tracking. With audio and visu

External link: http://arxiv.org/abs/2310.14778

Report

Audio Inputs for Active Speaker Detection and Localization via Microphone Array

Authors: Berghi, Davide; Jackson, Philip J. B.

This study considers the problem of detecting and locating an active talker's horizontal position from multichannel audio captured by a microphone array. We refer to this as active speaker detection and localization (ASDL). Our goal was to investigat

External link: http://arxiv.org/abs/2307.14739

Report

Tragic Talkers: A Shakespearean Sound- and Light-Field Dataset for Audio-Visual Machine Learning Research

Authors: Berghi, Davide; Volino, Marco; Jackson, Philip J. B.

3D audio-visual production aims to deliver immersive and interactive experiences to the consumer. Yet, faithfully reproducing real-world 3D scenes remains a challenging task. This is partly due to the lack of available datasets enabling audio-visual

External link: http://arxiv.org/abs/2212.01892

Report

Visually Supervised Speaker Detection and Localization via Microphone Array

Authors: Berghi, Davide; Hilton, Adrian; Jackson, Philip J. B.

Published in: IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), 2021

Active speaker detection (ASD) is a multi-modal task that aims to identify who, if anyone, is speaking from a set of candidates. Current audio-visual approaches for ASD typically rely on visually pre-extracted face tracks (sequences of consecutive fa

External link: http://arxiv.org/abs/2203.03291

Report

Naturalistic audio-visual volumetric sequences dataset of sounding actions for six degree-of-freedom interaction

Authors: Stenzel, Hanne; Berghi, Davide; Volino, Marco; Jackson, Philip J. B.

As audio-visual systems increasingly bring immersive and interactive capabilities into our work and leisure activities, so the need for naturalistic test material grows. New volumetric datasets have captured high-quality 3D video, but accompanying au

External link: http://arxiv.org/abs/2105.00641

Report

Audio-Visual Spatial Alignment Requirements of Central and Peripheral Object Events

Authors: Berghi, Davide; Stenzel, Hanne; Volino, Marco; Hilton, Adrian; Jackson, Philip J. B.

Published in: IEEE VR 2020

Immersive audio-visual perception relies on the spatial integration of both auditory and visual information which are heterogeneous sensing modalities with different fields of reception and spatial resolution. This study investigates the perceived co

External link: http://arxiv.org/abs/2003.06656
