Showing 1 - 10 of 2,394 for search: '"Zisserman, A."'
In this paper, we present a novel keypoint-based classification model designed to recognise British Sign Language (BSL) words within continuous signing sequences. Our model's performance is assessed using the BOBSL dataset, revealing that the keypoin
External link:
http://arxiv.org/abs/2412.09475
Scoliosis is traditionally assessed based solely on 2D lateral deviations, but recent studies have also revealed the importance of other imaging planes in understanding the deformation of the spine. Consequently, extracting the spinal geometry in 3D
External link:
http://arxiv.org/abs/2412.01504
Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the goal of benchmarking state-of-the-art video models and
External link:
http://arxiv.org/abs/2411.19941
We study the connection between audio-visual observations and the underlying physics of a mundane yet intriguing everyday activity: pouring liquids. Given only the sound of liquid pouring into a container, our objective is to automatically infer phys
External link:
http://arxiv.org/abs/2411.11222
We discuss some consistent issues on how RepNet has been evaluated in various papers. As a way to mitigate these issues, we report RepNet performance results on different datasets, and release evaluation code and the RepNet checkpoint to obtain these
External link:
http://arxiv.org/abs/2411.08878
Published in:
vol. 15005, 2024, pp. 101-111
We propose a general pipeline to automate the extraction of labels from radiology reports using large language models, which we validate on spinal MRI reports. The efficacy of our labelling method is measured on five distinct conditions: spinal cance
External link:
http://arxiv.org/abs/2410.17235
Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of uniqu
External link:
http://arxiv.org/abs/2410.11702
Author:
Huh, Jaesung, Zisserman, Andrew
This paper presents an improved framework for character-aware audio-visual subtitling in TV shows. Our approach integrates speech recognition, speaker diarisation, and character recognition, utilising both audio and visual cues. This holistic solutio
External link:
http://arxiv.org/abs/2410.11068
Author:
Huh, Jaesung, Chung, Joon Son, Nagrani, Arsha, Brown, Andrew, Jung, Jee-weon, Garcia-Romero, Daniel, Zisserman, Andrew
The VoxCeleb Speaker Recognition Challenges (VoxSRC) were a series of challenges and workshops that ran annually from 2019 to 2023. The challenges primarily evaluated the tasks of speaker recognition and diarisation under various settings including:
External link:
http://arxiv.org/abs/2408.14886
Author:
Bhalgat, Yash, Tschernezki, Vadim, Laina, Iro, Henriques, João F., Vedaldi, Andrea, Zisserman, Andrew
Egocentric videos present unique challenges for 3D scene understanding due to rapid camera motion, frequent object occlusions, and limited object visibility. This paper introduces a novel approach to instance segmentation and tracking in first-person
External link:
http://arxiv.org/abs/2408.09860