Showing 1 - 10 of 318 for search: '"Hilton, Adrian"'
Recent illumination estimation methods have focused on enhancing the resolution and improving the quality and diversity of the generated textures. However, few have explored tailoring the neural network architecture to the Equirectangular Panorama (E
External link:
http://arxiv.org/abs/2410.13566
We present a novel framework to reconstruct complete 3D human shapes from a given target image by leveraging monocular unconstrained images. The objective of this work is to reproduce high-quality details in regions of the reconstructed human body th
External link:
http://arxiv.org/abs/2407.10586
We introduce a new benchmark analysis focusing on 3D canine pose estimation from monocular in-the-wild images. A multi-modal dataset 3DDogs-Lab was captured indoors, featuring various dog breeds trotting on a walkway. It includes data from optical ma
External link:
http://arxiv.org/abs/2406.14412
Author:
Nadeem, Asmar, Sardari, Faegheh, Dawes, Robert, Husain, Syed Sameed, Hilton, Adrian, Mustafa, Armin
Existing video captioning benchmarks and models lack coherent representations of causal-temporal narrative, that is, sequences of events linked through cause and effect, unfolding over time and driven by characters or agents. This lack of narrative r
External link:
http://arxiv.org/abs/2406.06499
Unlike the sparse label action detection task, where a single action occurs in each timestamp of a video, in a dense multi-label scenario, actions can overlap. To address this challenging task, it is necessary to simultaneously learn (i) temporal dep
External link:
http://arxiv.org/abs/2406.06187
Author:
Yang, Haosen, Zhang, Chenhao, Wang, Wenqing, Volino, Marco, Hilton, Adrian, Zhang, Li, Zhu, Xiatian
Point management is a critical component in optimizing 3D Gaussian Splatting (3DGS) models, as the point initiation (e.g., via structure from motion) is distributionally inappropriate. Typically, the Adaptive Density Control (ADC) algorithm is applie
External link:
http://arxiv.org/abs/2406.04251
Weakly supervised audio-visual video parsing (AVVP) methods aim to detect audible-only, visible-only, and audible-visible events using only video-level labels. Existing approaches tackle this by leveraging unimodal and cross-modal contexts. However,
External link:
http://arxiv.org/abs/2405.10690
Author:
Pesavento, Marco, Xu, Yuanlu, Sarafianos, Nikolaos, Maier, Robert, Wang, Ziyan, Yao, Chun-Han, Volino, Marco, Boyer, Edmond, Hilton, Adrian, Tung, Tony
Recent progress in human shape learning shows that neural implicit models are effective in generating 3D human surfaces from a limited number of views, and even from a single RGB image. However, existing monocular approaches still struggle to recover
External link:
http://arxiv.org/abs/2403.10357
In the context of Audio Visual Question Answering (AVQA) tasks, the audio-visual modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing AVQA methods suffer from two major shortcomings: the audio-visual (AV) inf
External link:
http://arxiv.org/abs/2310.16754