Showing 1 - 10 of 9,063 results for search: '"A. Damen"'
We introduce EgoPoints, a benchmark for point tracking in egocentric videos. We annotate 4.7K challenging tracks in egocentric sequences. Compared to the popular TAP-Vid-DAVIS evaluation benchmark, we include 9x more points that go out-of-view and 59
External link:
http://arxiv.org/abs/2412.04592
The goal of this work is to generate step-by-step visual instructions in the form of a sequence of images, given an input image that provides the scene context and the sequence of textual instructions. This is a challenging problem as it requires gen
External link:
http://arxiv.org/abs/2412.01987
Following the successful 2023 edition, we organised the Second Perception Test challenge as a half-day workshop alongside the IEEE/CVF European Conference on Computer Vision (ECCV) 2024, with the goal of benchmarking state-of-the-art video models and
External link:
http://arxiv.org/abs/2411.19941
Large-scale multimodal representation learning successfully optimizes for zero-shot transfer at test time. Yet the standard pretraining paradigm (contrastive learning on large amounts of image-text data) does not explicitly encourage representations
External link:
http://arxiv.org/abs/2411.15099
Authors:
Gao, Tianqi, Alsulimane, Mohammad, Burdin, Sergey, D'Amen, Gabriele, Da Via, Cinzia, Mavrokoridis, Konstantinos, Nomerotski, Andrei, Roberts, Adam, Svihra, Peter, Taylor, Jon, Tricoli, Alessandro
The feasibility study of a new technique for thermal neutron detection using a Timepix3 camera (TPX3Cam) with custom-made optical add-ons operated in event-mode data acquisition is presented. The camera has a spatial resolution of ~16 µm and a tempo
External link:
http://arxiv.org/abs/2411.12095
Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of uniqu
External link:
http://arxiv.org/abs/2410.11702
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-lon
External link:
http://arxiv.org/abs/2409.10917
Published in:
IDETC 2024
Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process. Many practice designers use text-based searches on platforms like Pinterest to gather im
External link:
http://arxiv.org/abs/2407.11991
Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to p
External link:
http://arxiv.org/abs/2404.14735
Large Vision Language Models (VLMs) are now the de facto state-of-the-art for a number of tasks including visual question answering, recognising objects, and spatial referral. In this work, we propose the HOI-Ref task for egocentric images that aims
External link:
http://arxiv.org/abs/2404.09933