Showing 1 - 10 of 2,446 for search: '"P. Damen"'
Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of uniqu…
External link:
http://arxiv.org/abs/2410.11702
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-lon…
External link:
http://arxiv.org/abs/2409.10917
Published in:
IDETC 2024
Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process. Many practicing designers use text-based searches on platforms like Pinterest to gather im…
External link:
http://arxiv.org/abs/2407.11991
Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to p…
External link:
http://arxiv.org/abs/2404.14735
Large Vision Language Models (VLMs) are now the de facto state-of-the-art for a number of tasks including visual question answering, recognising objects, and spatial referral. In this work, we propose the HOI-Ref task for egocentric images that aims…
External link:
http://arxiv.org/abs/2404.09933
Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalit…
External link:
http://arxiv.org/abs/2404.05559
As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We…
External link:
http://arxiv.org/abs/2404.05072
Video repetition counting infers the number of repetitions of recurring actions or motion within a video. We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our propo…
External link:
http://arxiv.org/abs/2403.18074
In this paper, we propose a deep learning (DL)-based approach for efficiently computing the inverse of Hermitian matrices using truncated polynomial expansion (TPE). Our model-driven approach involves optimizing the coefficients of the TPE during an…
External link:
http://arxiv.org/abs/2402.12595
Though pre-training vision-language models have demonstrated significant benefits in boosting video-text retrieval performance from large-scale web videos, fine-tuning still plays a critical role with manually annotated clips with start and end times…
External link:
http://arxiv.org/abs/2402.02335