Showing 1 - 10 of 2,446 for search: '"P. Damen"'
Long videos contain many repeating actions, events and shots. These repetitions are frequently given identical captions, which makes it difficult to retrieve the exact desired clip using a text search. In this paper, we formulate the problem of uniqu…
External link:
http://arxiv.org/abs/2410.11702
Egocentric videos provide a unique perspective into individuals' daily experiences, yet their unstructured nature presents challenges for perception. In this paper, we introduce AMEGO, a novel approach aimed at enhancing the comprehension of very-lon…
External link:
http://arxiv.org/abs/2409.10917
Published in:
IDETC 2024
Design inspiration is crucial for establishing the direction of a design as well as evoking feelings and conveying meanings during the conceptual design process. Many practicing designers use text-based searches on platforms like Pinterest to gather im…
External link:
http://arxiv.org/abs/2407.11991
Teaching robots novel skills with demonstrations via human-in-the-loop data collection techniques like kinesthetic teaching or teleoperation puts a heavy burden on human supervisors. In contrast to this paradigm, it is often significantly easier to p…
External link:
http://arxiv.org/abs/2404.14735
Large Vision Language Models (VLMs) are now the de facto state-of-the-art for a number of tasks including visual question answering, recognising objects, and spatial referral. In this work, we propose the HOI-Ref task for egocentric images that aims…
External link:
http://arxiv.org/abs/2404.09933
Diverse actions give rise to rich audio-visual signals in long videos. Recent works showcase that the two modalities of audio and video exhibit different temporal extents of events and distinct labels. We address the interplay between the two modalit…
External link:
http://arxiv.org/abs/2404.05559
As humans move around, performing their daily tasks, they are able to recall where they have positioned objects in their environment, even if these objects are currently out of sight. In this paper, we aim to mimic this spatial cognition ability. We…
External link:
http://arxiv.org/abs/2404.05072
Video repetition counting infers the number of repetitions of recurring actions or motion within a video. We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our propo…
External link:
http://arxiv.org/abs/2403.18074
In this paper, we propose a deep learning (DL)-based approach for efficiently computing the inverse of Hermitian matrices using truncated polynomial expansion (TPE). Our model-driven approach involves optimizing the coefficients of the TPE during an…
External link:
http://arxiv.org/abs/2402.12595
Though pre-training vision-language models have demonstrated significant benefits in boosting video-text retrieval performance from large-scale web videos, fine-tuning still plays a critical role with manually annotated clips with start and end times…
External link:
http://arxiv.org/abs/2402.02335