Showing 1 - 10 of 159
for the search: '"Jenni, Simon"'
Temporal video alignment aims to synchronize key events, such as object interactions or action phase transitions, in two videos. Such methods could benefit various video editing, processing, and understanding tasks. However, existing approaches operat…
External link:
http://arxiv.org/abs/2409.01445
Author:
Hua, Hang, Shi, Jing, Kafle, Kushal, Jenni, Simon, Zhang, Daoan, Collomosse, John, Cohen, Scott, Luo, Jiebo
Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoni…
External link:
http://arxiv.org/abs/2404.14715
Author:
Kwon, Gihyun, Jenni, Simon, Li, Dingzeyu, Lee, Joon-Young, Ye, Jong Chul, Heilbron, Fabian Caba
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized…
External link:
http://arxiv.org/abs/2404.03913
Self-supervised approaches for video have shown impressive results in video understanding tasks. However, unlike early works that leverage temporal self-supervision, current state-of-the-art methods primarily rely on tasks from the image domain (e.g.…
External link:
http://arxiv.org/abs/2312.13008
We present DECORAIT, a decentralized registry through which content creators may assert their right to opt in or out of AI training, as well as receive reward for their contributions. Generative AI (GenAI) enables images to be synthesized using AI mod…
External link:
http://arxiv.org/abs/2309.14400
Large-scale vision-language models (VLM) have shown impressive results for language-guided search applications. While these models allow category-level queries, they currently struggle with personalized searches for moments in a video where a specifi…
External link:
http://arxiv.org/abs/2306.10169
We present EKILA, a decentralized framework that enables creatives to receive recognition and reward for their contributions to generative AI (GenAI). EKILA proposes a robust visual attribution technique and combines this with an emerging content pro…
External link:
http://arxiv.org/abs/2304.04639
Author:
Black, Alexander, Jenni, Simon, Bui, Tu, Tanjim, Md. Mehrab, Petrangeli, Stefano, Sinha, Ritwik, Swaminathan, Viswanathan, Collomosse, John
We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. VADER matches and coarsely aligns partial video fragments to candidate videos using a robust visual d…
External link:
http://arxiv.org/abs/2303.13193
We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also contain sou…
External link:
http://arxiv.org/abs/2302.07702
We propose Spatio-temporal Crop Aggregation for video representation LEarning (SCALE), a novel method that enjoys high scalability at both training and inference time. Our model builds long-range video features by learning from sets of video clip-lev…
External link:
http://arxiv.org/abs/2211.17042