Showing 1 - 10 of 159
for the search: '"Jenni, Simon"'
Temporal video alignment aims to synchronize key events, such as object interactions or action phase transitions, in two videos. Such methods could benefit various video editing, processing, and understanding tasks. However, existing approaches operat…
External link:
http://arxiv.org/abs/2409.01445
Author:
Hua, Hang, Shi, Jing, Kafle, Kushal, Jenni, Simon, Zhang, Daoan, Collomosse, John, Cohen, Scott, Luo, Jiebo
Recent progress in large-scale pre-training has led to the development of advanced vision-language models (VLMs) with remarkable proficiency in comprehending and generating multimodal content. Despite the impressive ability to perform complex reasoni…
External link:
http://arxiv.org/abs/2404.14715
Author:
Kwon, Gihyun, Jenni, Simon, Li, Dingzeyu, Lee, Joon-Young, Ye, Jong Chul, Heilbron, Fabian Caba
While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized…
External link:
http://arxiv.org/abs/2404.03913
Self-supervised approaches for video have shown impressive results in video understanding tasks. However, unlike early works that leverage temporal self-supervision, current state-of-the-art methods primarily rely on tasks from the image domain (e.g.…
External link:
http://arxiv.org/abs/2312.13008
We present DECORAIT, a decentralized registry through which content creators may assert their right to opt in or out of AI training, as well as receive reward for their contributions. Generative AI (GenAI) enables images to be synthesized using AI mod…
External link:
http://arxiv.org/abs/2309.14400
Large-scale vision-language models (VLM) have shown impressive results for language-guided search applications. While these models allow category-level queries, they currently struggle with personalized searches for moments in a video where a specifi…
External link:
http://arxiv.org/abs/2306.10169
We present EKILA, a decentralized framework that enables creatives to receive recognition and reward for their contributions to generative AI (GenAI). EKILA proposes a robust visual attribution technique and combines this with an emerging content pro…
External link:
http://arxiv.org/abs/2304.04639
Author:
Black, Alexander, Jenni, Simon, Bui, Tu, Tanjim, Md. Mehrab, Petrangeli, Stefano, Sinha, Ritwik, Swaminathan, Viswanathan, Collomosse, John
We propose VADER, a spatio-temporal matching, alignment, and change summarization method to help fight misinformation spread via manipulated videos. VADER matches and coarsely aligns partial video fragments to candidate videos using a robust visual d…
External link:
http://arxiv.org/abs/2303.13193
We propose a self-supervised learning approach for videos that learns representations of both the RGB frames and the accompanying audio without human supervision. In contrast to images that capture the static scene appearance, videos also contain sou…
External link:
http://arxiv.org/abs/2302.07702
We propose Spatio-temporal Crop Aggregation for video representation LEarning (SCALE), a novel method that enjoys high scalability at both training and inference time. Our model builds long-range video features by learning from sets of video clip-lev…
External link:
http://arxiv.org/abs/2211.17042