Showing 1 - 10 of 273 for search: '"Kim, Changick"'
We introduce VideoMamba, a novel adaptation of the pure Mamba architecture, specifically designed for video recognition. Unlike transformers, which rely on self-attention mechanisms whose quadratic complexity incurs high computational costs, VideoMamba …
External link:
http://arxiv.org/abs/2407.08476
Author:
Nugroho, Muhammad Adi, Woo, Sangmin, Lee, Sumin, Park, Jinyoung, Wang, Yooseung, Kim, Donguk, Kim, Changick
Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals using only a video-level label, without actor-level labels. We propose the Flow-Assisted Motion Learning Network (Flaming-Net) …
External link:
http://arxiv.org/abs/2405.18012
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
This study addresses an issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual ob…
External link:
http://arxiv.org/abs/2405.17820
We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into th…
External link:
http://arxiv.org/abs/2405.17825
Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs. Despite their impressive capabilities, they often produce "hallucinatory" outputs that do n…
External link:
http://arxiv.org/abs/2405.17821
Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities, in crowded panoramic scenes. PAR presents two major challenges: 1) recognizing the …
External link:
http://arxiv.org/abs/2403.14113
Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a …
External link:
http://arxiv.org/abs/2403.09176
Recent progress in single-image 3D generation highlights the importance of multi-view coherency, leveraging 3D priors from large-scale diffusion models pretrained on Internet-scale images. However, the aspect of novel-view diversity remains underexplored …
External link:
http://arxiv.org/abs/2312.15980
Recently, multimodal prompting, which introduces learnable missing-aware prompts for all missing-modality cases, has exhibited impressive performance. However, it encounters two critical issues: 1) the number of prompts grows exponentially as the num…
External link:
http://arxiv.org/abs/2312.15890
Adversarial training integrates adversarial examples during model training to enhance robustness. However, its application in fixed-dataset settings differs from real-world dynamics, where data accumulates incrementally. In this study, we investigate …
External link:
http://arxiv.org/abs/2312.03289