Showing 1 - 10 of 273 for search: '"Kim, Changick"'
We introduce VideoMamba, a novel adaptation of the pure Mamba architecture, specifically designed for video recognition. Unlike transformers, which rely on self-attention mechanisms whose quadratic complexity incurs high computational costs, VideoMamba …
External link:
http://arxiv.org/abs/2407.08476
Author:
Nugroho, Muhammad Adi, Woo, Sangmin, Lee, Sumin, Park, Jinyoung, Wang, Yooseung, Kim, Donguk, Kim, Changick
Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals using only a video-level label, without actor-level labels. We propose the Flow-Assisted Motion Learning Network (Flaming-Net) …
External link:
http://arxiv.org/abs/2405.18012
Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
This study addresses an issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual ob…
External link:
http://arxiv.org/abs/2405.17820
We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into th…
External link:
http://arxiv.org/abs/2405.17825
Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs. Despite their impressive capabilities, they often produce "hallucinatory" outputs that do n…
External link:
http://arxiv.org/abs/2405.17821
Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities, in crowded panoramic scenes. PAR presents two major challenges: 1) recognizing the …
External link:
http://arxiv.org/abs/2403.14113
Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a …
External link:
http://arxiv.org/abs/2403.09176
Recent progress in single-image 3D generation highlights the importance of multi-view coherency, leveraging 3D priors from large-scale diffusion models pretrained on Internet-scale images. However, the aspect of novel-view diversity remains underexplored …
External link:
http://arxiv.org/abs/2312.15980
Recently, multimodal prompting, which introduces learnable missing-aware prompts for all missing-modality cases, has exhibited impressive performance. However, it encounters two critical issues: 1) the number of prompts grows exponentially as the num…
External link:
http://arxiv.org/abs/2312.15890
Adversarial training integrates adversarial examples during model training to enhance robustness. However, its application in fixed-dataset settings differs from real-world dynamics, where data accumulates incrementally. In this study, we investigate …
External link:
http://arxiv.org/abs/2312.03289