Search results - "Woo, SangMin"

Report

Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition

Authors: Nugroho, Muhammad Adi; Woo, Sangmin; Lee, Sumin; Park, Jinyoung; Wang, Yooseung; Kim, Donguk; Kim, Changick

Weakly-Supervised Group Activity Recognition (WSGAR) aims to understand the activity performed together by a group of individuals with the video-level label and without actor-level labels. We propose Flow-Assisted Motion Learning Network (Flaming-Net

External link: http://arxiv.org/abs/2405.18012

Report

Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models

Authors: Woo, Sangmin; Kim, Donguk; Jang, Jaehyuk; Choi, Yubin; Kim, Changick

This study addresses the issue observed in Large Vision Language Models (LVLMs), where excessive attention on a few image tokens, referred to as blind tokens, leads to hallucinatory responses in tasks requiring fine-grained understanding of visual ob

External link: http://arxiv.org/abs/2405.17820

Report

Diffusion Model Patching via Mixture-of-Prompts

Authors: Ham, Seokil; Woo, Sangmin; Kim, Jin-Young; Go, Hyojun; Park, Byeongjun; Kim, Changick

We present Diffusion Model Patching (DMP), a simple method to boost the performance of pre-trained diffusion models that have already reached convergence, with a negligible increase in parameters. DMP inserts a small, learnable set of prompts into th

External link: http://arxiv.org/abs/2405.17825

Report

RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in LVLMs

Authors: Woo, Sangmin; Jang, Jaehyuk; Kim, Donguk; Choi, Yubin; Kim, Changick

Recent advancements in Large Vision Language Models (LVLMs) have revolutionized how machines understand and generate textual responses based on visual inputs. Despite their impressive capabilities, they often produce "hallucinatory" outputs that do n

External link: http://arxiv.org/abs/2405.17821

Report

Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition

Authors: Lee, Sumin; Wang, Yooseung; Woo, Sangmin; Kim, Changick

Panoramic Activity Recognition (PAR) seeks to identify diverse human activities across different scales, from individual actions to social group and global activities in crowded panoramic scenes. PAR presents two major challenges: 1) recognizing the

External link: http://arxiv.org/abs/2403.14113

Report

Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts

Authors: Park, Byeongjun; Go, Hyojun; Kim, Jin-Young; Woo, Sangmin; Ham, Seokil; Kim, Changick

Diffusion models have achieved remarkable success across a range of generative tasks. Recent efforts to enhance diffusion model architectures have reimagined them as a form of multi-task learning, where each task corresponds to a denoising task at a

External link: http://arxiv.org/abs/2403.09176

Report

HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D

Authors: Woo, Sangmin; Park, Byeongjun; Go, Hyojun; Kim, Jin-Young; Kim, Changick

Recent progress in single-image 3D generation highlights the importance of multi-view coherency, leveraging 3D priors from large-scale diffusion models pretrained on Internet-scale images. However, the aspect of novel-view diversity remains underexpl

External link: http://arxiv.org/abs/2312.15980

Report

Modality Mixer Exploiting Complementary Information for Multi-modal Action Recognition

Authors: Lee, Sumin; Woo, Sangmin; Nugroho, Muhammad Adi; Kim, Changick

Due to the distinctive characteristics of sensors, each modality exhibits unique physical properties. For this reason, in the context of multi-modal action recognition, it is important to consider not only the overall action content but also the comp

External link: http://arxiv.org/abs/2311.12344

Report

Denoising Task Routing for Diffusion Models

Authors: Park, Byeongjun; Woo, Sangmin; Go, Hyojun; Kim, Jin-Young; Kim, Changick

Diffusion models generate highly realistic images by learning a multi-step denoising process, naturally embodying the principles of multi-task learning (MTL). Despite the inherent connection between diffusion models and MTL, there remains an unexplor

External link: http://arxiv.org/abs/2310.07138

Report

Audio-Visual Glance Network for Efficient Video Recognition

Authors: Nugroho, Muhammad Adi; Woo, Sangmin; Lee, Sumin; Kim, Changick

Deep learning has made significant strides in video understanding tasks, but the computation required to classify lengthy and massive videos using clip-level video classifiers remains impractical and prohibitively expensive. To address this issue, we

External link: http://arxiv.org/abs/2308.09322
