Showing 1 - 10 of 48 for search: '"Ko, Byungsoo"'
Recently, large language and vision models (LLVMs) have received significant attention and development efforts due to their remarkable generalization performance across a wide range of tasks requiring perception and cognitive abilities. A key factor…
External link:
http://arxiv.org/abs/2410.04751
Author:
Lee, Young-Jun, Lee, Dokyong, Youn, Junyoung, Oh, Kyeongjin, Ko, Byungsoo, Hyeon, Jonghwan, Choi, Ho-Jin
Humans share a wide variety of images related to their personal experiences within conversations via instant messaging tools. However, existing works focus on (1) image-sharing behavior in singular sessions, leading to limited long-term social intera…
External link:
http://arxiv.org/abs/2407.03958
In content-based video retrieval (CBVR), dealing with large-scale collections, efficiency is as important as accuracy; thus, several video-level feature-based studies have actively been conducted. Nevertheless, owing to the severe difficulty of embed…
External link:
http://arxiv.org/abs/2303.08906
As sharing images in an instant message is a crucial factor, there has been active research on learning image-text multi-modal dialogue models. However, training a well-generalized multi-modal dialogue model remains challenging due to the low qual…
External link:
http://arxiv.org/abs/2212.04119
Author:
Ko, Byungsoo, Kim, Han-Gyu, Heo, Byeongho, Yun, Sangdoo, Chun, Sanghyuk, Gu, Geonmo, Kim, Wonjae
Vision Transformer (ViT) extracts the final representation from either the class token or an average of all patch tokens, following the architecture of the Transformer in Natural Language Processing (NLP) or Convolutional Neural Networks (CNNs) in computer v…
External link:
http://arxiv.org/abs/2212.04114
Strong image search models can be learned for a specific domain, i.e., a set of labels, provided that some labeled images of that domain are available. A practical visual search model, however, should be versatile enough to solve multiple retrieval tasks…
External link:
http://arxiv.org/abs/2210.02254
Author:
Ko, Byungsoo, Gu, Geonmo
This paper is a technical report to share our experience and findings building a Korean and English bilingual multimodal model. While many of the multimodal datasets focus on English and multilingual multimodal research uses machine-translated texts,…
External link:
http://arxiv.org/abs/2203.14463
In hash-based image retrieval systems, degraded or transformed inputs usually generate different codes from the original, deteriorating the retrieval accuracy. To mitigate this issue, data augmentation can be applied during training. However, even if…
External link:
http://arxiv.org/abs/2112.08816
Previous deep learning-based line segment detection (LSD) methods suffer from immense model size and high computational cost for line prediction. This constrains them from real-time inference in computationally restricted environments. In this paper, we…
External link:
http://arxiv.org/abs/2106.00186
In this paper, we study the compositional learning of images and texts for image retrieval. The query is given in the form of an image and text that describes the desired modifications to the image; the goal is to retrieve the target image that satis…
External link:
http://arxiv.org/abs/2104.03015