Showing 1 - 10 of 180 for search: '"Lim, Ser-Nam"'
We study the visual semantic embedding problem for image-text matching. Most existing work utilizes a tailored cross-attention mechanism to perform local alignment across the two image and text modalities. This is computationally expensive, even thou
External link:
http://arxiv.org/abs/2406.11820
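For readers unfamiliar with the mechanism this snippet mentions, cross-attention local alignment can be sketched roughly as follows (pure Python, illustrative names only; not any particular paper's implementation):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention_score(regions, tokens):
    """Average cosine similarity between each text token and its
    attention-weighted image context (a local-alignment sketch)."""
    total = 0.0
    for t in tokens:
        # attend over image regions with this token as the query
        weights = softmax([dot(t, r) for r in regions])
        context = [sum(w * r[i] for w, r in zip(weights, regions))
                   for i in range(len(t))]
        denom = math.sqrt(dot(t, t) * dot(context, context))
        total += dot(t, context) / denom if denom else 0.0
    return total / len(tokens)
```

Note that scoring a single image-text pair already costs O(regions × tokens), and every candidate pair needs its own attention pass, which illustrates the computational expense the abstract alludes to.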
Author:
Huang, Shuaiyi, Suri, Saksham, Gupta, Kamal, Rambhatla, Sai Saketh, Lim, Ser-nam, Shrivastava, Abhinav
Video instance segmentation requires classifying, segmenting, and tracking every object across video frames. Unlike existing approaches that rely on masks, boxes, or category labels, we propose UVIS, a novel Unsupervised Video Instance Segmentation (
External link:
http://arxiv.org/abs/2406.06908
"Learning to hash" is a practical solution for efficient retrieval, offering fast search speed and low storage cost. It is widely applied in various applications, such as image-text cross-modal search. In this paper, we explore the potential of enh
External link:
http://arxiv.org/abs/2405.14726
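The retrieval setup behind "learning to hash" can be sketched minimally: real-valued embeddings are binarized into short codes and ranked by Hamming distance (a generic sign-threshold sketch, not the paper's method):

```python
def binary_hash(vec):
    """Sign-threshold a real-valued embedding into a bit tuple."""
    return tuple(1 if x >= 0 else 0 for x in vec)

def hamming(a, b):
    """Number of differing bits between two equal-length codes."""
    return sum(x != y for x, y in zip(a, b))

def retrieve(query_vec, db_vecs, k=2):
    """Return the indices of the k database items whose binary
    codes are closest to the query's code."""
    q = binary_hash(query_vec)
    ranked = sorted((hamming(q, binary_hash(v)), i)
                    for i, v in enumerate(db_vecs))
    return [i for _, i in ranked[:k]]
```

The appeal is that bit codes are tiny to store and Hamming distance reduces to XOR + popcount, which is where the fast search speed and low storage cost come from.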
Author:
Jang, Young Kyun, Lim, Ser-nam
Modern retrieval systems often struggle with upgrading to new and more powerful models due to the incompatibility of embeddings between the old and new models. This necessitates a costly process known as backfilling, which involves re-computing the e
External link:
http://arxiv.org/abs/2405.14715
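To make the backfilling problem concrete: one generic way to avoid re-computing a gallery is to map new-model queries into the old embedding space and search the old index directly. The projection below is a hypothetical stand-in for a learned compatibility map, not the paper's approach:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    """Apply a row-major matrix M to vector v."""
    return [dot(row, v) for row in M]

def search_old_gallery(old_gallery, new_query, projection):
    """Map a new-model query into the old embedding space, then run
    nearest-neighbour search over the existing (old) embeddings --
    no backfilling of the gallery required."""
    q = matvec(projection, new_query)
    return max(range(len(old_gallery)),
               key=lambda i: dot(old_gallery[i], q))
```

Without such a compatibility map, every stored embedding would have to be re-computed with the new model before the index could serve new-model queries, which is the costly backfilling step the abstract describes.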
Composed Image Retrieval (CIR) is a complex task that retrieves images using a query, which is configured with an image and a caption that describes desired modifications to that image. Supervised CIR approaches have shown strong performance, but the
External link:
http://arxiv.org/abs/2405.00571
Composed Image Retrieval (CIR) is a task that retrieves images similar to a query, based on a provided textual modification. Current techniques rely on supervised learning for CIR models using labeled triplets of the reference image, text, target ima
External link:
http://arxiv.org/abs/2404.15516
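The Composed Image Retrieval query structure (reference image + modification text) can be illustrated with a simple additive late-fusion baseline. This is an assumed baseline for illustration only, not the method of either CIR paper above:

```python
import math

def compose(image_emb, text_emb, alpha=0.5):
    """Blend the reference-image embedding with the modification-text
    embedding into a single composed query (additive baseline)."""
    return [alpha * i + (1 - alpha) * t
            for i, t in zip(image_emb, text_emb)]

def nearest(query, gallery):
    """Index of the gallery item most cosine-similar to the query."""
    def cos(a, b):
        num = sum(x * y for x, y in zip(a, b))
        den = (math.sqrt(sum(x * x for x in a))
               * math.sqrt(sum(y * y for y in b)))
        return num / den if den else 0.0
    return max(range(len(gallery)), key=lambda i: cos(query, gallery[i]))
```

The supervised approaches mentioned above instead learn the composition function from labeled (reference, text, target) triplets, which is exactly the annotation cost that motivates the unsupervised variants.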
Author:
He, Bo, Li, Hengduo, Jang, Young Kyun, Jia, Menglin, Cao, Xuefei, Shah, Ashish, Shrivastava, Abhinav, Lim, Ser-Nam
With the success of large language models (LLMs), integrating the vision model into LLMs to build vision-language foundation models has gained much more interest recently. However, existing LLM-based large multimodal models (e.g., Video-LLaMA, VideoC
External link:
http://arxiv.org/abs/2404.05726
Mitigating hallucinations of Large Vision Language Models (LVLMs) is crucial to enhancing their reliability as general-purpose assistants. This paper shows that such hallucinations of LVLMs can be significantly exacerbated by preceding user-system dia
External link:
http://arxiv.org/abs/2403.10492
Novel view synthesis has seen tremendous development since the arrival of NeRFs. However, NeRF models overfit to a single scene, lacking generalization to out-of-distribution objects. Recently, diffusion models have exhibited remarkable performa
External link:
http://arxiv.org/abs/2403.06394
Author:
Chiang, Ping-yeh, Zhou, Yipin, Poursaeed, Omid, Shukla, Satya Narayan, Shah, Ashish, Goldstein, Tom, Lim, Ser-Nam
Recently, Pyramid Adversarial training (Herrmann et al., 2022) has been shown to be very effective for improving clean accuracy and distribution-shift robustness of vision transformers. However, due to the iterative nature of adversarial training, th
External link:
http://arxiv.org/abs/2312.16339