Search results - "Shen, Heng Tao"

MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

Authors: Luo, Run; Zhang, Haonan; Chen, Longze; Lin, Ting-En; Liu, Xiong; Wu, Yuchuan; Yang, Min; Wang, Minzheng; Zeng, Pengpeng; Gao, Lianli; Shen, Heng Tao; Li, Yunshui; Xia, Xiaobo; Huang, Fei; Song, Jingkuan; Li, Yongbin

The development of Multimodal Large Language Models (MLLMs) has seen significant advancements with increasing demands in various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches attempt to enhance MLLMs capabilit

External link: http://arxiv.org/abs/2409.05840

VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization

Authors: Zhou, Yixuan; Xu, Xing; Sun, Zhe; Song, Jingkuan; Cichocki, Andrzej; Shen, Heng Tao

Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in

External link: http://arxiv.org/abs/2409.00942

DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion

Authors: Wu, Yujia; Shi, Yiming; Wei, Jiwei; Sun, Chengwei; Zhou, Yuyang; Yang, Yang; Shen, Heng Tao

Personalized text-to-image generation has gained significant attention for its capability to generate high-fidelity portraits of specific identities conditioned on user-defined prompts. Existing methods typically involve test-time fine-tuning or inst

External link: http://arxiv.org/abs/2408.06740

GalleryGPT: Analyzing Paintings with Large Multimodal Models

Authors: Bin, Yi; Shi, Wenhao; Ding, Yujuan; Hu, Zhiqiang; Wang, Zheng; Yang, Yang; Ng, See-Kiong; Shen, Heng Tao

Artwork analysis is an important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and facilitate critical thinking. Understanding artworks is challenging due to its subjective nature, diverse inte

External link: http://arxiv.org/abs/2408.00491

Leveraging Weak Cross-Modal Guidance for Coherence Modelling via Iterative Learning

Authors: Bin, Yi; Liao, Junrong; Ding, Yujuan; Li, Haoxuan; Yang, Yang; Ng, See-Kiong; Shen, Heng Tao

Cross-modal coherence modeling is essential for intelligent systems to help them organize and structure information, thereby understanding and creating content of the physical world coherently like human-beings. Previous work on cross-modal coherence

External link: http://arxiv.org/abs/2408.00305

Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric

Authors: Jiang, Xiruo; Yao, Yazhou; Dai, Xili; Shen, Fumin; Hua, Xian-Sheng; Shen, Heng-Tao

Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval. Prior literature predominantly focuses on pair-based and proxy-based methods to maximize i

External link: http://arxiv.org/abs/2407.03106

Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

Authors: Chen, Beitao; Lyu, Xinyu; Gao, Lianli; Song, Jingkuan; Shen, Heng Tao

Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almos

External link: http://arxiv.org/abs/2405.15356

AICL: Action In-Context Learning for Video Diffusion Model

Authors: Liu, Jianzhi; Zhu, Junchen; Gao, Lianli; Shen, Heng Tao; Song, Jingkuan

The open-domain video generation models are constrained by the scale of the training video datasets, and some less common actions still cannot be generated. Some researchers explore video editing methods and achieve action generation by editing the s

External link: http://arxiv.org/abs/2403.11535

Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

Authors: Li, Meixuan; Li, Tianyu; Wang, Guoqing; Wang, Peng; Yang, Yang; Shen, Heng Tao

In this study, we address the intricate challenge of multi-task dense prediction, encompassing tasks such as semantic segmentation, depth estimation, and surface normal estimation, particularly when dealing with partially annotated data (MTPSL). The

External link: http://arxiv.org/abs/2403.10252

Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection

Authors: Liu, Huafeng; Sheng, Mengmeng; Sun, Zeren; Yao, Yazhou; Hua, Xian-Sheng; Shen, Heng-Tao

Learning with noisy labels has gained increasing attention because the inevitable imperfect labels in real-world scenarios can substantially hurt the deep model performance. Recent studies tend to regard low-loss samples as clean ones and discard hig

External link: http://arxiv.org/abs/2402.11242
