Showing 1 - 10 of 843 for search: '"Shen, Heng Tao"'
Author:
Luo, Run, Zhang, Haonan, Chen, Longze, Lin, Ting-En, Liu, Xiong, Wu, Yuchuan, Yang, Min, Wang, Minzheng, Zeng, Pengpeng, Gao, Lianli, Shen, Heng Tao, Li, Yunshui, Xia, Xiaobo, Huang, Fei, Song, Jingkuan, Li, Yongbin
The development of Multimodal Large Language Models (MLLMs) has seen significant advancements with increasing demands in various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches attempt to enhance MLLMs capabilit
External link:
http://arxiv.org/abs/2409.05840
Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in
External link:
http://arxiv.org/abs/2409.00942
Personalized text-to-image generation has gained significant attention for its capability to generate high-fidelity portraits of specific identities conditioned on user-defined prompts. Existing methods typically involve test-time fine-tuning or inst
External link:
http://arxiv.org/abs/2408.06740
Author:
Bin, Yi, Shi, Wenhao, Ding, Yujuan, Hu, Zhiqiang, Wang, Zheng, Yang, Yang, Ng, See-Kiong, Shen, Heng Tao
Artwork analysis is an important and fundamental skill for art appreciation, which can enrich personal aesthetic sensibility and foster critical thinking. Understanding artworks is challenging due to their subjective nature, diverse inte
External link:
http://arxiv.org/abs/2408.00491
Cross-modal coherence modeling is essential for intelligent systems to help them organize and structure information, thereby understanding and creating content of the physical world coherently, as humans do. Previous work on cross-modal coherence
External link:
http://arxiv.org/abs/2408.00305
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval. Prior literature predominantly focuses on pair-based and proxy-based methods to maximize i
External link:
http://arxiv.org/abs/2407.03106
Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almos
External link:
http://arxiv.org/abs/2405.15356
Open-domain video generation models are constrained by the scale of their training video datasets, and some less common actions still cannot be generated. Some researchers explore video editing methods and achieve action generation by editing the s
External link:
http://arxiv.org/abs/2403.11535
In this study, we address the intricate challenge of multi-task dense prediction, encompassing tasks such as semantic segmentation, depth estimation, and surface normal estimation, particularly when dealing with partially annotated data (MTPSL). The
External link:
http://arxiv.org/abs/2403.10252
Learning with noisy labels has gained increasing attention because the inevitable imperfect labels in real-world scenarios can substantially hurt the deep model performance. Recent studies tend to regard low-loss samples as clean ones and discard hig
External link:
http://arxiv.org/abs/2402.11242