Showing 1 - 10 of 805 for search: '"Lu, Huchuan"'
Author:
Liu, Tingwei, Zhang, Miao, Liu, Leiye, Zhong, Jialong, Wang, Shuyao, Piao, Yongri, Lu, Huchuan
Recently, Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features…
External link:
http://arxiv.org/abs/2406.14186
Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual features, followed by large language models (LLMs) for visual-language tasks. However, the vision encoders set a strong inductive bias in abstracting visual representations…
External link:
http://arxiv.org/abs/2406.11832
Author:
Zhang, Zaibin, Tang, Shiyu, Zhang, Yuanhang, Fu, Talas, Wang, Yifan, Liu, Yang, Wang, Dong, Shao, Jing, Wang, Lijun, Lu, Huchuan
Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly tran…
External link:
http://arxiv.org/abs/2406.03474
Referring Image Segmentation (RIS) requires language and appearance semantics to better understand each other, a need that becomes especially acute in hard cases. To achieve this, existing works tend to resort to various trans-representing…
External link:
http://arxiv.org/abs/2405.09006
Different from context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, require higher visual understanding ability. Despite the rapid advance of many CD un…
External link:
http://arxiv.org/abs/2405.01002
Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or e…
External link:
http://arxiv.org/abs/2404.18114
Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained on large-scale natural light images…
External link:
http://arxiv.org/abs/2404.15700
Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a…
External link:
http://arxiv.org/abs/2404.15677
Object Re-Identification (Re-ID) aims to identify and retrieve specific objects from images captured at different places and times. Recently, object Re-ID has achieved great success with the advances of Vision Transformers (ViT). However, the effects…
External link:
http://arxiv.org/abs/2404.14985
Author:
Chen, Kai, Li, Yanze, Zhang, Wenhua, Liu, Yanxin, Li, Pengxiang, Gao, Ruiyuan, Hong, Lanqing, Tian, Meng, Zhao, Xinhai, Li, Zhenguo, Yeung, Dit-Yan, Lu, Huchuan, Jia, Xu
Large Vision-Language Models (LVLMs) have received widespread attention for advancing interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted capabilities in natural circumstances, lacking automated and quant…
External link:
http://arxiv.org/abs/2404.10595