Showing 1 - 10 of 805 for search: '"Lu, Huchuan"'
Author:
Liu, Tingwei, Zhang, Miao, Liu, Leiye, Zhong, Jialong, Wang, Shuyao, Piao, Yongri, Lu, Huchuan
Recently, Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features…
External link:
http://arxiv.org/abs/2406.14186
Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual features, followed by large language models (LLMs) for visual-language tasks. However, the vision encoders set a strong inductive bias in abstracting visual representations…
External link:
http://arxiv.org/abs/2406.11832
Author:
Zhang, Zaibin, Tang, Shiyu, Zhang, Yuanhang, Fu, Talas, Wang, Yifan, Liu, Yang, Wang, Dong, Shao, Jing, Wang, Lijun, Lu, Huchuan
Due to the impressive capabilities of multimodal large language models (MLLMs), recent works have focused on employing MLLM-based agents for autonomous driving in large-scale and dynamic environments. However, prevalent approaches often directly tran…
External link:
http://arxiv.org/abs/2406.03474
Referring Image Segmentation (RIS) requires language and appearance semantics to better understand each other, a need that becomes especially acute in hard cases. To achieve this, existing works tend to resort to various trans-representing…
External link:
http://arxiv.org/abs/2405.09006
Different from context-independent (CI) concepts such as human, car, and airplane, context-dependent (CD) concepts, such as camouflaged objects and medical lesions, require higher visual understanding ability. Despite the rapid advance of many CD un…
External link:
http://arxiv.org/abs/2405.01002
Image-text matching remains a challenging task due to heterogeneous semantic diversity across modalities and insufficient distance separability within triplets. Different from previous approaches focusing on enhancing multi-modal representations or e…
External link:
http://arxiv.org/abs/2404.18114
Recently, the Segment Anything Model (SAM) has shown exceptional performance in generating high-quality object masks and achieving zero-shot image segmentation. However, as a versatile vision model, SAM is primarily trained on large-scale natural light images…
External link:
http://arxiv.org/abs/2404.15700
Recent advances in text-to-image models have opened new frontiers in human-centric generation. However, these models cannot be directly employed to generate images with consistent newly coined identities. In this work, we propose CharacterFactory, a…
External link:
http://arxiv.org/abs/2404.15677
Object Re-Identification (Re-ID) aims to identify and retrieve specific objects from images captured at different places and times. Recently, object Re-ID has achieved great success with the advances of Vision Transformers (ViT). However, the effects…
External link:
http://arxiv.org/abs/2404.14985
Author:
Chen, Kai, Li, Yanze, Zhang, Wenhua, Liu, Yanxin, Li, Pengxiang, Gao, Ruiyuan, Hong, Lanqing, Tian, Meng, Zhao, Xinhai, Li, Zhenguo, Yeung, Dit-Yan, Lu, Huchuan, Jia, Xu
Large Vision-Language Models (LVLMs) have received widespread attention for advancing interpretable self-driving. Existing evaluations of LVLMs primarily focus on multi-faceted capabilities in natural circumstances, lacking automated and quant…
External link:
http://arxiv.org/abs/2404.10595