Search results

Report

MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization

Authors: Zhu, Kangyu; Xia, Peng; Li, Yun; Zhu, Hongtu; Wang, Sheng; Yao, Huaxiu

The advancement of Large Vision-Language Models (LVLMs) has propelled their application in the medical field. However, Medical LVLMs (Med-LVLMs) encounter factuality challenges due to modality misalignment, where the models prioritize textual knowled

External link: http://arxiv.org/abs/2412.06141


Report

MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

Authors: Xia, Peng; Zhu, Kangyu; Li, Haoran; Wang, Tianze; Shi, Weijia; Wang, Sheng; Zhang, Linjun; Zou, James; Yao, Huaxiu

Artificial Intelligence (AI) has demonstrated significant potential in healthcare, particularly in disease diagnosis and treatment planning. Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for inter

External link: http://arxiv.org/abs/2410.13085


Report

MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models

Authors: Xia, Peng; Han, Siwei; Qiu, Shi; Zhou, Yiyang; Wang, Zhaoyang; Zheng, Wenhao; Chen, Zhaorun; Cui, Chenhang; Ding, Mingyu; Li, Linjie; Wang, Lijuan; Yao, Huaxiu

Interleaved multimodal comprehension and generation, enabling models to produce and interpret both images and text in arbitrary sequences, have become a pivotal area in multimodal learning. Despite significant advancements, the evaluation of this cap

External link: http://arxiv.org/abs/2410.10139


Report

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models

Authors: Xia, Peng; Zhu, Kangyu; Li, Haoran; Zhu, Hongtu; Li, Yun; Li, Gang; Zhang, Linjun; Yao, Huaxiu

The recent emergence of Medical Large Vision Language Models (Med-LVLMs) has enhanced medical diagnosis. However, current Med-LVLMs frequently encounter factual issues, often generating responses that do not align with established medical facts. Retr

External link: http://arxiv.org/abs/2407.05131


Report

TP-DRSeg: Improving Diabetic Retinopathy Lesion Segmentation with Explicit Text-Prompts Assisted SAM

Authors: Li, Wenxue; Xiong, Xinyu; Xia, Peng; Ju, Lie; Ge, Zongyuan

Recent advances in large foundation models, such as the Segment Anything Model (SAM), have demonstrated considerable promise across various tasks. Despite their progress, these models still encounter challenges in specialized medical image analysis,

External link: http://arxiv.org/abs/2406.15764


Report

OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding

Authors: Hu, Ming; Xia, Peng; Wang, Lin; Yan, Siyuan; Tang, Feilong; Xu, Zhongxing; Luo, Yimin; Song, Kaimin; Leitner, Jurgen; Cheng, Xuelian; Cheng, Jun; Liu, Chi; Zhou, Kaijing; Ge, Zongyuan

Surgical scene perception via videos is critical for advancing robotic surgery, telesurgery, and AI-assisted surgery, particularly in ophthalmology. However, the scarcity of diverse and richly annotated video datasets has hindered the development of

External link: http://arxiv.org/abs/2406.07471


Report

Generalizing to Unseen Domains in Diabetic Retinopathy with Disentangled Representations

Authors: Xia, Peng; Hu, Ming; Tang, Feilong; Li, Wenxue; Zheng, Wenhao; Ju, Lie; Duan, Peibo; Yao, Huaxiu; Ge, Zongyuan

Diabetic Retinopathy (DR), induced by diabetes, poses a significant risk of visual impairment. Accurate and effective grading of DR aids in the treatment of this condition. Yet existing models experience notable performance degradation on unseen doma

External link: http://arxiv.org/abs/2406.06384


Report

CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models

Artificial intelligence has significantly impacted medical applications, particularly with the advent of Medical Large Vision Language Models (Med-LVLMs), sparking optimism for the future of automated and personalized healthcare. However, the trustwo

External link: http://arxiv.org/abs/2406.06007


Report

Diffusion Model Driven Test-Time Image Adaptation for Robust Skin Lesion Classification

Authors: Hu, Ming; Yan, Siyuan; Xia, Peng; Tang, Feilong; Li, Wenxue; Duan, Peibo; Zhang, Lin; Ge, Zongyuan

Deep learning-based diagnostic systems have demonstrated potential in skin disease diagnosis. However, their performance can easily degrade on test domains due to distribution shifts caused by input-level corruptions, such as imaging equipment variab

External link: http://arxiv.org/abs/2405.11289


Report

HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding

Authors: Xia, Peng; Yu, Xingtong; Hu, Ming; Ju, Lie; Wang, Zhiyong; Duan, Peibo; Ge, Zongyuan

Object categories are typically organized into a multi-granularity taxonomic hierarchy. When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex sc

External link: http://arxiv.org/abs/2311.14064

