Showing 1 - 10 of 978 for the search: '"Huang, Haifeng"'
Author:
Huang, Xiaoshuang, Huang, Haifeng, Shen, Lingdong, Yang, Yehui, Shang, Fangxin, Liu, Junwei, Liu, Jia
With the rapid development of multimodal large language models (MLLMs), especially their capabilities in visual chat through refer and ground functionalities, their significance is increasingly recognized. However, the biomedical field currently exhi…
External link:
http://arxiv.org/abs/2406.18146
Author:
Lyu, Ruiyuan, Wang, Tai, Lin, Jingli, Yang, Shuai, Mao, Xiaohan, Chen, Yilun, Xu, Runsen, Huang, Haifeng, Zhu, Chenming, Lin, Dahua, Pang, Jiangmiao
With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous wor…
External link:
http://arxiv.org/abs/2406.09401
Author:
Chen, Yilun, Yang, Shuai, Huang, Haifeng, Wang, Tai, Lyu, Ruiyuan, Xu, Runsen, Lin, Dahua, Pang, Jiangmiao
Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D L…
External link:
http://arxiv.org/abs/2405.10370
Author:
Wang, Zehan, Zhang, Ziang, Cheng, Xize, Huang, Rongjie, Liu, Luping, Ye, Zhenhui, Huang, Haifeng, Zhao, Yang, Jin, Tao, Gao, Peng, Zhao, Zhou
Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces.
External link:
http://arxiv.org/abs/2405.04883
Author:
Shen, Lingdong, Shang, Fangxin, Huang, Xiaoshuang, Yang, Yehui, Huang, Haifeng, Xiang, Shiming
In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models are one solution; they aim to generalize across the diverse moda…
External link:
http://arxiv.org/abs/2403.16578
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query. Since datasets in this domain are often gathered from limited video scenes, models tend to overfit to s…
External link:
http://arxiv.org/abs/2312.13633
Author:
Huang, Haifeng, Wang, Zehan, Huang, Rongjie, Liu, Luping, Cheng, Xize, Zhao, Yang, Jin, Tao, Zhao, Zhou
Recent research has evidenced the significant potential of Large Language Models (LLMs) in handling challenging tasks within 3D scenes. However, current models are constrained to addressing object-centric tasks, where each question-answer pair focus…
External link:
http://arxiv.org/abs/2312.08168
Large-scale public datasets with high-quality annotations are rarely available for intelligent medical imaging research, due to data privacy concerns and the cost of annotations. In this paper, we release SynFundus-1M, a high-quality synthetic datase…
External link:
http://arxiv.org/abs/2312.00377
Multi-modal contrastive representation (MCR) of more than three modalities is critical in multi-modal learning. Although recent methods showcase impressive achievements, the high dependence on large-scale, high-quality paired data and the expensive t…
External link:
http://arxiv.org/abs/2310.08884
3D scene understanding has gained significant attention due to its wide range of applications. However, existing methods for 3D scene understanding are limited to specific downstream tasks, which hinders their practicality in real-world applications.
External link:
http://arxiv.org/abs/2308.08769