Showing 1 - 10 of 978 for the search: '"Huang, Haifeng"'
Author:
Huang, Xiaoshuang, Huang, Haifeng, Shen, Lingdong, Yang, Yehui, Shang, Fangxin, Liu, Junwei, Liu, Jia
With the rapid development of multimodal large language models (MLLMs), especially their capabilities in visual chat through refer and ground functionalities, their significance is increasingly recognized. However, the biomedical field currently exhi…
External link:
http://arxiv.org/abs/2406.18146
Author:
Lyu, Ruiyuan, Wang, Tai, Lin, Jingli, Yang, Shuai, Mao, Xiaohan, Chen, Yilun, Xu, Runsen, Huang, Haifeng, Zhu, Chenming, Lin, Dahua, Pang, Jiangmiao
With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress. However, limited by existing datasets, previous wor…
External link:
http://arxiv.org/abs/2406.09401
Author:
Chen, Yilun, Yang, Shuai, Huang, Haifeng, Wang, Tai, Lyu, Ruiyuan, Xu, Runsen, Lin, Dahua, Pang, Jiangmiao
Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D L…
External link:
http://arxiv.org/abs/2405.10370
Author:
Wang, Zehan, Zhang, Ziang, Cheng, Xize, Huang, Rongjie, Liu, Luping, Ye, Zhenhui, Huang, Haifeng, Zhao, Yang, Jin, Tao, Gao, Peng, Zhao, Zhou
Unified multi-model representation spaces are the foundation of multimodal understanding and generation. However, the billions of model parameters and catastrophic forgetting problems make it challenging to further enhance pre-trained unified spaces.
External link:
http://arxiv.org/abs/2405.04883
Author:
Shen, Lingdong, Shang, Fangxin, Huang, Xiaoshuang, Yang, Yehui, Huang, Haifeng, Xiang, Shiming
In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models are one solution; they aim to generalize across the diverse moda…
External link:
http://arxiv.org/abs/2403.16578
Temporal Video Grounding (TVG) aims to localize the temporal boundary of a specific segment in an untrimmed video based on a given language query. Since datasets in this domain are often gathered from limited video scenes, models tend to overfit to s…
External link:
http://arxiv.org/abs/2312.13633
Author:
Huang, Haifeng, Wang, Zehan, Huang, Rongjie, Liu, Luping, Cheng, Xize, Zhao, Yang, Jin, Tao, Zhao, Zhou
Recent research has evidenced the significant potential of Large Language Models (LLMs) in handling challenging tasks within 3D scenes. However, current models are constrained to addressing object-centric tasks, where each question-answer pair focus…
External link:
http://arxiv.org/abs/2312.08168
Large-scale public datasets with high-quality annotations are rarely available for intelligent medical imaging research, due to data privacy concerns and the cost of annotations. In this paper, we release SynFundus-1M, a high-quality synthetic datase…
External link:
http://arxiv.org/abs/2312.00377
Multi-modal contrastive representation (MCR) of more than three modalities is critical in multi-modal learning. Although recent methods showcase impressive achievements, the high dependence on large-scale, high-quality paired data and the expensive t…
External link:
http://arxiv.org/abs/2310.08884
3D scene understanding has gained significant attention due to its wide range of applications. However, existing methods for 3D scene understanding are limited to specific downstream tasks, which hinders their practicality in real-world applications.
External link:
http://arxiv.org/abs/2308.08769