Showing 1 - 10 of 357 for search: '"Gao, LianLi"'
Author:
Su, Sitong, Cai, Xiao, Gao, Lianli, Zeng, Pengpeng, Du, Qinhong, Li, Mengqi, Shen, Heng Tao, Song, Jingkuan
Recent advances in General Text-to-3D (GT23D) have been significant. However, the lack of a benchmark has hindered systematic evaluation and progress due to issues in datasets and metrics: 1) the largest 3D dataset, Objaverse, suffers from omitted…
External link:
http://arxiv.org/abs/2412.09997
Author:
Cui, Chenhang, Deng, Gelei, Zhang, An, Zheng, Jingnan, Li, Yicong, Gao, Lianli, Zhang, Tianwei, Chua, Tat-Seng
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong reasoning abilities across multiple modalities, achieving significant breakthroughs in various real-world applications. Despite this great success, the safety guardrail of…
External link:
http://arxiv.org/abs/2411.11496
Author:
Cai, Xiao, Zeng, Pengpeng, Gao, Lianli, Zhu, Junchen, Zhang, Jiaxin, Su, Sitong, Shen, Heng Tao, Song, Jingkuan
Recent advancements in generic 3D content generation from text prompts have been remarkable, achieved by fine-tuning text-to-image (T2I) diffusion models or employing these T2I models as priors to learn a general text-to-3D model. While fine-tuning-based…
External link:
http://arxiv.org/abs/2410.07658
Mitigating the detrimental effects of noisy labels on the training process has become increasingly critical, as obtaining entirely clean or human-annotated samples for large-scale pre-training tasks is often impractical. Nonetheless, existing noise…
External link:
http://arxiv.org/abs/2410.01944
Author:
Luo, Run, Zhang, Haonan, Chen, Longze, Lin, Ting-En, Liu, Xiong, Wu, Yuchuan, Yang, Min, Wang, Minzheng, Zeng, Pengpeng, Gao, Lianli, Shen, Heng Tao, Li, Yunshui, Xia, Xiaobo, Huang, Fei, Song, Jingkuan, Li, Yongbin
The development of Multimodal Large Language Models (MLLMs) has seen significant advancements with increasing demands in various fields (e.g., multimodal agents, embodied intelligence). While model-driven approaches attempt to enhance MLLMs'…
External link:
http://arxiv.org/abs/2409.05840
Targeted adversarial attacks, which aim to mislead a model into recognizing any image as a target object via imperceptible perturbations, have become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted…
External link:
http://arxiv.org/abs/2407.12292
Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost…
External link:
http://arxiv.org/abs/2405.15356
Author:
Zhang, Haonan, Zeng, Pengpeng, Gao, Lianli, Song, Jingkuan, Duan, Yihang, Lyu, Xinyu, Shen, Hengtao
Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state of the art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and…
External link:
http://arxiv.org/abs/2405.12710
Author:
Zhu, Xiaosu, Sheng, Hualian, Cai, Sijia, Deng, Bing, Yang, Shaopeng, Liang, Qiao, Chen, Ken, Gao, Lianli, Song, Jingkuan, Ye, Jieping
We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include…
External link:
http://arxiv.org/abs/2405.09883
Open-domain video generation models are constrained by the scale of their training video datasets, and some less common actions still cannot be generated. Some researchers explore video editing methods and achieve action generation by editing…
External link:
http://arxiv.org/abs/2403.11535