Showing 1 - 10 of 136 for search: '"Zhang, Renrui"'
Author:
Liu, Jiaming, Liu, Mengzhen, Wang, Zhenyu, Lee, Lily, Zhou, Kaichen, An, Pengju, Yang, Senqiao, Zhang, Renrui, Guo, Yandong, Zhang, Shanghang
A fundamental objective in robot manipulation is to enable models to comprehend visual scenes and execute actions. Although existing robot Multimodal Large Language Models (MLLMs) can handle a range of basic tasks, they still face challenges in two a…
External link:
http://arxiv.org/abs/2406.04339
Author:
Fu, Chaoyou, Dai, Yuhan, Luo, Yongdong, Li, Lei, Ren, Shuhuai, Zhang, Renrui, Wang, Zihan, Zhou, Chenyu, Shen, Yunhang, Zhang, Mengdan, Chen, Peixian, Li, Yanwei, Lin, Shaohui, Zhao, Sirui, Li, Ke, Xu, Tong, Zheng, Xiawu, Chen, Enhong, Ji, Rongrong, Sun, Xing
In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. T…
External link:
http://arxiv.org/abs/2405.21075
Author:
Wang, Jiaze, Wang, Yi, Guo, Ziyu, Zhang, Renrui, Zhou, Donghao, Chen, Guangyong, Liu, Anfeng, Heng, Pheng-Ann
Data augmentation has proven to be a vital tool for enhancing the generalization capabilities of deep learning models, especially in the context of 3D vision where traditional datasets are often limited. Despite previous advancements, existing method…
External link:
http://arxiv.org/abs/2405.18523
Author:
Liu, Jiaming, Li, Chenxuan, Wang, Guanqun, Lee, Lily, Zhou, Kaichen, Chen, Sixiang, Xiong, Chuyan, Ge, Jiaxin, Zhang, Renrui, Zhang, Shanghang
Robot manipulation policies have shown unsatisfactory action performance when confronted with novel task or object instances. Hence, the capability to automatically detect and self-correct failure action is essential for a practical robotic system. R…
External link:
http://arxiv.org/abs/2405.17418
Large Language Models (LLMs) have become pivotal in advancing the field of artificial intelligence, yet their immense sizes pose significant challenges for both fine-tuning and deployment. Current post-training pruning methods, while reducing the siz…
External link:
http://arxiv.org/abs/2405.16057
Author:
Lu, Xudong, Zhou, Aojun, Lin, Ziyi, Liu, Qi, Xu, Yuhui, Zhang, Renrui, Wen, Yafei, Ren, Shuai, Gao, Peng, Yan, Junchi, Li, Hongsheng
Recent developments in large-scale pre-trained text-to-image diffusion models have significantly improved the generation of high-fidelity images, particularly with the emergence of diffusion models based on transformer architecture (DiTs). Among thes…
External link:
http://arxiv.org/abs/2405.14854
Author:
Gao, Peng, Zhuo, Le, Liu, Dongyang, Du, Ruoyi, Luo, Xu, Qiu, Longtian, Zhang, Yuhang, Lin, Chen, Huang, Rongjie, Geng, Shijie, Zhang, Renrui, Xi, Junlin, Shao, Wenqi, Jiang, Zhengkai, Yang, Tianshuo, Ye, Weicai, Tong, He, He, Jingwen, Qiao, Yu, Li, Hongsheng
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks sufficient implementation details. In this technical report, we int…
External link:
http://arxiv.org/abs/2405.05945
Author:
Ying, Kaining, Meng, Fanqing, Wang, Jin, Li, Zhiqian, Lin, Han, Yang, Yue, Zhang, Hao, Zhang, Wenbo, Lin, Yuqi, Liu, Shuo, Lei, Jiayi, Lu, Quanfeng, Chen, Runjian, Xu, Peng, Zhang, Renrui, Zhang, Haozhe, Gao, Peng, Wang, Yali, Qiao, Yu, Luo, Ping, Zhang, Kaipeng, Shao, Wenqi
Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of multimodal tasks te…
External link:
http://arxiv.org/abs/2404.16006
Author:
Zhu, Xiangyang, Zhang, Renrui, He, Bowei, Guo, Ziyu, Liu, Jiaming, Xiao, Han, Fu, Chaoyou, Dong, Hao, Gao, Peng
To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'uns…
External link:
http://arxiv.org/abs/2404.04050
Author:
Jiang, Dongzhi, Song, Guanglu, Wu, Xiaoshi, Zhang, Renrui, Shen, Dazhong, Zong, Zhuofan, Liu, Yu, Li, Hongsheng
Diffusion models have demonstrated great success in the field of text-to-image generation. However, alleviating the misalignment between the text prompts and images is still challenging. The root reason behind the misalignment has not been extensivel…
External link:
http://arxiv.org/abs/2404.03653