Showing 1 - 10 of 4,964 for search: '"xu, Hang"'
Author:
Wang, Chunwei; Lu, Guansong; Yang, Junwei; Huang, Runhui; Han, Jianhua; Hou, Lu; Zhang, Wei; Xu, Hang
In this paper, we introduce ILLUME, a unified multimodal large language model (MLLM) that seamlessly integrates multimodal understanding and generation capabilities within a single large language model through a unified next-token prediction formulation… (a sketch of this formulation follows below)
External link:
http://arxiv.org/abs/2412.06673
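To make the abstract's "unified next-token prediction formulation" concrete, here is a minimal, hedged sketch: text tokens and discrete image codes share one vocabulary, and a single autoregressive loop predicts the next token regardless of modality. The vocabulary sizes, the VQ-style image tokenizer, and the stand-in model are illustrative assumptions, not ILLUME's actual components.

```python
# Hedged sketch of unified next-token prediction: text tokens and discrete
# image tokens share one vocabulary, and one autoregressive model predicts
# the next token regardless of modality. Sizes are assumptions.
import numpy as np

TEXT_VOCAB = 32000                 # assumed text vocabulary size
IMAGE_CODES = 8192                 # assumed VQ codebook size for image patches
VOCAB = TEXT_VOCAB + IMAGE_CODES   # one shared vocabulary

def to_unified_ids(text_ids, image_codes):
    """Interleave modalities into one sequence; image codes are offset so
    they occupy their own id range inside the shared vocabulary."""
    return list(text_ids) + [TEXT_VOCAB + c for c in image_codes]

# Toy "model": random next-token logits, standing in for a transformer.
rng = np.random.default_rng(0)
def next_token_logits(prefix):
    return rng.normal(size=VOCAB)

# The autoregressive loop treats text and image positions identically.
seq = to_unified_ids(text_ids=[11, 42, 7], image_codes=[3, 4095])
for _ in range(4):
    seq.append(int(np.argmax(next_token_logits(seq))))
print(seq)  # mixed text/image token ids in one stream
```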
Author:
Peng, Wujian; Meng, Lingchen; Chen, Yitong; Xie, Yiweng; Liu, Yang; Gui, Tao; Xu, Hang; Qiu, Xipeng; Wu, Zuxuan; Jiang, Yu-Gang
Large Multimodal Models (LMMs) have made significant breakthroughs with the advancement of instruction tuning. However, while existing models can understand images and videos at a holistic level, they still struggle with instance-level understanding…
External link:
http://arxiv.org/abs/2412.03565
Author:
Yu, Junqiu; Ren, Xinlin; Gu, Yongchong; Lin, Haitao; Wang, Tianyu; Zhu, Yi; Xu, Hang; Jiang, Yu-Gang; Xue, Xiangyang; Fu, Yanwei
Language-guided robotic grasping is a rapidly advancing field where robots are instructed using human language to grasp specific objects. However, existing methods often depend on dense camera views and struggle to quickly update scenes, limiting the…
External link:
http://arxiv.org/abs/2412.02140
Author:
Sun, Jianhan; Lv, Jianfeng; Tian, Shang; Liu, Juntao; Zhang, Zihao; Xu, Hang; Lin, Lin; Huang, Senlin
The DC-SRF-II gun, a high-brightness continuous-wave photocathode gun, has great potential in electron beam irradiation applications. This paper presents an in-vacuum and in-air irradiation dosimetry study of the high-repetition-rate electron beam from the DC-SRF-II gun… (a worked dose-rate estimate follows below)
External link:
http://arxiv.org/abs/2411.16247
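The snippet does not include the paper's measured doses, so the following is only a hedged back-of-envelope illustration of electron-beam dosimetry using the standard thin-target relation (dose rate = fluence rate × mass stopping power). The beam current, spot area, and stopping power are assumed values, not results from the DC-SRF-II study.

```python
# Hedged dose-rate estimate for a continuous electron beam via the standard
# thin-target relation. All input values below are illustrative assumptions.
E_CHARGE = 1.602e-19         # C per electron
MEV_PER_G_TO_GY = 1.602e-10  # 1 MeV/g = 1.602e-13 J / 1e-3 kg

current_a = 1e-6   # assumed average beam current: 1 uA
spot_cm2 = 1.0     # assumed beam spot area: 1 cm^2
stopping = 1.9     # assumed mass stopping power in water, MeV cm^2/g

fluence_rate = current_a / E_CHARGE / spot_cm2          # electrons / cm^2 / s
dose_rate = fluence_rate * stopping * MEV_PER_G_TO_GY   # Gy/s
print(f"~{dose_rate:.0f} Gy/s")  # kGy-per-second scale, typical of e-beams
```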
Author:
Xiang, Kun; Liu, Zhili; Jiang, Zihao; Nie, Yunshuang; Huang, Runhui; Fan, Haoxiang; Li, Hanhui; Huang, Weiran; Zeng, Yihan; Han, Jianhua; Hong, Lanqing; Xu, Hang; Liang, Xiaodan
In this paper, we address the challenging task of multimodal mathematical reasoning by incorporating the ability of "slow thinking" into multimodal large language models (MLLMs). Contrary to existing methods that rely on direct or fast thinking, our… (a sketch of step-wise slow-thinking decoding follows below)
External link:
http://arxiv.org/abs/2411.11930
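As a hedged illustration of what "slow thinking" decoding can look like, the sketch below samples several candidate reasoning steps, scores each with a verifier, and keeps the best one before answering. The step generator and scorer are stubs; the paper's actual training and search procedure may differ.

```python
# Hedged sketch of "slow thinking" decoding: generate reasoning step by step,
# sampling candidate steps and keeping the highest-scoring one. The generator
# and step scorer are stubs standing in for an MLLM and a learned verifier.
import random

def propose_step(context, k=4):
    # stand-in for sampling k candidate reasoning steps from an MLLM
    return [f"step({len(context)}, cand {i})" for i in range(k)]

def score_step(context, step):
    # stand-in for a learned step-level verifier / reward model
    return random.random()

def slow_think(question, max_steps=3):
    context = [question]
    for _ in range(max_steps):
        candidates = propose_step(context)
        best = max(candidates, key=lambda s: score_step(context, s))
        context.append(best)
    return context + ["final answer derived from the steps above"]

random.seed(0)
print("\n".join(slow_think("Q: area of the triangle in the image?")))
```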
Recent advances in learning video generation models from large-scale video data demonstrate significant potential for understanding complex physical dynamics. This suggests the feasibility of leveraging diverse robot trajectory data to develop…
External link:
http://arxiv.org/abs/2411.09153
Author:
Zhang, Kaidong; Ren, Pengzhen; Lin, Bingqian; Lin, Junfan; Ma, Shikui; Xu, Hang; Liang, Xiaodan
Language-guided robotic manipulation is a challenging task that requires an embodied agent to follow abstract user instructions to accomplish various complex manipulation tasks. Previous work trivially fits the data without revealing the relation…
External link:
http://arxiv.org/abs/2410.10394
This study utilizes deep reinforcement learning (DRL) to develop flow control strategies for circular and square cylinders, enhancing energy efficiency and minimizing energy consumption while addressing the limitations of traditional methods. We find… (an interaction-loop sketch follows below)
External link:
http://arxiv.org/abs/2410.00424
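A hedged sketch of the DRL flow-control loop the abstract describes: an agent reads flow sensors (state), sets an actuation (action), and is rewarded for reducing a drag proxy while penalizing actuation energy. The toy dynamics and the proportional policy below stand in for the CFD solver and a trained agent, neither of which is specified in this snippet.

```python
# Hedged sketch of a DRL flow-control interaction loop with an
# energy-aware reward. The environment is a toy stand-in for a CFD step.
import numpy as np

rng = np.random.default_rng(0)

def env_step(state, action):
    # stand-in for one CFD step around the cylinder: actuation damps the
    # "wake" state; reward penalizes both a drag proxy and actuation energy
    next_state = 0.9 * state - 0.1 * action + 0.05 * rng.normal(size=state.shape)
    drag_proxy = float(np.sum(next_state ** 2))
    reward = -drag_proxy - 0.01 * action ** 2   # energy-aware reward
    return next_state, reward

def policy(state):
    # stand-in for a trained actor network (e.g., PPO); here a simple
    # bounded proportional law over the sensor readings
    return float(np.clip(np.mean(state), -1.0, 1.0))

state, ret = rng.normal(size=8), 0.0
for _ in range(100):
    action = policy(state)
    state, reward = env_step(state, action)
    ret += reward
print(f"episode return: {ret:.2f}")
```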
By leveraging the high-dimensional nonlinear mapping capabilities of artificial neural networks in conjunction with the powerful control mechanisms of reinforcement learning, we attain real-time, precise modulation of synthetic jet flow rates over elliptical cylinders… (a policy-network sketch follows below)
External link:
http://arxiv.org/abs/2410.00421
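The abstract describes a neural network mapping flow measurements to jet commands; below is a hedged sketch of such an actor's forward pass: surface-pressure readings in, a bounded continuous flow rate out. The layer sizes, tanh squashing, and actuator limit are assumptions, not the paper's configuration.

```python
# Hedged sketch of a continuous-action actor: a small network maps
# surface-pressure sensors to a bounded synthetic-jet flow rate.
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(16, 8)) * 0.1, np.zeros(16)   # assumed hidden layer
W2, b2 = rng.normal(size=(1, 16)) * 0.1, np.zeros(1)    # assumed output layer
MAX_FLOW = 0.05  # assumed actuator limit (normalized mass flow rate)

def jet_flow_rate(pressures):
    """Policy forward pass: sensor vector -> bounded continuous actuation."""
    h = np.tanh(W1 @ pressures + b1)          # nonlinear feature mapping
    return MAX_FLOW * np.tanh(W2 @ h + b2)    # squash into actuator range

print(jet_flow_rate(rng.normal(size=8)))  # one control command per timestep
```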
Author:
Chen, Kai; Gou, Yunhao; Huang, Runhui; Liu, Zhili; Tan, Daxin; Xu, Jing; Wang, Chunwei; Zhu, Yi; Zeng, Yihan; Yang, Kuo; Wang, Dingdong; Xiang, Kun; Li, Haoyuan; Bai, Haoli; Han, Jianhua; Li, Xiaohui; Jin, Weike; Xie, Nian; Zhang, Yu; Kwok, James T.; Zhao, Hengshuang; Liang, Xiaodan; Yeung, Dit-Yan; Chen, Xiao; Li, Zhenguo; Zhang, Wei; Liu, Qun; Yao, Jun; Hong, Lanqing; Hou, Lu; Xu, Hang
GPT-4o, an omni-modal model that enables vocal conversations with diverse emotions and tones, marks a milestone for omni-modal foundation models. However, empowering Large Language Models to perceive and generate images, texts, and speeches end-to-end… (a token-stream sketch follows below)
External link:
http://arxiv.org/abs/2409.18042
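To illustrate what end-to-end omni-modal token handling can look like, here is a hedged sketch in which images and speech are discretized by assumed tokenizers and spliced into the text stream between special boundary tokens, so one model can both perceive and generate all three modalities. The token names and ranges are illustrative, not this paper's.

```python
# Hedged sketch of an omni-modal token stream: image and speech codes are
# spliced into the text sequence between assumed boundary tokens, so a
# single autoregressive model handles all three modalities end-to-end.
BOS_IMG, EOS_IMG = "<img>", "</img>"
BOS_SPEECH, EOS_SPEECH = "<speech>", "</speech>"

def build_stream(text_tokens, image_codes=None, speech_codes=None):
    stream = list(text_tokens)
    if image_codes is not None:
        stream += [BOS_IMG, *[f"i{c}" for c in image_codes], EOS_IMG]
    if speech_codes is not None:
        stream += [BOS_SPEECH, *[f"s{c}" for c in speech_codes], EOS_SPEECH]
    return stream

# A vocal reply could be emitted as speech tokens the same way text is,
# then converted back to audio by a downstream vocoder.
print(build_stream(["Describe", "this:"], image_codes=[17, 902],
                   speech_codes=[5, 5, 31]))
```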