Showing 1 - 10 of 11,203
for search: '"TANG Jie"'
Intelligent reflecting surface (IRS) has become a cost-effective solution for constructing a smart and adaptive radio environment. Most previous works on IRS have jointly designed the active and passive precoding based on perfectly or partially known…
External link:
http://arxiv.org/abs/2409.14088
Author:
Jiang, Zhihuan, Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Wang, Weihan, Xu, Bin, Dong, Yuxiao, Tang, Jie
Multi-modal large language models (MLLMs) have demonstrated promising capabilities across various tasks by integrating textual and visual information to achieve visual understanding in complex scenarios. Despite the availability of several benchmarks…
External link:
http://arxiv.org/abs/2409.13730
Author:
Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Yu, Wenmeng, Wang, Weihan, Hong, Wenyi, Jiang, Zhihuan, Xu, Bin, Dong, Yuxiao, Tang, Jie
Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathema…
External link:
http://arxiv.org/abs/2409.13729
Author:
Hong, Wenyi, Wang, Weihan, Ding, Ming, Yu, Wenmeng, Lv, Qingsong, Wang, Yan, Cheng, Yean, Huang, Shiyu, Ji, Junhui, Xue, Zhao, Zhao, Lei, Yang, Zhuoyi, Gu, Xiaotao, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Wang, Zihan, Qi, Ji, Song, Xixuan, Zhang, Peng, Liu, Debing, Xu, Bin, Li, Juanzi, Dong, Yuxiao, Tang, Jie
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new genera…
External link:
http://arxiv.org/abs/2408.16500
Large Language Models (LLMs) are becoming increasingly powerful and capable of handling complex tasks, e.g., building single agents and multi-agent systems. Compared to single agents, multi-agent systems have higher requirements for the collaboration…
External link:
http://arxiv.org/abs/2408.15971
Author:
Gui, Jiayi, Liu, Yiming, Cheng, Jiale, Gu, Xiaotao, Liu, Xiao, Wang, Hongning, Dong, Yuxiao, Tang, Jie, Huang, Minlie
Large Language Models (LLMs) have demonstrated notable capabilities across various tasks, showcasing complex problem-solving abilities. Understanding and executing complex rules, along with multi-step planning, are fundamental to logical reasoning an…
External link:
http://arxiv.org/abs/2408.15778
Author:
Bai, Yushi, Zhang, Jiajie, Lv, Xin, Zheng, Linzhi, Zhu, Siqi, Hou, Lei, Dong, Yuxiao, Tang, Jie, Li, Juanzi
Current long context large language models (LLMs) can process inputs up to 100,000 tokens, yet struggle to generate outputs exceeding even a modest length of 2,000 words. Through controlled experiments, we find that the model's effective generation l…
External link:
http://arxiv.org/abs/2408.07055
Author:
Liu, Xiao, Zhang, Tianjie, Gu, Yu, Iong, Iat Long, Xu, Yifan, Song, Xixuan, Zhang, Shudan, Lai, Hanyu, Liu, Xinyi, Zhao, Hanlin, Sun, Jiadai, Yang, Xinyue, Yang, Yu, Qi, Zehan, Yao, Shuntian, Sun, Xueqiao, Cheng, Siyi, Zheng, Qinkai, Yu, Hao, Zhang, Hanchen, Hong, Wenyi, Ding, Ming, Pan, Lihang, Gu, Xiaotao, Zeng, Aohan, Du, Zhengxiao, Song, Chan Hee, Su, Yu, Dong, Yuxiao, Tang, Jie
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, pote…
External link:
http://arxiv.org/abs/2408.06327
Author:
Yang, Zhuoyi, Teng, Jiayan, Zheng, Wendi, Ding, Ming, Huang, Shiyu, Xu, Jiazheng, Yang, Yuanming, Hong, Wenyi, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Gu, Xiaotao, Zhang, Yuxuan, Wang, Weihan, Cheng, Yean, Liu, Ting, Xu, Bin, Dong, Yuxiao, Tang, Jie
We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 × 1360 pixels. Previous vide…
External link:
http://arxiv.org/abs/2408.06072
While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in the multi-speaker separation scenarios. Typically, AVSS m…
External link:
http://arxiv.org/abs/2407.19224