Showing 1 - 10 of 22 for search: '"Hong, Wenyi"'
Author:
Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Yu, Wenmeng, Wang, Weihan, Hong, Wenyi, Jiang, Zhihuan, Xu, Bin, Dong, Yuxiao, Tang, Jie
Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathema
External link:
http://arxiv.org/abs/2409.13729
Author:
Hong, Wenyi, Wang, Weihan, Ding, Ming, Yu, Wenmeng, Lv, Qingsong, Wang, Yan, Cheng, Yean, Huang, Shiyu, Ji, Junhui, Xue, Zhao, Zhao, Lei, Yang, Zhuoyi, Gu, Xiaotao, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Wang, Zihan, Qi, Ji, Song, Xixuan, Zhang, Peng, Liu, Debing, Xu, Bin, Li, Juanzi, Dong, Yuxiao, Tang, Jie
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new genera
External link:
http://arxiv.org/abs/2408.16500
Author:
Liu, Xiao, Zhang, Tianjie, Gu, Yu, Iong, Iat Long, Xu, Yifan, Song, Xixuan, Zhang, Shudan, Lai, Hanyu, Liu, Xinyi, Zhao, Hanlin, Sun, Jiadai, Yang, Xinyue, Yang, Yu, Qi, Zehan, Yao, Shuntian, Sun, Xueqiao, Cheng, Siyi, Zheng, Qinkai, Yu, Hao, Zhang, Hanchen, Hong, Wenyi, Ding, Ming, Pan, Lihang, Gu, Xiaotao, Zeng, Aohan, Du, Zhengxiao, Song, Chan Hee, Su, Yu, Dong, Yuxiao, Tang, Jie
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, pote
External link:
http://arxiv.org/abs/2408.06327
Author:
Yang, Zhuoyi, Teng, Jiayan, Zheng, Wendi, Ding, Ming, Huang, Shiyu, Xu, Jiazheng, Yang, Yuanming, Hong, Wenyi, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Gu, Xiaotao, Zhang, Yuxuan, Wang, Weihan, Cheng, Yean, Liu, Ting, Xu, Bin, Dong, Yuxiao, Tang, Jie
We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels. Previous vide
External link:
http://arxiv.org/abs/2408.06072
Author:
Wang, Weihan, He, Zehai, Hong, Wenyi, Cheng, Yean, Zhang, Xiaohan, Qi, Ji, Gu, Xiaotao, Huang, Shiyu, Xu, Bin, Dong, Yuxiao, Ding, Ming, Tang, Jie
Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the
External link:
http://arxiv.org/abs/2406.08035
Author:
Yang, Zhuoyi, Jiang, Heyang, Hong, Wenyi, Teng, Jiayan, Zheng, Wendi, Dong, Yuxiao, Ding, Ming, Tang, Jie
Diffusion models have shown remarkable performance in image generation in recent years. However, due to a quadratic increase in memory during generating ultra-high-resolution images (e.g. 4096*4096), the resolution of generated images is often limite
External link:
http://arxiv.org/abs/2405.04312
Author:
Qi, Ji, Ding, Ming, Wang, Weihan, Bai, Yushi, Lv, Qingsong, Hong, Wenyi, Xu, Bin, Hou, Lei, Li, Juanzi, Dong, Yuxiao, Tang, Jie
Vision-Language Models (VLMs) have demonstrated their broad effectiveness thanks to extensive training in aligning visual instructions to responses. However, such training of conclusive alignment leads models to ignore essential visual reasoning, fur
External link:
http://arxiv.org/abs/2402.04236
Author:
Hong, Wenyi, Wang, Weihan, Lv, Qingsong, Xu, Jiazheng, Yu, Wenmeng, Ji, Junhui, Wang, Yan, Wang, Zihan, Zhang, Yuxuan, Li, Juanzi, Xu, Bin, Dong, Yuxiao, Ding, Ming, Tang, Jie
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens. Large language models (LLMs) such as ChatGPT can assist people in tasks like writing emails, but struggl
External link:
http://arxiv.org/abs/2312.08914
Author:
Wang, Weihan, Lv, Qingsong, Yu, Wenmeng, Hong, Wenyi, Qi, Ji, Wang, Yan, Ji, Junhui, Yang, Zhuoyi, Zhao, Lei, Song, Xixuan, Xu, Jiazheng, Xu, Bin, Li, Juanzi, Dong, Yuxiao, Ding, Ming, Tang, Jie
We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular shallow alignment method which maps image features into the input space of language model, CogVLM bridges the gap between the frozen pretrained l
External link:
http://arxiv.org/abs/2311.03079
Author:
Teng, Jiayan, Zheng, Wendi, Ding, Ming, Hong, Wenyi, Wangni, Jianqiao, Yang, Zhuoyi, Tang, Jie
Diffusion models achieved great success in image synthesis, but still face challenges in high-resolution generation. Through the lens of discrete cosine transformation, we find the main reason is that the same noise level on a higher resolution
External link:
http://arxiv.org/abs/2309.03350