Showing 1 - 10
of 508
for search: '"WANG Weihan"'
Multi-modal large language models (MLLMs) have demonstrated promising capabilities across various tasks by integrating textual and visual information to achieve visual understanding in complex scenarios. Despite the availability of several benchmarks…
External link:
http://arxiv.org/abs/2409.13730
Authors:
Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Yu, Wenmeng, Wang, Weihan, Hong, Wenyi, Jiang, Zhihuan, Xu, Bin, Tang, Jie
Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathematics…
External link:
http://arxiv.org/abs/2409.13729
Authors:
Hong, Wenyi, Wang, Weihan, Ding, Ming, Yu, Wenmeng, Lv, Qingsong, Wang, Yan, Cheng, Yean, Huang, Shiyu, Ji, Junhui, Xue, Zhao, Zhao, Lei, Yang, Zhuoyi, Gu, Xiaotao, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Wang, Zihan, Qi, Ji, Song, Xixuan, Zhang, Peng, Liu, Debing, Xu, Bin, Li, Juanzi, Dong, Yuxiao, Tang, Jie
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation…
External link:
http://arxiv.org/abs/2408.16500
Authors:
Yang, Zhuoyi, Teng, Jiayan, Zheng, Wendi, Ding, Ming, Huang, Shiyu, Xu, Jiazheng, Yang, Yuanming, Hong, Wenyi, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Gu, Xiaotao, Zhang, Yuxuan, Wang, Weihan, Cheng, Yean, Liu, Ting, Xu, Bin, Dong, Yuxiao, Tang, Jie
We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels. Previous video…
External link:
http://arxiv.org/abs/2408.06072
Authors:
Ming, Yuhang, Xu, Minyang, Yang, Xingrui, Ye, Weicai, Wang, Weihan, Peng, Yong, Dai, Weichen, Kong, Wanzeng
Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems. It enables the systems to robustly localize themselves in large-scale environments. Existing VPR methods demonstrate attractive performance…
External link:
http://arxiv.org/abs/2407.21416
Authors:
Wang, Weihan, He, Zehai, Hong, Wenyi, Cheng, Yean, Zhang, Xiaohan, Qi, Ji, Gu, Xiaotao, Huang, Shiyu, Xu, Bin, Dong, Yuxiao, Ding, Ming, Tang, Jie
Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the…
External link:
http://arxiv.org/abs/2406.08035
Authors:
Ming, Yuhang, Yang, Xingrui, Wang, Weihan, Chen, Zheng, Feng, Jinglun, Xing, Yifan, Zhang, Guofeng
Published in:
Engineering Applications of Artificial Intelligence, Volume 140, 15 January 2025, 109685
Neural Radiance Fields (NeRF) have emerged as a powerful paradigm for 3D scene representation, offering high-fidelity renderings and reconstructions from a set of sparse and unstructured sensor data. In the context of autonomous robotics, where perception…
External link:
http://arxiv.org/abs/2405.05526
Bitcoin stands as a groundbreaking development in decentralized exchange throughout human history, enabling transactions without the need for intermediaries. By leveraging cryptographic proof mechanisms, Bitcoin eliminates the reliance on third-party…
External link:
http://arxiv.org/abs/2404.04841
Authors:
Wang, Weihan, Chou, Chieh, Sevagamoorthy, Ganesh, Chen, Kevin, Chen, Zheng, Feng, Ziyue, Xia, Youjie, Cai, Feiyang, Xu, Yi, Mordohai, Philippos
We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating…
External link:
http://arxiv.org/abs/2403.07225
Authors:
Zheng, Wendi, Teng, Jiayan, Yang, Zhuoyi, Wang, Weihan, Chen, Jidong, Gu, Xiaotao, Dong, Yuxiao, Ding, Ming, Tang, Jie
Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges, in terms of computational efficiency and the refinement of image details…
External link:
http://arxiv.org/abs/2403.05121