Showing 1 - 10
of 508
for search: '"WANG Weihan"'
Multi-modal large language models (MLLMs) have demonstrated promising capabilities across various tasks by integrating textual and visual information to achieve visual understanding in complex scenarios. Despite the availability of several benchmarks…
External link:
http://arxiv.org/abs/2409.13730
Authors:
Yang, Zhen, Chen, Jinhao, Du, Zhengxiao, Yu, Wenmeng, Wang, Weihan, Hong, Wenyi, Jiang, Zhihuan, Xu, Bin, Tang, Jie
Large language models (LLMs) have demonstrated significant capabilities in mathematical reasoning, particularly with text-based mathematical problems. However, current multi-modal large language models (MLLMs), especially those specialized in mathematics…
External link:
http://arxiv.org/abs/2409.13729
Authors:
Hong, Wenyi, Wang, Weihan, Ding, Ming, Yu, Wenmeng, Lv, Qingsong, Wang, Yan, Cheng, Yean, Huang, Shiyu, Ji, Junhui, Xue, Zhao, Zhao, Lei, Yang, Zhuoyi, Gu, Xiaotao, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Wang, Zihan, Qi, Ji, Song, Xixuan, Zhang, Peng, Liu, Debing, Xu, Bin, Li, Juanzi, Dong, Yuxiao, Tang, Jie
Beginning with VisualGLM and CogVLM, we are continuously exploring VLMs in pursuit of enhanced vision-language fusion, efficient higher-resolution architecture, and broader modalities and applications. Here we propose the CogVLM2 family, a new generation…
External link:
http://arxiv.org/abs/2408.16500
Authors:
Yang, Zhuoyi, Teng, Jiayan, Zheng, Wendi, Ding, Ming, Huang, Shiyu, Xu, Jiazheng, Yang, Yuanming, Hong, Wenyi, Zhang, Xiaohan, Feng, Guanyu, Yin, Da, Gu, Xiaotao, Zhang, Yuxuan, Wang, Weihan, Cheng, Yean, Liu, Ting, Xu, Bin, Dong, Yuxiao, Tang, Jie
We present CogVideoX, a large-scale text-to-video generation model based on diffusion transformer, which can generate 10-second continuous videos aligned with text prompt, with a frame rate of 16 fps and resolution of 768 * 1360 pixels. Previous video…
External link:
http://arxiv.org/abs/2408.06072
Authors:
Ming, Yuhang, Xu, Minyang, Yang, Xingrui, Ye, Weicai, Wang, Weihan, Peng, Yong, Dai, Weichen, Kong, Wanzeng
Visual place recognition (VPR) is an essential component of many autonomous and augmented/virtual reality systems. It enables the systems to robustly localize themselves in large-scale environments. Existing VPR methods demonstrate attractive performance…
External link:
http://arxiv.org/abs/2407.21416
Authors:
Wang, Weihan, He, Zehai, Hong, Wenyi, Cheng, Yean, Zhang, Xiaohan, Qi, Ji, Gu, Xiaotao, Huang, Shiyu, Xu, Bin, Dong, Yuxiao, Ding, Ming, Tang, Jie
Recent progress in multimodal large language models has markedly enhanced the understanding of short videos (typically under one minute), and several evaluation datasets have emerged accordingly. However, these advancements fall short of meeting the…
External link:
http://arxiv.org/abs/2406.08035
Authors:
Ming, Yuhang, Yang, Xingrui, Wang, Weihan, Chen, Zheng, Feng, Jinglun, Xing, Yifan, Zhang, Guofeng
Published in:
Engineering Applications of Artificial Intelligence, Volume 140, 15 January 2025, 109685
Neural Radiance Fields (NeRF) have emerged as a powerful paradigm for 3D scene representation, offering high-fidelity renderings and reconstructions from a set of sparse and unstructured sensor data. In the context of autonomous robotics, where perception…
External link:
http://arxiv.org/abs/2405.05526
Bitcoin stands as a groundbreaking development in decentralized exchange throughout human history, enabling transactions without the need for intermediaries. By leveraging cryptographic proof mechanisms, Bitcoin eliminates the reliance on third-party…
External link:
http://arxiv.org/abs/2404.04841
Authors:
Wang, Weihan, Chou, Chieh, Sevagamoorthy, Ganesh, Chen, Kevin, Chen, Zheng, Feng, Ziyue, Xia, Youjie, Cai, Feiyang, Xu, Yi, Mordohai, Philippos
We propose an accurate and robust initialization approach for stereo visual-inertial SLAM systems. Unlike the current state-of-the-art method, which heavily relies on the accuracy of a pure visual SLAM system to estimate inertial variables without updating…
External link:
http://arxiv.org/abs/2403.07225
Authors:
Zheng, Wendi, Teng, Jiayan, Yang, Zhuoyi, Wang, Weihan, Chen, Jidong, Gu, Xiaotao, Dong, Yuxiao, Ding, Ming, Tang, Jie
Recent advancements in text-to-image generative systems have been largely driven by diffusion models. However, single-stage text-to-image diffusion models still face challenges, in terms of computational efficiency and the refinement of image details…
External link:
http://arxiv.org/abs/2403.05121