Search results

Report

MinerU: An Open-Source Solution for Precise Document Content Extraction

Authors: Wang, Bin, Xu, Chao, Zhao, Xiaomeng, Ouyang, Linke, Wu, Fan, Zhao, Zhiyuan, Xu, Rui, Liu, Kaiwen, Qu, Yuan, Shang, Fukai, Zhang, Bo, Wei, Liqun, Sui, Zhihao, Li, Wei, Shi, Botian, Qiao, Yu, Lin, Dahua, He, Conghui

Document content analysis has been a crucial research area in computer vision. Despite significant advancements in methods such as OCR, layout detection, and formula recognition, existing open-source solutions struggle to consistently deliver high-qu

External link: http://arxiv.org/abs/2409.18839

Report

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes

Authors: Mei, Jianbiao, Ma, Yukai, Yang, Xuemeng, Wen, Licheng, Wei, Tiantian, Dou, Min, Shi, Botian, Liu, Yong

Recent advances in diffusion models have significantly enhanced the controllable generation of streetscapes and facilitated downstream perception and planning tasks. However, challenges such as maintaining temporal coherence, generating long video

External link: http://arxiv.org/abs/2409.04003

Report

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving

Authors: Yang, Xuemeng, Wen, Licheng, Ma, Yukai, Mei, Jianbiao, Li, Xin, Wei, Tiantian, Lei, Wenjie, Fu, Daocheng, Cai, Pinlong, Dou, Min, Shi, Botian, He, Liang, Liu, Yong, Qiao, Yu

This paper presents DriveArena, the first high-fidelity closed-loop simulation system designed for driving agents navigating in real scenarios. DriveArena features a flexible, modular architecture, allowing for the seamless interchange of its core c

External link: http://arxiv.org/abs/2408.00415

Report

LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera

Authors: Ma, Yukai, Mei, Jianbiao, Yang, Xuemeng, Wen, Licheng, Xu, Weihua, Zhang, Jiangning, Shi, Botian, Liu, Yong, Zuo, Xingxing

Semantic Scene Completion (SSC) is pivotal in autonomous driving perception, frequently confronted with the complexities of weather and illumination changes. The long-term strategy involves fusing multi-modal information to bolster the system's robus

External link: http://arxiv.org/abs/2407.16197

Report

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Scientific documents record research findings and valuable human knowledge, comprising a vast corpus of high-quality data. Leveraging multi-modality data extracted from these documents and assessing large models' abilities to handle scientific docume

External link: http://arxiv.org/abs/2406.11633

Report

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data ai

External link: http://arxiv.org/abs/2406.08418

Report

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving

Authors: Mei, Jianbiao, Ma, Yukai, Yang, Xuemeng, Wen, Licheng, Cai, Xinyu, Li, Xin, Fu, Daocheng, Zhang, Bo, Cai, Pinlong, Dou, Min, Shi, Botian, He, Liang, Liu, Yong, Qiao, Yu

Autonomous driving has advanced significantly due to improvements in sensors, machine learning, and artificial intelligence. However, prevailing methods struggle with intricate scenarios and causal relationships, hindering adaptability and interpretabil

External link: http://arxiv.org/abs/2405.15324

Report

Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond

Authors: Zhu, Zheng, Wang, Xiaofeng, Zhao, Wangbo, Min, Chen, Deng, Nianchen, Dou, Min, Wang, Yuqi, Shi, Botian, Wang, Kai, Zhang, Chi, You, Yang, Zhang, Zhaoxiang, Zhao, Dawei, Xiao, Liang, Zhao, Jian, Lu, Jiwen, Huang, Guan

General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems. Recently, the emergence of the

External link: http://arxiv.org/abs/2405.03520

Academic article

Microsoft Concept Graph: Mining Semantic Concepts for Short Text Understanding

Authors: Ji, Lei, Wang, Yujing, Shi, Botian, Zhang, Dawei, Wang, Zhongyuan, Yan, Jun

Published in: Data Intelligence, Vol 1, Iss 3, Pp 238-270 (2019)

Knowledge is important for text-related applications. In this paper, we introduce Microsoft Concept Graph, a knowledge graph engine that provides concept tagging APIs to facilitate the understanding of human languages. Microsoft Concept Graph is built

External link: https://doaj.org/article/d497b5fa0aa443979e5ec843027e877c
