Search results

Report

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

Authors: Jia, Mengzhao, Yu, Wenhao, Ma, Kaixin, Fang, Tianqing, Zhang, Zhihan, Ouyang, Siru, Zhang, Hongming, Jiang, Meng, Yu, Dong

Text-rich images, where text serves as the central visual element guiding the overall understanding, are prevalent in real-world applications, such as presentation slides, scanned documents, and webpage snapshots. Tasks involving multiple text-rich i

External link: http://arxiv.org/abs/2410.01744

Report

MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models

Authors: Yu, Wenhao, Peng, Jie, Ying, Yueliang, Li, Sai, Ji, Jianmin, Zhang, Yanyong

The integration of large language models (LLMs) with robotics has significantly advanced robots' abilities in perception, cognition, and task planning. The use of natural language interfaces offers a unified approach for expressing the capability dif

External link: http://arxiv.org/abs/2409.16030

Report

Agile Continuous Jumping in Discontinuous Terrains

Authors: Yang, Yuxiang, Shi, Guanya, Lin, Changyi, Meng, Xiangyun, Scalise, Rosario, Castro, Mateo Guaman, Yu, Wenhao, Zhang, Tingnan, Zhao, Ding, Tan, Jie, Boots, Byron

We focus on agile, continuous, and terrain-adaptive jumping of quadrupedal robots in discontinuous terrains such as stairs and stepping stones. Unlike single-step jumping, continuous jumping requires accurately executing highly dynamic motions over l

External link: http://arxiv.org/abs/2409.10923

Report

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

Authors: Zhang, Hongming, Pan, Xiaoman, Wang, Hongwei, Ma, Kaixin, Yu, Wenhao, Yu, Dong

We introduce Cognitive Kernel, an open-source agent system towards the goal of generalist autopilots. Unlike copilot systems, which primarily rely on users to provide essential state information (e.g., task descriptions) and assist users by answering

External link: http://arxiv.org/abs/2409.10277

Report

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Authors: Jing, Liqiang, Huang, Zhehui, Wang, Xiaoyang, Yao, Wenlin, Yu, Wenhao, Ma, Kaixin, Zhang, Hongming, Du, Xinya, Yu, Dong

Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) have demonstrated impressive language/vision reasoning abilities, igniting the recent trend of building agents for targeted applications such as shopping assistants or AI software

External link: http://arxiv.org/abs/2409.07703

Report

OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning

Authors: Yao, Yihang, Cen, Zhepeng, Ding, Wenhao, Lin, Haohong, Liu, Shiqi, Zhang, Tingnan, Yu, Wenhao, Zhao, Ding

Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using a pre-collected dataset. Most current methods struggle with the mismatch between imperfect demonstrations and the desired safe and rewarding performance.

External link: http://arxiv.org/abs/2407.14653

Report

DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

Authors: Zou, Anni, Yu, Wenhao, Zhang, Hongming, Ma, Kaixin, Cai, Deng, Zhang, Zhuosheng, Zhao, Hai, Yu, Dong

Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going beyond simple r

External link: http://arxiv.org/abs/2407.10701

Report

Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs

An elusive goal in navigation research is to build an intelligent agent that can understand multimodal instructions including natural language and images, and perform useful navigation. To achieve this, we study a widely useful category of navigation

External link: http://arxiv.org/abs/2407.07775

Report

LDP: A Local Diffusion Planner for Efficient Robot Navigation and Collision Avoidance

Authors: Yu, Wenhao, Peng, Jie, Yang, Huanyu, Zhang, Junrui, Duan, Yifan, Ji, Jianmin, Zhang, Yanyong

The conditional diffusion model has been demonstrated as an efficient tool for learning robot policies, owing to its ability to accurately model the conditional distribution of policies. The intricate nature of real-world scenarios, characterized

External link: http://arxiv.org/abs/2407.01950
