Showing 1 - 10 of 189
for search: '"Zheng, Boyuan"'
Authors:
Gu, Yu, Zheng, Boyuan, Gou, Boyu, Zhang, Kai, Chang, Cheng, Srivastava, Sanjari, Xie, Yanan, Qi, Peng, Sun, Huan, Su, Yu
Language agents have demonstrated promising capabilities in automating web-based tasks, though their current reactive approaches still underperform largely compared to humans. While incorporating advanced planning algorithms, particularly tree search
External link:
http://arxiv.org/abs/2411.06559
Authors:
Gou, Boyu, Wang, Ruohan, Zheng, Boyuan, Xie, Yanan, Chang, Cheng, Shu, Yiheng, Sun, Huan, Su, Yu
Multimodal large language models (MLLMs) are transforming the capabilities of graphical user interface (GUI) agents, facilitating their transition from controlled simulations to complex, real-world applications across various platforms. However, the
External link:
http://arxiv.org/abs/2410.05243
Humans naturally employ linguistic instructions to convey knowledge, a process that proves significantly more complex for machines, especially within the context of multitask robotic manipulation environments. Natural language, moreover, serves as th
External link:
http://arxiv.org/abs/2405.17047
Offline meta reinforcement learning (OMRL) has emerged as a promising approach for interaction avoidance and strong generalization performance by leveraging pre-collected data and meta-learning techniques. Previous context-based approaches predominan
External link:
http://arxiv.org/abs/2405.12001
Language agents powered by large language models (LLMs) have seen exploding development. Their capability of using language as a vehicle for thought and communication lends an incredible level of flexibility and versatility. People have quickly capit
External link:
http://arxiv.org/abs/2402.10196
Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (
External link:
http://arxiv.org/abs/2402.04476
Authors:
Shen, Lingfeng, Tan, Weiting, Chen, Sihao, Chen, Yunmo, Zhang, Jingyu, Xu, Haoran, Zheng, Boyuan, Koehn, Philipp, Khashabi, Daniel
As the influence of large language models (LLMs) spans across global communities, their safety challenges in multilingual settings become paramount for alignment research. This paper examines the variations in safety challenges faced by LLMs across d
External link:
http://arxiv.org/abs/2401.13136
The recent development on large multimodal models (LMMs), especially GPT-4V(ision) and Gemini, has been quickly expanding the capability boundaries of multimodal models beyond traditional tasks like image captioning and visual question answering. In
External link:
http://arxiv.org/abs/2401.01614
Authors:
Yue, Xiang, Ni, Yuansheng, Zhang, Kai, Zheng, Tianyu, Liu, Ruoqi, Zhang, Ge, Stevens, Samuel, Jiang, Dongfu, Ren, Weiming, Sun, Yuxuan, Wei, Cong, Yu, Botao, Yuan, Ruibin, Sun, Renliang, Yin, Ming, Zheng, Boyuan, Yang, Zhenzhu, Liu, Yibo, Huang, Wenhao, Sun, Huan, Su, Yu, Chen, Wenhu
We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from colle
External link:
http://arxiv.org/abs/2311.16502
Authors:
Deng, Xiang, Gu, Yu, Zheng, Boyuan, Chen, Shijie, Stevens, Samuel, Wang, Boshi, Sun, Huan, Su, Yu
We introduce Mind2Web, the first dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website. Existing datasets for web agents either use simulated websites or onl
External link:
http://arxiv.org/abs/2306.06070