Search results

Report

E-CAR: Efficient Continuous Autoregressive Image Generation via Multistage Modeling

Authors: Yuan, Zhihang, Shang, Yuzhang, Zhang, Hanling, Fang, Tongcheng, Xie, Rui, Xu, Bingxin, Yan, Yan, Yan, Shengen, Dai, Guohao, Wang, Yu

Recent advances in autoregressive (AR) models with continuous tokens for image generation show promising results by eliminating the need for discrete tokenization. However, these models face efficiency challenges due to their sequential token generat

External link: http://arxiv.org/abs/2412.14170

Report

GUI Agents: A Survey

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs,

External link: http://arxiv.org/abs/2412.13501

Report

Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents

Authors: Zhou, Yifei, Yang, Qianlan, Lin, Kaixiang, Bai, Min, Zhou, Xiong, Wang, Yu-Xiong, Levine, Sergey, Li, Erran

The vision of a broadly capable and goal-directed agent, such as an Internet-browsing agent in the digital world and a household humanoid in the physical world, has rapidly advanced, thanks to the generalization capability of foundation models. Such

External link: http://arxiv.org/abs/2412.13194

Report

What Matters in Learning A Zero-Shot Sim-to-Real RL Policy for Quadrotor Control? A Comprehensive Study

Authors: Chen, Jiayu, Yu, Chao, Xie, Yuqing, Gao, Feng, Chen, Yinuo, Yu, Shu'ang, Tang, Wenhao, Ji, Shilong, Mu, Mo, Wu, Yi, Yang, Huazhong, Wang, Yu

Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibi

External link: http://arxiv.org/abs/2412.11764

Report

Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal

Authors: Wang, Yuhao, Zhu, Zhiyuan, Liu, Heyang, Liao, Yusheng, Liu, Hongcheng, Wang, Yanfeng, Wang, Yu

Multimodal large language models (MLLMs) excel at multimodal perception and understanding, yet their tendency to generate hallucinated or inaccurate responses undermines their trustworthiness. Existing methods have largely overlooked the importance o

External link: http://arxiv.org/abs/2412.11196

Report

Can We Generate Visual Programs Without Prompting LLMs?

Authors: Shlapentokh-Rothman, Michal, Wang, Yu-Xiong, Hoiem, Derek

Visual programming prompts LLMs (large language models) to generate executable code for visual tasks like visual question answering (VQA). Prompt-based methods are difficult to improve while also being unreliable and costly in both time and money. O

External link: http://arxiv.org/abs/2412.08564

Report

A Novel Method of Estimating GRB Peak Energies Beyond the Swift/BAT Limit

Authors: Li, Liang, Wang, Yu

The Swift Burst Alert Telescope (BAT), operating in the 15–150 keV energy band, struggles to detect the peak energy (E_p) of gamma-ray bursts (GRBs), as most GRBs have E_p values typically distributed between 200–300 keV, exce

External link: http://arxiv.org/abs/2412.08226