Showing 1 - 10 of 125,021 results for search: '"Wang, Yu."'
Author:
Yuan, Zhihang, Shang, Yuzhang, Zhang, Hanling, Fang, Tongcheng, Xie, Rui, Xu, Bingxin, Yan, Yan, Yan, Shengen, Dai, Guohao, Wang, Yu
Recent advances in autoregressive (AR) models with continuous tokens for image generation show promising results by eliminating the need for discrete tokenization. However, these models face efficiency challenges due to their sequential token generat
External link:
http://arxiv.org/abs/2412.14170
Author:
Nguyen, Dang, Chen, Jian, Wang, Yu, Wu, Gang, Park, Namyong, Hu, Zhengmian, Lyu, Hanjia, Wu, Junda, Aponte, Ryan, Xia, Yu, Li, Xintong, Shi, Jing, Chen, Hongjie, Lai, Viet Dac, Xie, Zhouhang, Kim, Sungchul, Zhang, Ruiyi, Yu, Tong, Tanjim, Mehrab, Ahmed, Nesreen K., Mathur, Puneet, Yoon, Seunghyun, Yao, Lina, Kveton, Branislav, Nguyen, Thien Huu, Bui, Trung, Zhou, Tianyi, Rossi, Ryan A., Dernoncourt, Franck
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs,
External link:
http://arxiv.org/abs/2412.13501
Author:
Zhou, Yifei, Yang, Qianlan, Lin, Kaixiang, Bai, Min, Zhou, Xiong, Wang, Yu-Xiong, Levine, Sergey, Li, Erran
The vision of a broadly capable and goal-directed agent, such as an Internet-browsing agent in the digital world and a household humanoid in the physical world, has rapidly advanced, thanks to the generalization capability of foundation models. Such
External link:
http://arxiv.org/abs/2412.13194
Author:
Chen, Jiayu, Yu, Chao, Xie, Yuqing, Gao, Feng, Chen, Yinuo, Yu, Shu'ang, Tang, Wenhao, Ji, Shilong, Mu, Mo, Wu, Yi, Yang, Huazhong, Wang, Yu
Executing precise and agile flight maneuvers is critical for quadrotors in various applications. Traditional quadrotor control approaches are limited by their reliance on flat trajectories or time-consuming optimization, which restricts their flexibi
External link:
http://arxiv.org/abs/2412.11764
Author:
Wang, Yuhao, Zhu, Zhiyuan, Liu, Heyang, Liao, Yusheng, Liu, Hongcheng, Wang, Yanfeng, Wang, Yu
Multimodal large language models (MLLMs) excel at multimodal perception and understanding, yet their tendency to generate hallucinated or inaccurate responses undermines their trustworthiness. Existing methods have largely overlooked the importance o
External link:
http://arxiv.org/abs/2412.11196
Visual programming prompts LLMs (large language models) to generate executable code for visual tasks like visual question answering (VQA). Prompt-based methods are difficult to improve while also being unreliable and costly in both time and money. O
External link:
http://arxiv.org/abs/2412.08564
The \emph{Swift} Burst Alert Telescope (BAT), operating in the 15--150 keV energy band, struggles to detect the peak energy ($E_{\rm p}$) of gamma-ray bursts (GRBs), as most GRBs have $E_{\rm p}$ values typically distributed between 200 and 300 keV, exce
External link:
http://arxiv.org/abs/2412.08226