Showing 1 - 10 of 2,167 results for search: '"FAN, Yue"'
Author:
Fan, Yue, Xian, Yongqin, Zhai, Xiaohua, Kolesnikov, Alexander, Naeem, Muhammad Ferjad, Schiele, Bernt, Tombari, Federico
Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown that the image itself can be used as a natural interface for general-purpose visual perception and demonstrated inspiring
External link:
http://arxiv.org/abs/2407.00503
Author:
Fan, Yue, Ding, Lei, Kuo, Ching-Chen, Jiang, Shan, Zhao, Yang, Guan, Xinze, Yang, Jie, Zhang, Yi, Wang, Xin Eric
Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring tas
External link:
http://arxiv.org/abs/2406.19263
Author:
He, Xuehai, Feng, Weixi, Zheng, Kaizhi, Lu, Yujie, Zhu, Wanrong, Li, Jiachen, Fan, Yue, Wang, Jianfeng, Li, Linjie, Yang, Zhengyuan, Lin, Kevin, Wang, William Yang, Wang, Lijuan, Wang, Xin Eric
Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit that videos are the ideal medium, as they encapsulate ric
External link:
http://arxiv.org/abs/2406.08407
Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Existing methods adopt neural rendering methods as 3D representations and joi
External link:
http://arxiv.org/abs/2403.15624
We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relati
External link:
http://arxiv.org/abs/2403.11481
Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks. However, with state-of-the-art PVMs growing to billions or even trillions of parameters, the standard full fine-tuning pa
External link:
http://arxiv.org/abs/2402.02242
Author:
Fan, Yue, Gu, Jing, Zhou, Kaiwen, Yan, Qianqi, Jiang, Shan, Kuo, Ching-Chen, Guan, Xinze, Wang, Xin Eric
Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily lives. These images, characterized by their composition of multiple subfigures in distinct layouts, effectively convey information to people. Toward building advanc
External link:
http://arxiv.org/abs/2401.15847
Semi-supervised learning (SSL) methods effectively leverage unlabeled data to improve model generalization. However, SSL models often underperform in open-set scenarios, where unlabeled data contain outliers from novel categories that do not appear i
External link:
http://arxiv.org/abs/2311.10572
The emergent reasoning and Theory of Mind (ToM) abilities demonstrated by Large Language Models (LLMs) make them promising candidates for developing coordination agents. In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed
External link:
http://arxiv.org/abs/2310.03903
Published in:
Journal of Asian Architecture and Building Engineering, Vol 0, Iss 0, Pp 1-12 (2024)
Longwave radiation is a significant renewable energy technology for energy saving in buildings. By evaluating the potential and distribution of longwave radiation in China, this study simplifies the calculation process in architectural design. Weathe
External link:
https://doaj.org/article/54960f2264094b08a078ab8b73c00a79