Search results

Report

Toward a Diffusion-Based Generalist for Dense Vision Tasks

Authors: Fan, Yue; Xian, Yongqin; Zhai, Xiaohua; Kolesnikov, Alexander; Naeem, Muhammad Ferjad; Schiele, Bernt; Tombari, Federico

Building generalized models that can solve many computer vision tasks simultaneously is an intriguing direction. Recent works have shown that the image itself can be used as a natural interface for general-purpose visual perception and demonstrated inspiring

External link: http://arxiv.org/abs/2407.00503


Report

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Authors: Fan, Yue; Ding, Lei; Kuo, Ching-Chen; Jiang, Shan; Zhao, Yang; Guan, Xinze; Yang, Jie; Zhang, Yi; Wang, Xin Eric

Graphical User Interfaces (GUIs) are central to our interaction with digital devices. Recently, growing efforts have been made to build models for various GUI understanding tasks. However, these efforts largely overlook an important GUI-referring tas

External link: http://arxiv.org/abs/2406.19263


Report

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Authors: He, Xuehai; Feng, Weixi; Zheng, Kaizhi; Lu, Yujie; Zhu, Wanrong; Li, Jiachen; Fan, Yue; Wang, Jianfeng; Li, Linjie; Yang, Zhengyuan; Lin, Kevin; Wang, William Yang; Wang, Lijuan; Wang, Xin Eric

Multimodal Large Language Models (MLLMs) demonstrate the emerging abilities of "world models" -- interpreting and reasoning about complex real-world dynamics. To assess these abilities, we posit videos are the ideal medium, as they encapsulate ric

External link: http://arxiv.org/abs/2406.08407


Report

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Authors: Guo, Jun; Ma, Xiaojian; Fan, Yue; Liu, Huaping; Li, Qing

Open-vocabulary 3D scene understanding presents a significant challenge in computer vision, with wide-ranging applications in embodied agents and augmented reality systems. Existing methods adopt neural rendering methods as 3D representations and joi

External link: http://arxiv.org/abs/2403.15624


Report

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Authors: Fan, Yue; Ma, Xiaojian; Wu, Rujie; Du, Yuntao; Li, Jiaqi; Gao, Zhi; Li, Qing

We explore how reconciling several foundation models (large language models and vision-language models) with a novel unified memory mechanism could tackle the challenging video understanding problem, especially capturing the long-term temporal relati

External link: http://arxiv.org/abs/2403.11481


Report

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey

Authors: Xin, Yi; Luo, Siqi; Zhou, Haodi; Du, Junlong; Liu, Xiaohong; Fan, Yue; Li, Qing; Du, Yuntao

Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks. However, with state-of-the-art PVMs growing to billions or even trillions of parameters, the standard full fine-tuning pa

External link: http://arxiv.org/abs/2402.02242


Report

Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA

Authors: Fan, Yue; Gu, Jing; Zhou, Kaiwen; Yan, Qianqi; Jiang, Shan; Kuo, Ching-Chen; Guan, Xinze; Wang, Xin Eric

Multipanel images, commonly seen as web screenshots, posters, etc., pervade our daily lives. These images, characterized by their composition of multiple subfigures in distinct layouts, effectively convey information to people. Toward building advanc

External link: http://arxiv.org/abs/2401.15847


Report

SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning

Authors: Fan, Yue; Kukleva, Anna; Dai, Dengxin; Schiele, Bernt

Semi-supervised learning (SSL) methods effectively leverage unlabeled data to improve model generalization. However, SSL models often underperform in open-set scenarios, where unlabeled data contain outliers from novel categories that do not appear i

External link: http://arxiv.org/abs/2311.10572


Report

LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models

Authors: Agashe, Saaket; Fan, Yue; Reyna, Anthony; Wang, Xin Eric

The emergent reasoning and Theory of Mind (ToM) abilities demonstrated by Large Language Models (LLMs) make them promising candidates for developing coordination agents. In this study, we introduce a new LLM-Coordination Benchmark aimed at a detailed

External link: http://arxiv.org/abs/2310.03903


Academic article

Spatial and temporal distribution characteristics and sensitivity of meteorological parameters to long-wave radiation cooling in buildings: a case study of Xi’an

Authors: Chen Jie, Han Bing, Fan Yue

Published in: Journal of Asian Architecture and Building Engineering, Vol 0, Iss 0, Pp 1-12 (2024)

Longwave radiation is a significant renewable energy technology for energy-saving in buildings. By evaluating the potential and distribution of longwave radiation in China, this study simplifies the calculation process in architectural design. Weathe

External link: https://doaj.org/article/54960f2264094b08a078ab8b73c00a79

