Search results - "Zhang, Zhizheng"

Report

A General Theory for Compositional Generalization

Authors: Fu, Jingwen; Zhang, Zhizheng; Lu, Yan; Zheng, Nanning

Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement. Despite its critical importance, the deep neural network (DNN)

External link: http://arxiv.org/abs/2405.11743

Report

Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis

Authors: Bi, Tianci; Zhang, Xiaoyi; Zhang, Zhizheng; Xie, Wenxuan; Lan, Cuiling; Lu, Yan; Zheng, Nanning

Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either treated text detect

External link: http://arxiv.org/abs/2405.07481

Report

VisualCritic: Making LMMs Perceive Visual Quality Like Humans

Authors: Huang, Zhipeng; Zhang, Zhizheng; Lu, Yiting; Zha, Zheng-Jun; Chen, Zhibo; Guo, Baining

At present, large multimodal models (LMMs) have exhibited impressive generalization capabilities in understanding and generating visual signals. However, they currently still lack sufficient capability to perceive low-level visual quality akin to hum

External link: http://arxiv.org/abs/2403.12806

Report

RelationVLM: Making Large Vision-Language Models Understand Visual Relations

Authors: Huang, Zhipeng; Zhang, Zhizheng; Zha, Zheng-Jun; Lu, Yan; Guo, Baining

The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved. Very recent works enable LVLMs to localize object-level visual contents and

External link: http://arxiv.org/abs/2403.12801

Report

SeD: Semantic-Aware Discriminator for Image Super-Resolution

Authors: Li, Bingchen; Li, Xin; Zhu, Hanxin; Jin, Yeying; Feng, Ruoyu; Zhang, Zhizheng; Chen, Zhibo

Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality im

External link: http://arxiv.org/abs/2402.19387

Report

NaVid: Video-based VLM Plans the Next Step for Vision-and-Language Navigation

Authors: Zhang, Jiazhao; Wang, Kunyu; Xu, Rongtao; Zhou, Gengze; Hong, Yicong; Fang, Xiaomeng; Wu, Qi; Zhang, Zhizheng; Wang, He

Vision-and-language navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions. In this field, generalization is a long-standing challenge, either t

External link: http://arxiv.org/abs/2402.15852

Report

Reinforced UI Instruction Grounding: Towards a Generic UI Task Automation API

Authors: Zhang, Zhizheng; Xie, Wenxuan; Zhang, Xiaoyi; Lu, Yan

Recent popularity of Large Language Models (LLMs) has opened countless possibilities in automating numerous AI tasks by connecting LLMs to various domain-specific models or APIs, where LLMs serve as dispatchers while domain-specific models or APIs ar

External link: http://arxiv.org/abs/2310.04716

Report

Adaptive Frequency Filters As Efficient Global Token Mixers

Authors: Huang, Zhipeng; Zhang, Zhizheng; Lan, Cuiling; Zha, Zheng-Jun; Lu, Yan; Guo, Baining

Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable successes in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployments, especially on mobile devices, stil

External link: http://arxiv.org/abs/2307.14008

Report

When and Why Momentum Accelerates SGD: An Empirical Study

Authors: Fu, Jingwen; Wang, Bohan; Zhang, Huishuai; Zhang, Zhizheng; Chen, Wei; Zheng, Nanning

Momentum has become a crucial component in deep learning optimizers, necessitating a comprehensive understanding of when and why it accelerates stochastic gradient descent (SGD). To address the question of "when", we establish a meaningful comparis

External link: http://arxiv.org/abs/2306.09000

Report

Responsible Task Automation: Empowering Large Language Models as Responsible Task Automators

Authors: Zhang, Zhizheng; Zhang, Xiaoyi; Xie, Wenxuan; Lu, Yan

The recent success of Large Language Models (LLMs) signifies an impressive stride towards artificial general intelligence. They have shown a promising prospect in automatically completing tasks upon user instructions, functioning as brain-like coordi

External link: http://arxiv.org/abs/2306.01242
