Showing 1 - 10 of 311 for search: '"Zhang, Zhizheng"'
Compositional Generalization (CG) embodies the ability to comprehend novel combinations of familiar concepts, representing a significant cognitive leap in human intellectual advancement. Despite its critical importance, the deep neural network (DNN)
External link:
http://arxiv.org/abs/2405.11743
Authors:
Bi, Tianci, Zhang, Xiaoyi, Zhang, Zhizheng, Xie, Wenxuan, Lan, Cuiling, Lu, Yan, Zheng, Nanning
Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either treated text detect
External link:
http://arxiv.org/abs/2405.07481
At present, large multimodal models (LMMs) have exhibited impressive generalization capabilities in understanding and generating visual signals. However, they currently still lack sufficient capability to perceive low-level visual quality akin to hum
External link:
http://arxiv.org/abs/2403.12806
The development of Large Vision-Language Models (LVLMs) is striving to catch up with the success of Large Language Models (LLMs), yet it faces more challenges to be resolved. Very recent works enable LVLMs to localize object-level visual contents and
External link:
http://arxiv.org/abs/2403.12801
Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. In particular, one discriminator is utilized to enable the SR network to learn the distribution of real-world high-quality im
External link:
http://arxiv.org/abs/2402.19387
Authors:
Zhang, Jiazhao, Wang, Kunyu, Xu, Rongtao, Zhou, Gengze, Hong, Yicong, Fang, Xiaomeng, Wu, Qi, Zhang, Zhizheng, Wang, He
Vision-and-language navigation (VLN) stands as a key research problem of Embodied AI, aiming at enabling agents to navigate in unseen environments following linguistic instructions. In this field, generalization is a long-standing challenge, either t
External link:
http://arxiv.org/abs/2402.15852
The recent popularity of Large Language Models (LLMs) has opened countless possibilities in automating numerous AI tasks by connecting LLMs to various domain-specific models or APIs, where LLMs serve as dispatchers while domain-specific models or APIs ar
External link:
http://arxiv.org/abs/2310.04716
Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable successes in broad vision tasks thanks to their effective information fusion in the global scope. However, their efficient deployments, especially on mobile devices, stil
External link:
http://arxiv.org/abs/2307.14008
Momentum has become a crucial component in deep learning optimizers, necessitating a comprehensive understanding of when and why it accelerates stochastic gradient descent (SGD). To address the question of "when", we establish a meaningful comparis
External link:
http://arxiv.org/abs/2306.09000
The recent success of Large Language Models (LLMs) signifies an impressive stride towards artificial general intelligence. They have shown a promising prospect in automatically completing tasks upon user instructions, functioning as brain-like coordi
External link:
http://arxiv.org/abs/2306.01242