Showing 1 - 10 of 237 for search: '"Zhang, Xinsong"'
Author:
Yang, Sai, Hu, Bin, Zhou, Bojun, Liu, Fan, Wu, Xiaoxin, Zhang, Xinsong, Gu, Juping, Zhou, Jun
Power Line Autonomous Inspection (PLAI) plays a crucial role in the construction of smart grids due to its great advantages of low cost, high efficiency, and safe operation. PLAI is completed by accurately detecting the electrical components and defe
External link:
http://arxiv.org/abs/2409.04812
Foundation models or pre-trained models have substantially improved the performance of various language, vision, and vision-language understanding tasks. However, existing foundation models can only perform the best in one type of tasks, namely langu
External link:
http://arxiv.org/abs/2301.05065
Vision language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others utilize pre-trained object detectors to leverage vision language alignm
External link:
http://arxiv.org/abs/2211.12402
Pre-trained vision-language models (VLMs) have achieved impressive results in a range of vision-language tasks. However, popular VLMs usually consist of hundreds of millions of parameters which brings challenges for fine-tuning and deployment in real
External link:
http://arxiv.org/abs/2210.07795
Recent advances in vision-language pre-training have pushed the state-of-the-art on various vision-language tasks, making machines more capable of multi-modal writing (image-to-text generation) and painting (text-to-image generation). However, few st
External link:
http://arxiv.org/abs/2206.07699
In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-training framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives. Our approach is motivated by a key observation t
External link:
http://arxiv.org/abs/2206.00621
Recent advances in vision-language pre-training (VLP) have demonstrated impressive performance in a range of vision-language (VL) tasks. However, there exist several challenges for measuring the community's progress in building general multi-modal in
External link:
http://arxiv.org/abs/2205.15237
Published in:
In Marine and Petroleum Geology, October 2024, 168
Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts. It is challenging for these methods to learn relations
External link:
http://arxiv.org/abs/2111.08276
Author:
Yin, Jiayi, Slavík, Ladislav, Wang, Zhihong, Shen, Zhen, Zhang, Xinsong, Liu, Yilong, Ma, Juan, Gong, Yiming, Zong, Ruiwen
Published in:
In Earth-Science Reviews, July 2024, 254