Search results - "Zhang, Xinsong"

Report

Power Line Aerial Image Restoration under Adverse Weather: Datasets and Baselines

Authors: Yang, Sai, Hu, Bin, Zhou, Bojun, Liu, Fan, Wu, Xiaoxin, Zhang, Xinsong, Gu, Juping, Zhou, Jun

Power Line Autonomous Inspection (PLAI) plays a crucial role in the construction of smart grids due to its great advantages of low cost, high efficiency, and safe operation. PLAI is completed by accurately detecting the electrical components and defe…

External link: http://arxiv.org/abs/2409.04812

Report

Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks

Authors: Zhang, Xinsong, Zeng, Yan, Zhang, Jipeng, Li, Hang

Foundation models or pre-trained models have substantially improved the performance of various language, vision, and vision-language understanding tasks. However, existing foundation models can only perform the best in one type of tasks, namely langu…

External link: http://arxiv.org/abs/2301.05065

Report

X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

Authors: Zeng, Yan, Zhang, Xinsong, Li, Hang, Wang, Jiawei, Zhang, Jipeng, Zhou, Wangchunshu

Vision language pre-training aims to learn alignments between vision and language from a large amount of data. Most existing methods only learn image-text alignments. Some others utilize pre-trained object detectors to leverage vision language alignm…

External link: http://arxiv.org/abs/2211.12402

Report

EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning

Authors: Wang, Tiannan, Zhou, Wangchunshu, Zeng, Yan, Zhang, Xinsong

Pre-trained vision-language models (VLMs) have achieved impressive results in a range of vision-language tasks. However, popular VLMs usually consist of hundreds of millions of parameters which brings challenges for fine-tuning and deployment in real…

External link: http://arxiv.org/abs/2210.07795

Report

Write and Paint: Generative Vision-Language Models are Unified Modal Learners

Authors: Diao, Shizhe, Zhou, Wangchunshu, Zhang, Xinsong, Wang, Jiawei

Recent advances in vision-language pre-training have pushed the state-of-the-art on various vision-language tasks, making machines more capable of multi-modal writing (image-to-text generation) and painting (text-to-image generation). However, few st…

External link: http://arxiv.org/abs/2206.07699

Report

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

Authors: Zeng, Yan, Zhou, Wangchunshu, Luo, Ao, Cheng, Ziming, Zhang, Xinsong

In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-training framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives. Our approach is motivated by a key observation t…

External link: http://arxiv.org/abs/2206.00621

Report

VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models

Authors: Zhou, Wangchunshu, Zeng, Yan, Diao, Shizhe, Zhang, Xinsong

Recent advances in vision-language pre-training (VLP) have demonstrated impressive performance in a range of vision-language (VL) tasks. However, there exist several challenges for measuring the community's progress in building general multi-modal in…

External link: http://arxiv.org/abs/2205.15237

Academic article

River-fed basin-fill and hyperpycnites in an island-arc setting: The Silurian–Devonian Kekexiongkuduke Formation from Western Junggar, China

Authors: Zhang, Xinsong, Du, Weidong, Zong, Ruiwen, Yin, Jiayi

Published in: Marine and Petroleum Geology, vol. 168, October 2024

Report

Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts

Authors: Zeng, Yan, Zhang, Xinsong, Li, Hang

Most existing methods in vision language pre-training rely on object-centric features extracted through object detection and make fine-grained alignments between the extracted features and texts. It is challenging for these methods to learn relations…

External link: http://arxiv.org/abs/2111.08276
