Showing 1 - 10 of 1,191 for search: '"XU Haiyang"'
Author:
ZHAO Weixia, LI Yanfeng, ZHANG Baozhong, YU Yingduo, LUO Peng, XU Haiyang, PENG Kunhai, LEI Zhendong
Published in:
Guan'gai paishui xuebao, Vol 41, Iss 9, Pp 1-5 (2022)
Intensive agriculture and other anthropogenic activities have led to soil degradation in many regions across China. Improving soil fertility and restoring soil functions is important to maintain sustainable agricultural production and safeguard food
External link:
https://doaj.org/article/acbe60fc245343ac9dae286eb5dcab98
Diffusion models demonstrate impressive image generation performance with text guidance. Inspired by the learning process of diffusion, existing images can be edited according to text by DDIM inversion. However, the vanilla DDIM inversion is not opti
External link:
http://arxiv.org/abs/2409.10476
Author:
Hu, Anwen, Xu, Haiyang, Zhang, Liang, Ye, Jiabo, Yan, Ming, Zhang, Ji, Jin, Qin, Huang, Fei, Zhou, Jingren
Multimodal Large Language Models (MLLMs) have achieved promising OCR-free Document Understanding performance by increasing the supported resolution of document images. However, this comes at the cost of generating thousands of visual tokens for a sing
External link:
http://arxiv.org/abs/2409.03420
Author:
Jiang, Chaoya, Hongrui, Jia, Xu, Haiyang, Ye, Wei, Dong, Mengfan, Yan, Ming, Zhang, Ji, Huang, Fei, Zhang, Shikun
This paper presents MaVEn, an innovative Multi-granularity Visual Encoding framework designed to enhance the capabilities of Multimodal Large Language Models (MLLMs) in multi-image reasoning. Current MLLMs primarily focus on single-image visual under
External link:
http://arxiv.org/abs/2408.12321
Author:
Ye, Jiabo, Xu, Haiyang, Liu, Haowei, Hu, Anwen, Yan, Ming, Qian, Qi, Zhang, Ji, Huang, Fei, Zhou, Jingren
Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in executing instructions for a variety of single-image tasks. Despite this progress, significant challenges remain in modeling long image sequences. In this work, we
External link:
http://arxiv.org/abs/2408.04840
Author:
Liu, Haowei, Zhang, Xi, Xu, Haiyang, Shi, Yaya, Jiang, Chaoya, Yan, Ming, Zhang, Ji, Huang, Fei, Yuan, Chunfeng, Li, Bing, Hu, Weiming
Built on the power of LLMs, numerous multimodal large language models (MLLMs) have recently achieved remarkable performance on various vision-language tasks. However, most existing MLLMs and benchmarks primarily focus on single-image input scenarios,
External link:
http://arxiv.org/abs/2407.15272
We provide a two-way integration for the widely adopted ControlNet by integrating external condition generation algorithms into a single dense prediction method and incorporating its individually trained image generation processes into a single model
External link:
http://arxiv.org/abs/2406.05871
Author:
Wang, Junyang, Xu, Haiyang, Jia, Haitao, Zhang, Xi, Yan, Ming, Shen, Weizhou, Zhang, Ji, Huang, Fei, Sang, Jitao
Mobile device operation tasks are increasingly becoming a popular multi-modal AI application scenario. Current Multi-modal Large Language Models (MLLMs), constrained by their training data, lack the capability to function effectively as operation ass
External link:
http://arxiv.org/abs/2406.01014
Published in:
Nanophotonics, Vol 8, Iss 12, Pp 2203-2213 (2019)
The development of short-wavelength light-emitting diodes (LEDs) with high emission efficiency, a fascinating research area, is still necessary because of great scientific interest and practical significance. Here, a graphene plasmon layer treated by
External link:
https://doaj.org/article/ba69fe354f8a4125869e5f37879913d3
Charts are important for presenting and explaining complex data relationships. Recently, multimodal large language models (MLLMs) have shown remarkable capabilities in various chart understanding tasks. However, the sheer size of these models in term
External link:
http://arxiv.org/abs/2404.16635