Showing 1 - 10 of 63 for search: '"Shi Yaya"'
Author:
Zhang Lili, Yadav Vivek, Zhu Hengchen, Shi Yaya, Hu Xiangxiang, Wang Xiaoxia, Zhou Xiaoping, Subhash Babu, Kang Yongxiang
Published in:
Ecological Indicators, Vol 163, 112088 (2024)
Litterfall is a link in the energy flow and material cycling of ecosystems that maintains the primary productivity of forests. However, there is no consensus regarding the factors driving litterfall biomass dynamics because of the high spati
External link:
https://doaj.org/article/a633e4a5957345efb005f1cc7485dc08
Author:
Liu, Haowei, Zhang, Xi, Xu, Haiyang, Shi, Yaya, Jiang, Chaoya, Yan, Ming, Zhang, Ji, Huang, Fei, Yuan, Chunfeng, Li, Bing, Hu, Weiming
Built on the power of LLMs, numerous multimodal large language models (MLLMs) have recently achieved remarkable performance on various vision-language tasks. However, most existing MLLMs and benchmarks primarily focus on single-image input scenarios,
External link:
http://arxiv.org/abs/2407.15272
Published in:
PLoS ONE, Vol 17, Iss 8, p e0272946 (2022)
Near-surface air temperature (Ta) is an important parameter in agricultural production and climate change. Satellite remote sensing data provide an effective way to estimate regional-scale air temperature. Therefore, taking Gansu section of the upper
External link:
https://doaj.org/article/ac94ce8f033b4b2c937c42c79ddf1890
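The record above concerns estimating near-surface air temperature (Ta) from satellite remote sensing data. As a rough illustration only (the abstract does not describe the model), the sketch below fits a simple linear relation between a hypothetical satellite land-surface-temperature series and station-measured Ta; every variable name and value is invented for demonstration and is not taken from the paper.

    # Illustrative sketch only: a simple linear model relating satellite land-surface
    # temperature (LST) to near-surface air temperature (Ta). The linear form and the
    # sample values are assumptions for demonstration, not the paper's method.
    import numpy as np

    # Hypothetical training samples: satellite LST (K) and station-measured Ta (K).
    lst = np.array([295.1, 300.4, 288.7, 310.2, 305.6])
    ta = np.array([293.0, 297.8, 287.5, 306.9, 302.4])

    # Fit Ta = a * LST + b by ordinary least squares.
    A = np.vstack([lst, np.ones_like(lst)]).T
    (a, b), *_ = np.linalg.lstsq(A, ta, rcond=None)

    # Predict Ta for a new LST observation.
    print(f"Ta estimate for LST=298 K: {a * 298.0 + b:.1f} K")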
Author:
Liu, Haowei, Shi, Yaya, Xu, Haiyang, Yuan, Chunfeng, Ye, Qinghao, Li, Chenliang, Yan, Ming, Zhang, Ji, Huang, Fei, Li, Bing, Hu, Weiming
In vision-language pre-training (VLP), masked image modeling (MIM) has recently been introduced for fine-grained cross-modal alignment. However, in most existing methods, the reconstruction targets for MIM lack high-level semantics, and text is not s
External link:
http://arxiv.org/abs/2403.00249
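The abstract above refers to masked image modeling (MIM) in vision-language pre-training. As a hedged sketch of the general MIM idea only (not the method proposed in the paper), the snippet below splits a dummy image into patches and masks a random subset that a model would be trained to reconstruct; the patch size and mask ratio are assumed values.

    # Minimal illustration of the masking step in masked image modeling (MIM):
    # split an image into patches and zero out a random subset as reconstruction
    # targets. Patch size and mask ratio are assumptions, not values from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    image = rng.random((224, 224, 3))          # dummy RGB image
    patch = 16                                 # assumed patch size
    ratio = 0.4                                # assumed mask ratio

    # Reshape into a (num_patches, patch*patch*3) grid of flattened patches.
    h = w = 224 // patch
    patches = image.reshape(h, patch, w, patch, 3).swapaxes(1, 2).reshape(h * w, -1)

    # Randomly choose patches to mask; a model would be trained to reconstruct them.
    mask = rng.random(h * w) < ratio
    targets = patches[mask].copy()             # ground truth for the masked patches
    patches[mask] = 0.0                        # masked input fed to the encoder

    print(f"masked {mask.sum()} of {h * w} patches; target shape {targets.shape}")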
Author:
Liu, Haowei, Shi, Yaya, Xu, Haiyang, Yuan, Chunfeng, Ye, Qinghao, Li, Chenliang, Yan, Ming, Zhang, Ji, Huang, Fei, Li, Bing, Hu, Weiming
In video-text retrieval, most existing methods adopt the dual-encoder architecture for fast retrieval, which employs two individual encoders to extract global latent representations for videos and texts. However, they face challenges in capturing fin
External link:
http://arxiv.org/abs/2402.16769
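The abstract above describes the dual-encoder architecture commonly used for video-text retrieval: two independent encoders produce global latent representations that are compared for fast retrieval. The sketch below illustrates that generic setup with random linear projections standing in for the encoders; it is not the paper's model, and every name and dimension is an assumption.

    # Minimal sketch of a dual-encoder retrieval setup: two independent encoders map
    # videos and texts into a shared embedding space, and retrieval ranks candidates
    # by cosine similarity. The "encoders" are random projections purely for illustration.
    import numpy as np

    rng = np.random.default_rng(0)
    dim, embed = 512, 256

    video_encoder = rng.standard_normal((dim, embed))   # stand-in video encoder
    text_encoder = rng.standard_normal((dim, embed))    # stand-in text encoder

    videos = rng.standard_normal((5, dim))               # 5 candidate video features
    query = rng.standard_normal((1, dim))                # 1 text query feature

    def encode(x, proj):
        z = x @ proj
        return z / np.linalg.norm(z, axis=-1, keepdims=True)   # L2-normalize

    # Global latent representations for each modality.
    v = encode(videos, video_encoder)
    t = encode(query, text_encoder)

    # Cosine similarity gives a fast ranking over all candidates.
    scores = (t @ v.T).ravel()
    print("ranked video indices:", np.argsort(-scores))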
Author:
Hu, Anwen, Shi, Yaya, Xu, Haiyang, Ye, Jiabo, Ye, Qinghao, Yan, Ming, Li, Chenliang, Qian, Qi, Zhang, Ji, Huang, Fei
Recently, the strong text creation ability of Large Language Models (LLMs) has given rise to many tools for assisting paper reading or even writing. However, the weak diagram analysis abilities of LLMs or Multimodal LLMs greatly limit their applicatio
External link:
http://arxiv.org/abs/2311.18248
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks
Author:
Xu, Haiyang, Ye, Qinghao, Wu, Xuan, Yan, Ming, Miao, Yuan, Ye, Jiabo, Xu, Guohai, Hu, Anwen, Shi, Yaya, Xu, Guangwei, Li, Chenliang, Qian, Qi, Que, Maofei, Zhang, Ji, Zeng, Xiao, Huang, Fei
To promote the development of Vision-Language Pre-training (VLP) and multimodal Large Language Models (LLMs) in the Chinese community, we first release the largest public Chinese high-quality video-language dataset named Youku-mPLUG, which is collect
External link:
http://arxiv.org/abs/2306.04362
Author:
Ye, Qinghao, Xu, Haiyang, Xu, Guohai, Ye, Jiabo, Yan, Ming, Zhou, Yiyang, Wang, Junyang, Hu, Anwen, Shi, Pengcheng, Shi, Yaya, Li, Chenliang, Xu, Yuanhong, Chen, Hehong, Tian, Junfeng, Qian, Qi, Zhang, Ji, Huang, Fei, Zhou, Jingren
Large language models (LLMs) have demonstrated impressive zero-shot abilities on a variety of open-ended tasks, while recent research has also explored the use of LLMs for multi-modal generation. In this study, we introduce mPLUG-Owl, a novel trainin
External link:
http://arxiv.org/abs/2304.14178
Author:
Xu, Haiyang, Ye, Qinghao, Yan, Ming, Shi, Yaya, Ye, Jiabo, Xu, Yuanhong, Li, Chenliang, Bi, Bin, Qian, Qi, Wang, Wei, Xu, Guohai, Zhang, Ji, Huang, Songfang, Huang, Fei, Zhou, Jingren
Published in:
ICML 2023
Recent years have witnessed a big convergence of language, vision, and multi-modal pretraining. In this work, we present mPLUG-2, a new unified paradigm with modularized design for multi-modal pretraining, which can benefit from modality collaboratio
External link:
http://arxiv.org/abs/2302.00402
Author:
Shi, Qingmin, Shi, Yaya, Wang, Shuangming, Kou, Bingyang, Zhao, Hongchao, Ji, Ruijun, Yang, Xiaolong, Liu, Pan, Li, Zhuangzhuang
Published in:
Fuel, Vol 376 (15 November 2024)