Showing 1 - 10 of 69 for search: '"Ji, Yatai"'
Author:
Ji, Yatai, Zhang, Shilong, Wu, Jie, Sun, Peize, Chen, Weifeng, Xiao, Xuefeng, Yang, Sidi, Yang, Yujiu, Luo, Ping
The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across …
External link:
http://arxiv.org/abs/2407.07577
Author:
Wang, Junjie, Zhang, Yin, Ji, Yatai, Zhang, Yuxiang, Jiang, Chunyang, Wang, Yubo, Zhu, Kang, Wang, Zekun, Wang, Tiezhen, Huang, Wenhao, Fu, Jie, Chen, Bei, Lin, Qunshu, Liu, Minghao, Zhang, Ge, Chen, Wenhu
Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy …
External link:
http://arxiv.org/abs/2406.13923
The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling …
External link:
http://arxiv.org/abs/2403.19238
Author:
Tu, Rong-Cheng, Ji, Yatai, Jiang, Jie, Kong, Weijie, Cai, Chengfei, Zhao, Wenzhe, Wang, Hongfa, Yang, Yujiu, Liu, Wei
Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities. For this purpose, numerous masked modeling tasks have been proposed for VLP to further …
External link:
http://arxiv.org/abs/2306.07096
Author:
Chen, Weifeng, Ji, Yatai, Wu, Jie, Wu, Hefeng, Xie, Pan, Li, Jiashi, Xia, Xin, Xiao, Xuefeng, Lin, Liang
Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video (T2V) methods …
External link:
http://arxiv.org/abs/2305.13840
Current methods for few-shot action recognition mainly fall into the metric learning framework following ProtoNet, which demonstrates the importance of prototypes. Although they achieve relatively good performance, the effect of multimodal information …
External link:
http://arxiv.org/abs/2212.04873
Author:
Ji, Yatai, Tu, Rongcheng, Jiang, Jie, Kong, Weijie, Cai, Chengfei, Zhao, Wenzhe, Wang, Hongfa, Yang, Yujiu, Liu, Wei
Cross-modal alignment is essential for vision-language pre-training (VLP) models to learn the correct corresponding information across different modalities. For this purpose, inspired by the success of masked language modeling (MLM) tasks in the NLP …
External link:
http://arxiv.org/abs/2211.13437
Author:
Ji, Yatai, Wang, Junjie, Gong, Yuan, Zhang, Lin, Zhu, Yanru, Wang, Hongfa, Zhang, Jiaxing, Sakai, Tetsuya, Yang, Yujiu
Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty is problematic for our interpretation, including inter- and intra-modal uncertainty. Little effort …
External link:
http://arxiv.org/abs/2210.05335
Author:
Ji, Yatai (jytv5@mail.tsinghua.edu.cn), Giangrande, Paolo (paolo.giangrande@unibg.it), Zhao, Weiduo (weiduo.zhao@nottingham.edu.cn)
Published in:
Energies (ISSN 1996-1073), Aug 2024, Vol. 17, Issue 16, p. 3980, 32 pp.
Author:
Wang, Xiaohui, Dao, Fuhai, Ji, Yatai, Qiu, Sihang, Zhu, Xian, Dong, Wenjie, Wang, Huizan, Zhang, Weimin, Zheng, Xiaolong
Published in:
In: The Innovation, 1 July 2024, 5(4)