Showing 1 - 10 of 69 for search: '"Ji, Yatai"'
Author:
Ji, Yatai, Zhang, Shilong, Wu, Jie, Sun, Peize, Chen, Weifeng, Xiao, Xuefeng, Yang, Sidi, Yang, Yujiu, Luo, Ping
The rapid advancement of Large Vision-Language models (LVLMs) has demonstrated a spectrum of emergent capabilities. Nevertheless, current models only focus on the visual content of a single scenario, while their ability to associate instances across …
External link:
http://arxiv.org/abs/2407.07577
Author:
Wang, Junjie, Zhang, Yin, Ji, Yatai, Zhang, Yuxiang, Jiang, Chunyang, Wang, Yubo, Zhu, Kang, Wang, Zekun, Wang, Tiezhen, Huang, Wenhao, Fu, Jie, Chen, Bei, Lin, Qunshu, Liu, Minghao, Zhang, Ge, Chen, Wenhu
Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy …
External link:
http://arxiv.org/abs/2406.13923
The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling …
External link:
http://arxiv.org/abs/2403.19238
Author:
Tu, Rong-Cheng, Ji, Yatai, Jiang, Jie, Kong, Weijie, Cai, Chengfei, Zhao, Wenzhe, Wang, Hongfa, Yang, Yujiu, Liu, Wei
Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities. For this purpose, numerous masked modeling tasks have been proposed for VLP to further …
External link:
http://arxiv.org/abs/2306.07096
Author:
Chen, Weifeng, Ji, Yatai, Wu, Jie, Wu, Hefeng, Xie, Pan, Li, Jiashi, Xia, Xin, Xiao, Xuefeng, Lin, Liang
Recent advances in text-to-image (T2I) diffusion models have enabled impressive image generation capabilities guided by text prompts. However, extending these techniques to video generation remains challenging, with existing text-to-video (T2V) methods …
External link:
http://arxiv.org/abs/2305.13840
Current methods for few-shot action recognition mainly fall into the metric learning framework following ProtoNet, which demonstrates the importance of prototypes. Although they achieve relatively good performance, the effect of multimodal information …
External link:
http://arxiv.org/abs/2212.04873
Author:
Ji, Yatai, Tu, Rongcheng, Jiang, Jie, Kong, Weijie, Cai, Chengfei, Zhao, Wenzhe, Wang, Hongfa, Yang, Yujiu, Liu, Wei
Cross-modal alignment is essential for vision-language pre-training (VLP) models to learn the correct corresponding information across different modalities. For this purpose, inspired by the success of masked language modeling (MLM) tasks in the NLP …
External link:
http://arxiv.org/abs/2211.13437
Author:
Ji, Yatai, Wang, Junjie, Gong, Yuan, Zhang, Lin, Zhu, Yanru, Wang, Hongfa, Zhang, Jiaxing, Sakai, Tetsuya, Yang, Yujiu
Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty is problematic for our interpretation, including inter- and intra-modal uncertainty. Little effort …
External link:
http://arxiv.org/abs/2210.05335
Author:
Ji, Yatai (jytv5@mail.tsinghua.edu.cn), Giangrande, Paolo (paolo.giangrande@unibg.it), Zhao, Weiduo (weiduo.zhao@nottingham.edu.cn)
Published in:
Energies (ISSN 1996-1073), Aug 2024, Vol. 17, Issue 16, p. 3980, 32 pp.
Author:
Wang, Xiaohui, Dao, Fuhai, Ji, Yatai, Qiu, Sihang, Zhu, Xian, Dong, Wenjie, Wang, Huizan, Zhang, Weimin, Zheng, Xiaolong
Published in:
In: The Innovation, 1 July 2024, 5(4)