Showing 1 - 3 of 3 for search: '"Zhang, Haoji"'
This paper describes our champion solution to the LOVEU Challenge @ CVPR'24, Track 1 (Long Video VQA). Processing long sequences of visual tokens is computationally expensive and memory-intensive, making long video question-answering a challenging ta
External link:
http://arxiv.org/abs/2407.00603
Benefiting from the advancements in large language models and cross-modal alignment, existing multi-modal video understanding methods have achieved prominent performance in offline scenarios. However, online video streams, as one of the most common me
External link:
http://arxiv.org/abs/2406.08085
Authors:
Li, Jianhui, Li, Jianmin, Zhang, Haoji, Liu, Shilong, Wang, Zhengyi, Xiao, Zihao, Zheng, Kaiwen, Zhu, Jun
We study the 3D-aware image attribute editing problem in this paper, which has wide applications in practice. Recent methods solved the problem by training a shared encoder to map images into a 3D generator's latent space or by per-image latent code
External link:
http://arxiv.org/abs/2304.10263