Showing 1 - 10 of 1,825 for search: '"He, Ran"'
Author:
Fu, Chaoyou, Lin, Haojia, Long, Zuwei, Shen, Yunhang, Zhao, Meng, Zhang, Yifan, Wang, Xiong, Yin, Di, Ma, Long, Zheng, Xiawu, He, Ran, Ji, Rongrong, Wu, Yunsheng, Shan, Caifeng, Sun, Xing
The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimod
External link:
http://arxiv.org/abs/2408.05211
Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time. Existing TTA methods often focus on improving recognition performance specifically for test data associated wi
External link:
http://arxiv.org/abs/2407.15773
Video generation has made remarkable progress in recent years, especially since the advent of the video diffusion models. Many video generation models can produce plausible synthetic videos, e.g., Stable Video Diffusion (SVD). However, most video mod
External link:
http://arxiv.org/abs/2406.00908
Author:
Liu, Haogeng, You, Quanzeng, Han, Xiaotian, Liu, Yongfei, Huang, Huaibo, He, Ran, Yang, Hongxia
In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking pre-trained vision encoders with Large Language Models (LLMs). Despite its importance, the vision-language connector has been relativ
External link:
http://arxiv.org/abs/2405.17815
The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, its global attention mechanism's quadratic complexity poses substantial computational burdens. A common remedy spatially groups tokens for self-
External link:
http://arxiv.org/abs/2405.13337
In recent years, Transformers have achieved remarkable progress in computer vision tasks. However, their global modeling often comes with substantial computational overhead, in stark contrast to the human eye's efficient information processing. Inspi
External link:
http://arxiv.org/abs/2405.13335
Transformer networks are extensively utilized in face forgery detection due to their scalability across large datasets. Despite their success, transformers face challenges in balancing the capture of global context, which is crucial for unveiling
External link:
http://arxiv.org/abs/2404.06022
Author:
Fan, Qihang, You, Quanzeng, Han, Xiaotian, Liu, Yongfei, Tao, Yunzhe, Huang, Huaibo, He, Ran, Yang, Hongxia
This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen d
External link:
http://arxiv.org/abs/2403.18361
Characterizing non-equilibrium dynamics in quantum many-body systems is a challenging frontier of physics. In this work, we systematically construct solvable non-integrable quantum circuits that exhibit exact Markovian subsystem dynamics. This featur
External link:
http://arxiv.org/abs/2403.14807
Author:
Wang, He-Ran, Yuan, Dong
Quantum many-body scars are highly excited eigenstates of non-integrable Hamiltonians which violate the eigenstate thermalization hypothesis and are embedded in a sea of thermal eigenstates. We provide a general mechanism to construct partially integ
External link:
http://arxiv.org/abs/2403.14755