Search results

Report

VITA: Towards Open-Source Interactive Omni Multimodal LLM

Authors: Fu, Chaoyou; Lin, Haojia; Long, Zuwei; Shen, Yunhang; Zhao, Meng; Zhang, Yifan; Wang, Xiong; Yin, Di; Ma, Long; Zheng, Xiawu; He, Ran; Ji, Rongrong; Wu, Yunsheng; Shan, Caifeng; Sun, Xing

The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimod

External link: http://arxiv.org/abs/2408.05211

Report

STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay

Authors: Yu, Yongcan; Sheng, Lijun; He, Ran; Liang, Jian

Test-time adaptation (TTA) aims to address the distribution shift between the training and test data with only unlabeled data at test time. Existing TTA methods often focus on improving recognition performance specifically for test data associated wi

External link: http://arxiv.org/abs/2407.15773

Report

ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation

Authors: Yang, Shaoshu; Zhang, Yong; Cun, Xiaodong; Shan, Ying; He, Ran

Video generation has made remarkable progress in recent years, especially since the advent of the video diffusion models. Many video generation models can produce plausible synthetic videos, e.g., Stable Video Diffusion (SVD). However, most video mod

External link: http://arxiv.org/abs/2406.00908

Report

Visual Anchors Are Strong Information Aggregators For Multimodal Large Language Model

Authors: Liu, Haogeng; You, Quanzeng; Han, Xiaotian; Liu, Yongfei; Huang, Huaibo; He, Ran; Yang, Hongxia

In the realm of Multimodal Large Language Models (MLLMs), the vision-language connector plays a crucial role in linking the pre-trained vision encoders with Large Language Models (LLMs). Despite its importance, the vision-language connector has been relativ

External link: http://arxiv.org/abs/2405.17815

Report

Semantic Equitable Clustering: A Simple, Fast and Effective Strategy for Vision Transformer

Authors: Fan, Qihang; Huang, Huaibo; Chen, Mingrui; He, Ran

The Vision Transformer (ViT) has gained prominence for its superior relational modeling prowess. However, its global attention mechanism's quadratic complexity poses substantial computational burdens. A common remedy spatially groups tokens for self-

External link: http://arxiv.org/abs/2405.13337

Report

Vision Transformer with Sparse Scan Prior

Authors: Fan, Qihang; Huang, Huaibo; Chen, Mingrui; He, Ran

In recent years, Transformers have achieved remarkable progress in computer vision tasks. However, their global modeling often comes with substantial computational overhead, in stark contrast to the human eye's efficient information processing. Inspi

External link: http://arxiv.org/abs/2405.13335

Report

Band-Attention Modulated RetNet for Face Forgery Detection

Authors: Zhang, Zhida; Cao, Jie; Yang, Wenkui; Fan, Qihang; Zhou, Kai; He, Ran

The transformer networks are extensively utilized in face forgery detection due to their scalability across large datasets. Despite their success, transformers face challenges in balancing the capture of global context, which is crucial for unveiling

External link: http://arxiv.org/abs/2404.06022

Report

ViTAR: Vision Transformer with Any Resolution

Authors: Fan, Qihang; You, Quanzeng; Han, Xiaotian; Liu, Yongfei; Tao, Yunzhe; Huang, Huaibo; He, Ran; Yang, Hongxia

This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen d

External link: http://arxiv.org/abs/2403.18361

Report

Exact Markovian Dynamics in Quantum Circuits

Authors: Wang, He-Ran; Yang, Xiao-Yang; Wang, Zhong

Characterizing non-equilibrium dynamics in quantum many-body systems is a challenging frontier of physics. In this work, we systematically construct solvable non-integrable quantum circuits that exhibit exact Markovian subsystem dynamics. This featur

External link: http://arxiv.org/abs/2403.14807

Report

Generalized Spin Helix States as Quantum Many-Body Scars in Partially Integrable Models

Authors: Wang, He-Ran; Yuan, Dong

Quantum many-body scars are highly excited eigenstates of non-integrable Hamiltonians which violate the eigenstate thermalization hypothesis and are embedded in a sea of thermal eigenstates. We provide a general mechanism to construct partially integ

External link: http://arxiv.org/abs/2403.14755
