Search results

Report

Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing

Authors: Lu, Kaixuan; Zhang, Ruiqian; Huang, Xiao; Xie, Yuxing; Ning, Xiaogang; Zhang, Hanchao; Yuan, Mengke; Zhang, Pan; Wang, Tao; Liao, Tongkui

Recent self-supervised learning (SSL) methods have demonstrated impressive results in learning visual representations from unlabeled remote sensing images. However, most remote sensing images predominantly consist of scenographic scenes containing mu

External link: http://arxiv.org/abs/2411.06091

Report

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models

Authors: Liu, Ziyu; Zang, Yuhang; Dong, Xiaoyi; Zhang, Pan; Cao, Yuhang; Duan, Haodong; He, Conghui; Xiong, Yuanjun; Lin, Dahua; Wang, Jiaqi

Visual preference alignment involves training Large Vision-Language Models (LVLMs) to predict human preferences between visual inputs. This is typically achieved by using labeled datasets of chosen/rejected pairs and employing optimization algorithms

External link: http://arxiv.org/abs/2410.17637

Report

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction

Authors: Xing, Long; Huang, Qidong; Dong, Xiaoyi; Lu, Jiajie; Zhang, Pan; Zang, Yuhang; Cao, Yuhang; He, Conghui; Wang, Jiaqi; Wu, Feng; Lin, Dahua

In large vision-language models (LVLMs), images serve as inputs that carry a wealth of information. As the idiom "A picture is worth a thousand words" implies, representing a single image in current LVLMs can require hundreds or even thousands of tok

External link: http://arxiv.org/abs/2410.17247

Report

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree

Authors: Ding, Shuangrui; Qian, Rui; Dong, Xiaoyi; Zhang, Pan; Zang, Yuhang; Cao, Yuhang; Guo, Yuwei; Lin, Dahua; Wang, Jiaqi

The Segment Anything Model 2 (SAM 2) has emerged as a powerful foundation model for object segmentation in both images and videos, paving the way for various downstream video applications. The crucial design of SAM 2 for video segmentation is its mem

External link: http://arxiv.org/abs/2410.16268

Report

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

Authors: Huang, Qidong; Dong, Xiaoyi; Zhang, Pan; Zang, Yuhang; Cao, Yuhang; Wang, Jiaqi; Lin, Dahua; Zhang, Weiming; Yu, Nenghai

We present the Modality Integration Rate (MIR), an effective, robust, and generalized metric to indicate the multi-modal pre-training quality of Large Vision Language Models (LVLMs). Large-scale pre-training plays a critical role in building capable

External link: http://arxiv.org/abs/2410.07167

Report

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

Authors: Bu, Jiazi; Ling, Pengyang; Zhang, Pan; Wu, Tong; Dong, Xiaoyi; Zang, Yuhang; Cao, Yuhang; Lin, Dahua; Wang, Jiaqi

The text-to-video (T2V) generation models, offering convenient visual creation, have recently garnered increasing attention. Despite their substantial potential, the generated videos may present artifacts, including structural implausibility, tempora

External link: http://arxiv.org/abs/2410.06241

Report

Efficient Optimization of Variational Autoregressive Networks with Natural Gradient

Authors: Liu, Jing; Tang, Ying; Zhang, Pan

Estimating free energy is a fundamental problem in statistical mechanics. Recently, machine-learning-based methods, particularly the variational autoregressive networks (VANs), have been proposed to minimize variational free energy and to approximate

External link: http://arxiv.org/abs/2409.20029

Report

Tensor network Monte Carlo simulations for the two-dimensional random-bond Ising model

Authors: Chen, Tao; Guo, Erdong; Zhang, Wanzhou; Zhang, Pan; Deng, Youjin

Disordered lattice spin systems are crucial in both theoretical and applied physics. However, understanding their properties poses significant challenges for Monte Carlo simulations. In this work, we investigate the two-dimensional random-bond Ising

External link: http://arxiv.org/abs/2409.06538

Report

Bilayer TeO2: The First Oxide Semiconductor with Symmetric Sub-5-nm NMOS and PMOS

Authors: Xu, Linqiang; Zhao, Liya; Lau, Chit Siong; Zhang, Pan; Xu, Lianqiang; Li, Qiuhui; Fang, Shibo; Ang, Yee Sin; Sun, Xiaotian; Lu, Jing

Wide bandgap oxide semiconductors are very promising channel candidates for next-generation electronics due to their large-area manufacturing, high-quality dielectrics, low contact resistance, and low leakage current. However, the absence of ultra-sh

External link: http://arxiv.org/abs/2408.07339

Report

The Kramers escape rate of phase transitions for the 6-dimensional Gauss-Bonnet AdS black hole with triple phases

Authors: Ma, Chen; Zhang, Pan-Pan; Wu, Bin; Xu, Zhen-Ming

In this study, we obtain a specific picture of the phase transitions for the 6-dimensional Gauss-Bonnet Anti-de Sitter (AdS) black hole with triple phases, using the generalized free energy we constructed and the Kramers escape rate in stochastic motion. T

External link: http://arxiv.org/abs/2407.20512
