Showing 1 - 10 of 15,783 results for the search: '"An, Jiwen"'
We are concerned with the mixed local/nonlocal Schrödinger equation
$$-\Delta u + (-\Delta)^s u + u = u^{p+1} \quad \text{in } \mathbb{R}^n,$$
for arbitrary space dimension $n \geqslant 1$, any $s \in (0,1)$ and $p \in (0, 2^…
External link:
http://arxiv.org/abs/2410.19616
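As background (standard definitions, not taken from this abstract): the nonlocal operator $(-\Delta)^s$ in the equation above is the fractional Laplacian, which for $s \in (0,1)$ can be defined equivalently via the Fourier transform or as a singular integral,
$$\widehat{(-\Delta)^s u}\,(\xi) = |\xi|^{2s}\,\widehat{u}(\xi),
\qquad
(-\Delta)^s u(x) = C_{n,s}\,\mathrm{P.V.}\!\int_{\mathbb{R}^n}\frac{u(x)-u(y)}{|x-y|^{n+2s}}\,dy,$$
with a normalizing constant $C_{n,s} > 0$. As $s \to 1^-$ the operator recovers $-\Delta$, so the equation genuinely mixes local and nonlocal diffusion.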
Author:
Fei, Xin; Zheng, Wenzhao; Duan, Yueqi; Zhan, Wei; Tomizuka, Masayoshi; Keutzer, Kurt; Lu, Jiwen
We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians…
External link:
http://arxiv.org/abs/2410.18979
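A minimal sketch of the pixel-wise baseline the abstract criticizes, assuming a typical generalizable-splatting setup: every pixel emits the same fixed number K of 3D Gaussians regardless of local geometric complexity. All names and parameter layouts here are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class FixedPixelGaussianHead(nn.Module):
    """Hypothetical per-pixel head emitting a fixed count of Gaussians."""
    def __init__(self, feat_dim: int, k: int = 1):
        super().__init__()
        # per-Gaussian parameters: 3 (mean) + 3 (scale) + 4 (rotation quat)
        # + 1 (opacity) + 3 (RGB) = 14
        self.k = k
        self.proj = nn.Linear(feat_dim, k * 14)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, H, W, feat_dim) image features from any 2D backbone
        b, h, w, _ = feats.shape
        params = self.proj(feats)                   # (B, H, W, K*14)
        return params.view(b, h * w * self.k, 14)   # one flat Gaussian set

head = FixedPixelGaussianHead(feat_dim=64, k=1)
gaussians = head(torch.randn(2, 32, 32, 64))        # (2, 1024, 14)
```

An adaptive scheme, by contrast, would vary the Gaussian count with scene complexity instead of hard-coding `k`.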
Author:
Qin, Yiran; Shi, Zhelun; Yu, Jiwen; Wang, Xijun; Zhou, Enshen; Li, Lijun; Yin, Zhenfei; Liu, Xihui; Sheng, Lu; Shao, Jing; Bai, Lei; Ouyang, Wanli; Zhang, Ruimao
Recent advancements in predictive models have demonstrated exceptional capabilities in predicting the future state of objects and scenes. However, the lack of categorization based on inherent characteristics continues to hinder the progress of predic…
External link:
http://arxiv.org/abs/2410.18072
Mamba has garnered widespread attention for its flexible design and efficient hardware performance in processing 1D sequences with the state space model (SSM). Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images…
External link:
http://arxiv.org/abs/2410.10382
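For context, a minimal sketch (assumed, not taken from the paper) of the flattening step this abstract refers to: a 2D image is split into patches and raster-scanned into a 1D token sequence so that an SSM can process it causally.

```python
import torch

def patchify_raster(img: torch.Tensor, p: int) -> torch.Tensor:
    """Split (C, H, W) into p x p patches, flattened in row-major order."""
    c, h, w = img.shape                            # H, W divisible by p
    x = img.unfold(1, p, p).unfold(2, p, p)        # (C, H/p, W/p, p, p)
    x = x.permute(1, 2, 0, 3, 4).reshape(-1, c * p * p)
    return x                                       # (N_tokens, token_dim)

tokens = patchify_raster(torch.randn(3, 224, 224), p=16)   # (196, 768)
```

The choice of scan order (row-major here) is exactly the kind of design decision these vision-Mamba works revisit, since the SSM sees tokens strictly sequentially.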
Vision mambas have demonstrated strong performance with linear complexity in the number of vision tokens. Their efficiency results from processing image tokens sequentially. However, most existing methods employ patch-based image tokenization and the…
External link:
http://arxiv.org/abs/2410.10316
In this paper, we propose a new framework for zero-shot object navigation. Existing zero-shot object navigation methods prompt an LLM with the text of spatially close objects, which lacks sufficient scene context for in-depth reasoning. To better preserve…
External link:
http://arxiv.org/abs/2410.08189
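An illustrative sketch (assumed, not the paper's code) of the prior text-only prompting scheme the abstract critiques: nearby detected object labels are serialized into a prompt asking an LLM where to go next, with no richer scene context.

```python
from typing import List, Tuple

def build_nav_prompt(goal: str, nearby: List[Tuple[str, float]]) -> str:
    # nearby: (object_label, distance_in_meters) pairs from a detector
    listing = ", ".join(f"{name} ({dist:.1f} m)" for name, dist in nearby)
    return (
        f"You are navigating an indoor scene to find: {goal}.\n"
        f"Objects currently observed nearby: {listing}.\n"
        "Which observed object should the robot move toward next? "
        "Answer with a single object label."
    )

print(build_nav_prompt("a mug", [("sofa", 2.1), ("kitchen counter", 4.3)]))
```

The weakness is visible in the sketch itself: the LLM only ever sees a flat list of labels and distances, not spatial layout or appearance.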
In this paper, we propose a post-training quantization framework for large vision-language models (LVLMs) for efficient multi-modal inference. Conventional quantization methods sequentially search the layer-wise rounding functions by minimizing activation…
External link:
http://arxiv.org/abs/2410.08119
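A minimal sketch (assumed; not this paper's method) of the conventional layer-wise search the abstract mentions: for each layer, pick a weight quantization scale that minimizes the error of the layer's output activations on calibration inputs.

```python
import torch

def quantize(w: torch.Tensor, scale: float, bits: int = 4) -> torch.Tensor:
    """Uniform round-to-nearest quantization of a weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale

def search_scale(w: torch.Tensor, x: torch.Tensor, bits: int = 4, n_grid: int = 80):
    """Grid-search a per-layer scale minimizing output-activation MSE."""
    qmax = 2 ** (bits - 1) - 1
    ref = x @ w.T                        # full-precision layer output
    best_scale, best_err = None, float("inf")
    for i in range(1, n_grid + 1):
        scale = w.abs().max().item() * i / (n_grid * qmax)
        err = (x @ quantize(w, scale, bits).T - ref).pow(2).mean().item()
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err

w = torch.randn(128, 64)                 # one linear layer's weight
x = torch.randn(256, 64)                 # calibration activations
scale, err = search_scale(w, x)
```

Searching layer by layer like this is cheap but greedy; errors committed early propagate to later layers, which is the kind of limitation post-training quantization papers typically target.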
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens
Reading dense text and locating objects within images are fundamental abilities for Large Vision-Language Models (LVLMs) tasked with advanced jobs. Previous LVLMs, including superior proprietary models like GPT-4o, have struggled to excel in both tasks…
External link:
http://arxiv.org/abs/2410.05261
In this paper, we propose a One-Point-One NeRF (OPONeRF) framework for robust scene rendering. Existing NeRFs are designed based on a key assumption that the target scene remains unchanged between training and test time. However, small but unpredictable…
External link:
http://arxiv.org/abs/2409.20043
Building on the success of diffusion models in visual generation, flow-based models reemerge as another prominent family of generative models that have achieved competitive or better performance in terms of both visual quality and inference speed. By…
External link:
http://arxiv.org/abs/2409.18128
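As a generic illustration of how flow-based models trade visual quality against inference speed (a sketch of standard flow sampling, not this paper's specific model): generation integrates a learned velocity field $v(x, t)$ from noise at $t=0$ to data at $t=1$, so the step count directly controls the speed/quality trade-off. `velocity_net` is a placeholder.

```python
import torch

@torch.no_grad()
def sample_flow(velocity_net, shape, steps: int = 50):
    """Euler-integrate a learned velocity field from noise to data."""
    x = torch.randn(shape)                   # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt)  # current time, one per sample
        x = x + velocity_net(x, t) * dt      # Euler step along the ODE
    return x

x0 = sample_flow(lambda x, t: -x, (4, 2))    # dummy field, smoke test
```

Fewer steps mean faster sampling but a coarser approximation of the flow ODE, which is why step count figures so heavily in quality/speed comparisons with diffusion models.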