Showing 1 - 10 of 838 results for search: '"Lu, Jiwen"'
Author:
Fei, Xin, Zheng, Wenzhao, Duan, Yueqi, Zhan, Wei, Tomizuka, Masayoshi, Keutzer, Kurt, Lu, Jiwen
We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians…
External link:
http://arxiv.org/abs/2410.18979
Mamba has garnered widespread attention due to its flexible design and efficient hardware performance in processing 1D sequences based on the state space model (SSM). Recent studies have attempted to apply Mamba to the visual domain by flattening 2D images…
External link:
http://arxiv.org/abs/2410.10382
Vision Mambas have demonstrated strong performance with linear complexity in the number of vision tokens. Their efficiency results from processing image tokens sequentially. However, most existing methods employ patch-based image tokenization and the…
External link:
http://arxiv.org/abs/2410.10316
In this paper, we propose a new framework for zero-shot object navigation. Existing zero-shot object navigation methods prompt the LLM with the text of spatially closed objects, which lacks sufficient scene context for in-depth reasoning. To better preserve…
External link:
http://arxiv.org/abs/2410.08189
In this paper, we propose a post-training quantization framework for large vision-language models (LVLMs) for efficient multi-modal inference. Conventional quantization methods sequentially search the layer-wise rounding functions by minimizing activation…
External link:
http://arxiv.org/abs/2410.08119
In this paper, we propose a One-Point-One NeRF (OPONeRF) framework for robust scene rendering. Existing NeRFs are designed based on a key assumption that the target scene remains unchanged between training and test time. However, small but unpredictable…
External link:
http://arxiv.org/abs/2409.20043
Building on the success of diffusion models in visual generation, flow-based models reemerge as another prominent family of generative models that have achieved competitive or better performance in terms of both visual quality and inference speed. By…
External link:
http://arxiv.org/abs/2409.18128
Visual data comes in various forms, ranging from small icons of just a few pixels to long videos spanning hours. Existing multi-modal LLMs usually standardize these diverse visual inputs to a fixed resolution for visual encoders and yield similar numbers…
External link:
http://arxiv.org/abs/2409.12961
Diffusion probabilistic models (DPMs) have shown remarkable performance in visual synthesis but are computationally expensive due to the need for multiple evaluations during sampling. Recent predictor-corrector diffusion samplers have significant…
External link:
http://arxiv.org/abs/2409.03755
Embodied tasks require the agent to fully understand 3D scenes while it explores them, so an online, real-time, fine-grained and highly generalizable 3D perception model is urgently needed. Since high-quality 3D data is limited, directly…
External link:
http://arxiv.org/abs/2408.11811