LiDARTouch: Monocular metric depth estimation with a few-beam LiDAR

Author:	Florent Bartoccioni, Éloi Zablocki, Patrick Pérez, Matthieu Cord, Karteek Alahari
Contributors:	Apprentissage de modèles à partir de données massives (Thoth), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), Valeo.ai, VALEO, ANR-18-CE23-0011, AVENUE, Réseau de mémoire visuelle pour l'interprétation de scènes (2018), ANR-18-CE23-0011
Language:	English
Publication year:	2021
Subject:	FOS: Computer and information sciences; Computer Science - Artificial Intelligence; Computer Vision and Pattern Recognition (cs.CV); ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Computer Science - Computer Vision and Pattern Recognition; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; 68T45; Computer Science - Robotics; Artificial Intelligence (cs.AI); Signal Processing; Computer Vision and Pattern Recognition; Robotics (cs.RO); MSC 68T45; Software
Source:	Computer Vision and Image Understanding, 2023, 227, pp.103601. ⟨10.1016/j.cviu.2022.103601⟩
ISSN:	1077-3142 1090-235X
Description:	Preprint. Under review; Vision-based depth estimation is a key feature in autonomous systems, which often relies on a single camera or several independent ones. In such a monocular setup, dense depth is obtained with either additional input from one or several expensive LiDARs, e.g., with 64 beams, or camera-only methods, which suffer from scale-ambiguity and infinite-depth problems. In this paper, we propose a new alternative that densely estimates metric depth by combining a monocular camera with a light-weight LiDAR, e.g., with 4 beams, typical of today's automotive-grade mass-produced laser scanners. Inspired by recent self-supervised methods, we introduce a novel framework, called LiDARTouch, to estimate dense depth maps from monocular images with the help of "touches" of LiDAR, i.e., without the need for dense ground-truth depth. In our setup, the minimal LiDAR input contributes on three different levels: as an additional input to the model, in a self-supervised LiDAR reconstruction objective function, and to estimate changes of pose (a key component of self-supervised depth estimation architectures). Our LiDARTouch framework achieves a new state of the art in self-supervised depth estimation on the KITTI dataset, thus supporting our choices of integrating the very sparse LiDAR signal with other visual features. Moreover, we show that the use of a few-beam LiDAR alleviates the scale-ambiguity and infinite-depth issues that camera-only methods suffer from. We also demonstrate that methods from the fully-supervised depth-completion literature can be adapted to a self-supervised regime with a minimal LiDAR signal.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::1cd9088cb2a5e91be3cef49b605e5fd7 http://arxiv.org/abs/2109.03569