TVENet: Transformer-Based Visual Exploration Network for Mobile Robot in Unseen Environment

Author: Tianyao Zhang, Xiaoguang Hu, Jin Xiao, Guofeng Zhang
Language: English
Year of Publication: 2022
Subject:
Source: IEEE Access, Vol 10, Pp 62056-62072 (2022)
Document Type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2022.3181989
Description: This paper presents a Transformer-based Visual Exploration Network (TVENet) that serves as a solution for active perception problems, especially the visual exploration problem: how can a robot equipped with a camera explore an unknown 3D environment? TVENet consists of a Mapper, a Global Policy, and a Local Policy. The Mapper is trained by supervised learning to take visual observations as input and generate an occupancy grid map of the explored environment. The Global Policy and the Local Policy are trained by reinforcement learning to make navigation decisions. Most state-of-the-art methods in the visual exploration domain use ResNet as the feature extractor, and few of them pay attention to the extractor's feature-extraction capability. This paper therefore focuses on enhancing that capability and proposes a Transformer-based Feature Pyramid Module (TFPM). Moreover, two training tricks (M.F. and Aux.) are introduced to improve performance. Experiments in a photo-realistic simulated environment (Habitat) demonstrate TVENet's higher-accuracy mapping. Experimental results show that the TFPM and the training tricks have a positive impact on the mapping accuracy of visual exploration, improving it by 5.31% compared with the state of the art. Most importantly, TVENet is deployed on a real robot (NVIDIA Jetbot) to prove the feasibility of Embodied AI approaches. To the authors' knowledge, this paper is the first to prove the viability of the Embodied AI approach for visual exploration tasks and to deploy the pre-trained model on the NVIDIA Jetson robot.
Database: Directory of Open Access Journals