End-to-end Contextual Perception and Prediction with Interaction Transformer
Authors: | Mengye Ren, Ming Liang, Bin Yang, Raquel Urtasun, Wenyuan Zeng, Lingyun Luke Li, Sean Segal |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: | Computer Science - Computer Vision and Pattern Recognition (cs.CV); Computer Science - Robotics (cs.RO); FOS: Computer and information sciences |
Source: | IROS |
Description: | In this paper, we tackle the problem of detecting objects in 3D and forecasting their future motion in the context of self-driving. Towards this goal, we design a novel approach that explicitly takes into account the interactions between actors. To capture their spatial-temporal dependencies, we propose a recurrent neural network with a novel Transformer architecture, which we call the Interaction Transformer. Importantly, our model can be trained end-to-end, and runs in real-time. We validate our approach on two challenging real-world datasets: ATG4D and nuScenes. We show that our approach outperforms the state-of-the-art on both datasets. In particular, we significantly improve the social compliance between the estimated future trajectories, resulting in far fewer collisions between the predicted actors. Published at IROS 2020. |
Database: | OpenAIRE |
External link: |
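The abstract describes capturing interactions between actors with a Transformer-style attention mechanism. The sketch below is a minimal illustration of scaled dot-product attention over per-actor feature vectors, not the paper's actual Interaction Transformer; the function name, shapes, and identity-initialized projections are assumptions made for this example.

```python
import numpy as np

def actor_attention(features, d_k=None):
    """Scaled dot-product attention across actor feature vectors.

    features: (N, d) array, one row per detected actor.
    Returns an updated (N, d) array in which each actor's feature
    vector aggregates information from all actors.
    """
    n, d = features.shape
    d_k = d_k or d
    # Hypothetical projection matrices; identity-initialized for the sketch.
    # In a trained model these would be learned parameters.
    W_q = np.eye(d, d_k)
    W_k = np.eye(d, d_k)
    W_v = np.eye(d, d)
    q, k, v = features @ W_q, features @ W_k, features @ W_v
    scores = q @ k.T / np.sqrt(d_k)                # (N, N) pairwise interaction scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over actors, rows sum to 1
    return weights @ v                             # attention-weighted feature mixing
```

In the paper's setting, such an attention step would sit inside a recurrent loop so that each actor's predicted trajectory at every time step can condition on the other actors' states.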