Evaluating Vision Transformer Methods for Deep Reinforcement Learning from Pixels

Author:	Tao, Tianxin; Reda, Daniele; van de Panne, Michiel
Year of publication:	2022
Subject:	Computer Science - Machine Learning; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Robotics
Document type:	Working Paper
Description:	Vision Transformers (ViT) have recently demonstrated the significant potential of transformer architectures for computer vision. To what extent can image-based deep reinforcement learning also benefit from ViT architectures, as compared to standard convolutional neural network (CNN) architectures? To answer this question, we evaluate ViT training methods for image-based reinforcement learning (RL) control tasks and compare these results to a leading convolutional-network architecture method, RAD. For training the ViT encoder, we consider several recently proposed self-supervised losses that are treated as auxiliary tasks, as well as a baseline with no additional loss terms. We find that the CNN architectures trained using RAD still generally provide superior performance. For the ViT methods, all three types of auxiliary tasks that we consider provide a benefit over plain ViT training. Furthermore, ViT reconstruction-based tasks are found to significantly outperform ViT contrastive learning.
Database:	arXiv
External link:	http://arxiv.org/abs/2204.04905