Remote sensing image caption generation via transformer and reinforcement learning.

Author: Shen, Xiangqing; Liu, Bing; Zhou, Yong; Zhao, Jiaqi
Source: Multimedia Tools & Applications; Sep 2020, Vol. 79, Issue 35/36, p26661-26682, 22p
Abstract: Image captioning is the task of generating a natural-language description of a given image, and it plays an essential role in enabling machines to understand image content. Remote sensing image captioning is a subfield of this task. Most current remote sensing image captioning models fail to fully utilize the semantic information in images and suffer from overfitting induced by the small size of the available datasets. To this end, we propose a new model that uses the Transformer to decode image features into target sentences. To make the Transformer more adaptive to the remote sensing image captioning task, we additionally employ dropout layers, residual connections, and adaptive feature fusion in the Transformer. Reinforcement learning is then applied to enhance the quality of the generated sentences. We demonstrate the validity of the proposed model on three remote sensing image captioning datasets. Our model achieves higher scores on all seven metrics on the Sydney dataset and the Remote Sensing Image Caption Dataset (RSICD), and on four metrics on the UCM dataset, which indicates that the proposed method outperforms previous state-of-the-art models in remote sensing image caption generation. [ABSTRACT FROM AUTHOR]
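
The abstract names two concrete mechanisms: a Transformer decoder over image features (with dropout and an adaptive fusion of image and word features) and a reinforcement-learning stage. Below is a minimal PyTorch sketch of both ideas, for illustration only; the class names, hyperparameters, the gated form of the fusion, and the self-critical reward baseline are assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Hypothetical Transformer decoder over CNN image features."""
    def __init__(self, vocab_size, d_model=512, nhead=8, num_layers=3, dropout=0.3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerDecoderLayer(d_model, nhead, dropout=dropout)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.gate = nn.Linear(2 * d_model, d_model)  # assumed fusion gate
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, img_feats, tokens):
        # img_feats: (S, B, d_model) region features; tokens: (T, B) word ids
        emb = self.embed(tokens)                                  # (T, B, D)
        g = img_feats.mean(dim=0, keepdim=True).expand_as(emb)    # global feature
        gate = torch.sigmoid(self.gate(torch.cat([emb, g], dim=-1)))
        # "Adaptive feature fusion" (assumed form): a learned gate mixes the
        # global image feature into each word embedding.
        fused = self.drop(gate * emb + (1.0 - gate) * g)
        T = tokens.size(0)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.decoder(fused, img_feats, tgt_mask=causal)
        return self.out(h)                                        # (T, B, V)

def self_critical_loss(logprob_sampled, reward_sampled, reward_greedy):
    # REINFORCE with a greedy-decoding baseline (self-critical training):
    # sampled captions scoring above the greedy baseline become more likely.
    advantage = reward_sampled - reward_greedy
    return -(advantage.detach() * logprob_sampled).mean()
```

In this sketch the greedy-decoded caption serves as its own reward baseline, which avoids training a separate value network; that self-critical scheme is a common choice for the RL stage of caption models, though the abstract does not state the authors' exact reward.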
Database: Complementary Index