Author:
Yi, Xiu-Long; Hua, Rong; Fu, You; Zheng, Du-Lei; Wang, Zhi-Yu
Source:
Soft Computing - A Fusion of Foundations, Methodologies & Applications; Feb 2022, Vol. 26, Issue 4, p1501-1507, 7p
Abstract:
As cross-domain research combining computer vision and natural language processing, current image captioning work mainly considers how to improve visual features; less attention has been paid to exploiting the inherent properties of language to boost captioning performance. Facing this challenge, we propose a textual attention mechanism that obtains the semantic relevance between words by scanning all previously generated words. The retrospect network for image captioning (RNIC) proposed in this paper aims to improve the input and prediction processes by using textual attention. Concretely, the textual attention mechanism is applied to the model simultaneously with the visual attention mechanism, providing the model's input with the maximum information required for generating captions. In this way, our model learns to attend collaboratively to both visual and textual features. Moreover, the semantic relevance between words obtained by retrospection serves as the basis for prediction, so that the decoder can simulate the human language system and make better predictions based on the already generated content. We evaluate the effectiveness of our model on the COCO image captioning dataset and achieve superior performance over previous methods. [ABSTRACT FROM AUTHOR]
Database:
Complementary Index
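To make the mechanism described in the abstract concrete, below is a minimal PyTorch sketch of one decoding step that attends simultaneously to image-region features (visual attention) and to the embeddings of already generated words (textual attention). This is not the authors' code: the module name RetrospectDecoderStep, all parameter names, and the dot-product attention form are illustrative assumptions based only on the abstract.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrospectDecoderStep(nn.Module):
    """Hypothetical decoding step: attends to both visual features and
    the embeddings of already-generated words (textual attention)."""

    def __init__(self, hidden_dim, feat_dim, embed_dim):
        super().__init__()
        # Project the decoder state into a query over image regions.
        self.vis_query = nn.Linear(hidden_dim, feat_dim)
        # Project the decoder state into a query over generated words.
        self.txt_query = nn.Linear(hidden_dim, embed_dim)
        # Fuse the attended visual and textual contexts for prediction.
        self.fuse = nn.Linear(feat_dim + embed_dim, hidden_dim)

    def forward(self, hidden, vis_feats, prev_embeds):
        # hidden:      (B, hidden_dim)   current decoder state
        # vis_feats:   (B, R, feat_dim)  R image-region features
        # prev_embeds: (B, T, embed_dim) embeddings of T generated words
        q_v = self.vis_query(hidden).unsqueeze(1)            # (B, 1, feat_dim)
        vis_scores = (q_v * vis_feats).sum(-1)               # (B, R)
        vis_ctx = (F.softmax(vis_scores, dim=-1).unsqueeze(-1)
                   * vis_feats).sum(1)                       # (B, feat_dim)

        # Textual attention: score every previously generated word,
        # i.e., "retrospect" over the caption prefix.
        q_t = self.txt_query(hidden).unsqueeze(1)            # (B, 1, embed_dim)
        txt_scores = (q_t * prev_embeds).sum(-1)             # (B, T)
        txt_ctx = (F.softmax(txt_scores, dim=-1).unsqueeze(-1)
                   * prev_embeds).sum(1)                     # (B, embed_dim)

        # Combined context feeds the next-word prediction.
        return torch.tanh(self.fuse(torch.cat([vis_ctx, txt_ctx], dim=-1)))

In a full decoder, the fused context would feed a softmax over the vocabulary at each step, so the next word is predicted from both the image and the retrospected caption prefix, in line with the collaborative visual-textual attention the abstract describes.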