Cross-media retrieval via fusing multi-modality and multi-grained data.

Author: Liu, Z., Yuan, S., Pei, X., Gao, S., Han, H.
Subject:
Source: Scientia Iranica. Transaction D, Computer Science & Engineering & Electrical Engineering; Sep/Oct2023, Vol. 30 Issue 5, p1645-1669, 25p
Abstract: Traditional cross-media retrieval methods mainly focus on coarse-grained data that reflect global characteristics, while ignoring fine-grained descriptions of local details. They also cannot accurately describe the correlations between the anchor and irrelevant data. To address these problems, this paper proposes fusing coarse-grained and fine-grained features together with a multi-margin triplet loss, built on a dual framework: 1) Framework I, a multi-grained data fusion framework based on a Deep Belief Network, and 2) Framework II, a multi-modality data fusion framework based on the multi-margin triplet loss function. In Framework I, the coarse-grained and fine-grained features fused by a joint Restricted Boltzmann Machine are fed into Framework II. In Framework II, we propose the multi-margin triplet loss: data belonging to different modalities and semantic categories are pushed away from the anchor by different margins. Experimental results show that the proposed method achieves better cross-media retrieval performance than other methods on several datasets, and ablation experiments verify that the proposed multi-grained fusion strategy and multi-margin triplet loss function are effective. [ABSTRACT FROM AUTHOR]
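The record does not give the paper's exact formulation of the multi-margin triplet loss, so the following is only an illustrative sketch of the general idea it describes: negatives that differ more from the anchor (e.g. in both modality and semantic category) are pushed away with a larger margin. The function name, arguments, and margin values are hypothetical, and PyTorch is assumed.

```python
import torch
import torch.nn.functional as F

def multi_margin_triplet_loss(anchor, positive, negatives, margins):
    """Hypothetical multi-margin triplet loss sketch.

    anchor:    (d,) embedding of the anchor sample
    positive:  (d,) embedding of a semantically matching sample
    negatives: list of (d,) embeddings of irrelevant samples
    margins:   list of floats, one margin per negative; larger margins for
               negatives that differ more from the anchor (e.g. different
               modality AND different semantic category).
    """
    pos_dist = F.pairwise_distance(anchor.unsqueeze(0), positive.unsqueeze(0))
    loss = torch.zeros(1)
    for neg, m in zip(negatives, margins):
        neg_dist = F.pairwise_distance(anchor.unsqueeze(0), neg.unsqueeze(0))
        # Hinge term: push this negative at least `m` farther from the anchor
        # than the positive is.
        loss = loss + torch.clamp(pos_dist - neg_dist + m, min=0.0)
    return loss / max(len(negatives), 1)

# Toy usage with random embeddings (illustrative only): a smaller margin for a
# same-modality negative, a larger one for a cross-modality, cross-category one.
a, p = torch.randn(128), torch.randn(128)
negs = [torch.randn(128), torch.randn(128)]
print(multi_margin_triplet_loss(a, p, negs, margins=[0.2, 0.5]))
```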
Database: Complementary Index