Author:
Shaily Malik, Poonam Bansal, Nishtha Jatana, Geetika Dhand, Kavita Sheoran
Year of publication:
2023
DOI:
10.21203/rs.3.rs-2532846/v1
Description:
Data from different sensors and cameras, together with their text descriptions, need their features mapped into a common lower-dimensional latent space for image-to-text and text-to-image classification. These low-dimensional features should retain maximum information with minimal loss. This paper proposes a cross-modal semantic autoencoder that factorizes the features into a lower rank by nonnegative matrix factorization (NMF). Conventional two-factor NMF fails to map the complete information into the lower-dimensional space; this is overcome by a novel tri-factor NMF with hypergraph regularization. A more information-rich modularity matrix is proposed for the hypergraph regularization in place of the feature adjacency matrix. The tri-factorized, hypergraph-regularized multimodal autoencoder is tested on the Wiki dataset for image-to-text and text-to-image conversion. The autoencoder is further supported by Multimodal Conditional Principal Label Space Transformation (MCPLST) to reduce the feature dimension. The proposed autoencoder achieved a classification accuracy improvement of up to 1.8% over the semantic autoencoder.
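As a generic illustration of the tri-factor NMF idea referred to in the description, the following minimal NumPy sketch factorizes a nonnegative feature matrix X into three nonnegative factors U, S, V using standard multiplicative updates on the Frobenius reconstruction error. The function name, rank, and iteration count are illustrative assumptions, and the paper's hypergraph (modularity-matrix) regularization and MCPLST step are not included here.

# Minimal sketch of nonnegative matrix tri-factorization, X ~= U @ S @ V.
# Assumed, simplified illustration only: the paper's method adds a hypergraph
# regularization term that is omitted in this sketch.
import numpy as np

def tri_nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative X (m x n) into U (m x rank), S (rank x rank),
    V (rank x n), all nonnegative, by multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank))
    S = rng.random((rank, rank))
    V = rng.random((rank, n))
    for _ in range(n_iter):
        # Each update is non-increasing for ||X - U S V||_F^2 (eps avoids division by zero).
        U *= (X @ V.T @ S.T) / (U @ S @ V @ V.T @ S.T + eps)
        S *= (U.T @ X @ V.T) / (U.T @ U @ S @ V @ V.T + eps)
        V *= (S.T @ U.T @ X) / (S.T @ U.T @ U @ S @ V + eps)
    return U, S, V

# Toy usage: approximate a random nonnegative "feature" matrix with rank 5.
X = np.random.default_rng(1).random((60, 40))
U, S, V = tri_nmf(X, rank=5)
print(np.linalg.norm(X - U @ S @ V) / np.linalg.norm(X))  # relative reconstruction error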
Database:
OpenAIRE
External link:
|