Cross-Modal Semantic Analysis by Tri-factorized Modular Hypergraph Autoencoder

Autor: Shaily Malik, Poonam Bansal, Nishtha Jatana, Geetika Dhand, Kavita Sheoran
Rok vydání: 2023
DOI: 10.21203/rs.3.rs-2532846/v1
Popis: The data from different sensors, cameras, and their text descriptions needs their features to be mapped into a common latent space with lower dimensions for image-to-text and text-to-image classifications. These low-dimensional features should incur maximum information with minimum losses. The cross-modal semantic autoencoder is proposed in this paper, which factorizes the features into a lower rank by nonnegative matrix factorization (NMF). The conventional NMF lacks to map the complete information into lower space due to two matrix factorization which is overcome by a novel tri-factor NMF with hypergraph regularization. A more information-rich modularity matrix is proposed in hypergraph regularization in place of the feature adjacency matrix. This tri-factorized hypergraph regularized multimodal autoencoder is tested on the Wiki dataset for the image-to-text and text-to-image conversion. This novel autoencoder is also supported by Multimodal Conditional Principal label space transformation (MCPLST) to reduce the dimension of the features. The proposed autoencoder observed a classification accuracy improvement of up to 1.8 % than the semantic autoencoder.
Databáze: OpenAIRE