Author:
Shaily Malik, Poonam Bansal, Nishtha Jatana, Geetika Dhand, Kavita Sheoran
Year of publication:
2023
DOI:
10.21203/rs.3.rs-2532846/v1
Description:
Data from different sensors and cameras, together with their text descriptions, need their features mapped into a common lower-dimensional latent space for image-to-text and text-to-image classification. These low-dimensional features should retain maximum information with minimal loss. This paper proposes a cross-modal semantic autoencoder that factorizes the features into a lower rank by nonnegative matrix factorization (NMF). Conventional two-factor NMF fails to map the complete information into the lower-dimensional space; this is overcome by a novel tri-factor NMF with hypergraph regularization. A more information-rich modularity matrix is proposed for the hypergraph regularization in place of the feature adjacency matrix. The tri-factorized, hypergraph-regularized multimodal autoencoder is tested on the Wiki dataset for image-to-text and text-to-image conversion. The autoencoder is further supported by Multimodal Conditional Principal Label Space Transformation (MCPLST) to reduce the feature dimension. The proposed autoencoder achieved a classification accuracy improvement of up to 1.8% over the semantic autoencoder.
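As a generic illustration of the tri-factor NMF idea referred to in the description, the following minimal NumPy sketch factorizes a nonnegative feature matrix X into three nonnegative factors U, S, V using standard multiplicative updates on the Frobenius reconstruction error. The function name, rank, and iteration count are illustrative assumptions, and the paper's hypergraph (modularity-matrix) regularization and MCPLST step are not included here.

# Minimal sketch of nonnegative matrix tri-factorization, X ~= U @ S @ V.
# Assumed, simplified illustration only: the paper's method adds a hypergraph
# regularization term that is omitted in this sketch.
import numpy as np

def tri_nmf(X, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative X (m x n) into U (m x rank), S (rank x rank),
    V (rank x n), all nonnegative, by multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, rank))
    S = rng.random((rank, rank))
    V = rng.random((rank, n))
    for _ in range(n_iter):
        # Each update is non-increasing for ||X - U S V||_F^2 (eps avoids division by zero).
        U *= (X @ V.T @ S.T) / (U @ S @ V @ V.T @ S.T + eps)
        S *= (U.T @ X @ V.T) / (U.T @ U @ S @ V @ V.T + eps)
        V *= (S.T @ U.T @ X) / (S.T @ U.T @ U @ S @ V + eps)
    return U, S, V

# Toy usage: approximate a random nonnegative "feature" matrix with rank 5.
X = np.random.default_rng(1).random((60, 40))
U, S, V = tri_nmf(X, rank=5)
print(np.linalg.norm(X - U @ S @ V) / np.linalg.norm(X))  # relative reconstruction error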
Database:
OpenAIRE
External link:
|