Abstract: |
A fairly large number of recent studies are devoted to the analysis of data containing heterogeneous information, and researchers regard multimodality as a step towards artificial general intelligence. In this article, we study the problem of classifying images that contain text inserts. Joint classification by text and image outperforms image-only classification by about 5% and text-only classification by about 8%. On the three-class classification problem, the proposed model achieves an accuracy (share of correct recognitions) of 86%. [ABSTRACT FROM AUTHOR] |