Description: |
In recent years, text recognition has made significant progress with the rapid development of deep learning techniques, including pre-training schemes. However, previous pre-training methods treat texts in different languages separately. In this paper, we propose a unified self-supervised pre-training method, PuzText, for learning the textures and strokes that are consistent across languages. By reconstructing permuted patches of text images, PuzText forces the model to learn positional relationships and fine-grained details from different parts of the input text images. Furthermore, the method can reconstruct text images in languages never seen during pre-training, demonstrating strong generalization and broad applicability. Moreover, instead of multiplexing separate recognition heads for different languages, we propose a global Character-Aware Gate (CAG) that learns which characters are used by each language. Fused with these implicit character-aware representations, a single unified language model constructs semantic information for each language over its specific character set. Experiments on several public benchmarks show that our method significantly outperforms previous approaches on end-to-end multilingual text recognition.
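
A minimal sketch of the permuted-patch pretext input described above, assuming a PyTorch setting; the patch size, tensor layout, and function name `permute_patches` are illustrative assumptions, not the paper's released code:

```python
import torch

def permute_patches(images, patch_size=8, generator=None):
    """Split each text image into non-overlapping patches and shuffle them.

    Returns the puzzled images plus the permutation indices, which a
    pre-training model could learn to invert by reconstructing the
    original image (as PuzText's objective requires).
    """
    b, c, h, w = images.shape
    ph, pw = h // patch_size, w // patch_size
    # (B, C, H, W) -> (B, ph*pw, C, patch, patch): one row per patch.
    patches = (images
               .reshape(b, c, ph, patch_size, pw, patch_size)
               .permute(0, 2, 4, 1, 3, 5)
               .reshape(b, ph * pw, c, patch_size, patch_size))
    # Shuffle patch order; here one permutation is shared by the batch.
    perm = torch.randperm(ph * pw, generator=generator)
    shuffled = patches[:, perm]
    # Reassemble the shuffled patches into image form for the encoder.
    puzzled = (shuffled
               .reshape(b, ph, pw, c, patch_size, patch_size)
               .permute(0, 3, 1, 4, 2, 5)
               .reshape(b, c, h, w))
    return puzzled, perm

# Example: a batch of 32x128 text-line crops.
imgs = torch.randn(4, 3, 32, 128)
puzzled, perm = permute_patches(imgs)
```

The reconstruction target would then be the unpermuted `imgs`, so the encoder must reason about patch positions and fine-grained stroke details to undo the shuffle.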