Experiment study on utilizing convolutional neural networks to recognize historical Arabic handwritten text

Authors: Berat Kurar, Majeed Kassis, Reem Alaasam, Jihad El-Sana
Year of publication: 2017
Subject:
Source: ASAR
DOI: 10.1109/asar.2017.8067773
Description: Deep learning is a form of hierarchical learning: it consists of multiple layers of representations that gradually transform data into high-level concepts. Deep learning has provided state-of-the-art results for various computer vision problems. However, a typical deep learning algorithm needs a large amount of data to train a deep model and guarantee the model's ability to generalize. Large labeled datasets are not easy to generate, which is one of the main barriers to applying deep learning to many problems. Data augmentation schemes were introduced to overcome this limitation by extending small available labeled datasets. In this work we experiment with extending a small labeled dataset of Arabic continuous subwords by orders of magnitude. The labeled dataset, which consists of handwritten Arabic subwords, is used to synthesize a large labeled collection. The synthesized subwords are based on one or multiple writing styles from the original labeled dataset. We also experiment with generating various printed forms of subwords. We include only the Naskh font, as most historical Arabic manuscripts were written in this type of font. We train several convolutional neural networks using handwritten, printed, and synthesized datasets and obtain encouraging results.
Database: OpenAIRE
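
Note: The description mentions extending a small labeled dataset by orders of magnitude. As a rough illustration of that general idea only, the sketch below multiplies a labeled set of subword images with random slant and horizontal shift. This is a generic geometric augmentation, not the style-based synthesis the authors describe; the function names `augment_subword` and `expand_dataset`, the parameters `max_shift` and `max_slant`, and the image sizes are all assumptions made for illustration.

```python
import numpy as np


def augment_subword(image, rng, max_shift=2, max_slant=0.3):
    """Return a randomly slanted and shifted copy of a 2-D grayscale image.

    Hypothetical helper: a simple row-wise shear, not the paper's method.
    """
    height, width = image.shape
    slant = rng.uniform(-max_slant, max_slant)
    dx = int(rng.integers(-max_shift, max_shift + 1))
    out = np.zeros_like(image)
    for row in range(height):
        # Rows farther from the vertical center move farther sideways.
        offset = int(round(slant * (row - height / 2))) + dx
        src_lo = max(0, -offset)
        src_hi = min(width, width - offset)
        if src_lo < src_hi:
            out[row, src_lo + offset:src_hi + offset] = image[row, src_lo:src_hi]
    return out


def expand_dataset(images, labels, copies, seed=0):
    """Create `copies` augmented variants of every labeled image."""
    rng = np.random.default_rng(seed)
    out_images, out_labels = [], []
    for image, label in zip(images, labels):
        for _ in range(copies):
            out_images.append(augment_subword(image, rng))
            out_labels.append(label)  # the label is unchanged by the transform
    return np.stack(out_images), np.array(out_labels)
```

For example, calling `expand_dataset(images, labels, copies=100)` on a few hundred labeled subword images would yield a set two orders of magnitude larger, with each variant inheriting the label of its source image.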