Generation of Synthetic Data for Handwritten Word Alteration Detection

Authors: Prabhat Dansena, Soumen Bag, Rajarshi Pal
Language: English
Year of publication: 2021
Subject:
Source: IEEE Access, Vol 9, Pp 38979-38990 (2021)
Document type: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3059342
Description: Fraudsters often alter handwritten content in a document for illicit purposes. At times, this results in financial loss and mental distress for an individual or an organization. Hence, ink analysis is necessary to identify such alterations. A Convolutional Neural Network (CNN) can be used to identify such cases, as CNNs have been highly successful in computer vision across a variety of classification tasks. However, a CNN requires a large amount of labeled data for training. Hence, a large data set is needed for experiments on handwritten word alteration detection. Collecting, digitizing, and cropping a large number of altered and unaltered handwritten words is tedious and time-consuming. To overcome this issue, this paper presents an approach to synthetic word data generation for handwritten word alteration detection experiments. The scheme is designed so that the synthetically generated words closely resemble the original ones. To achieve this, a handwritten character data set is prepared using 10 blue and 10 black pens. These handwritten characters are used to create a synthetic word alteration data set. The presented approach uses a relatively small number of handwritten character images to create a very large word alteration data set. Further, deep learning models are trained on the synthetically generated data set for word alteration detection.
Database: Directory of Open Access Journals
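
The idea in the abstract of building word images from per-pen character images might be sketched roughly as follows. This is an illustration only, not the authors' pipeline: the abstract does not say how altered words are assembled, so the rule used here (a contiguous run of characters rewritten with a second pen drawn from the same pool) is an assumption, and `make_glyph`, `compose_word`, and `synth_word` are hypothetical names. Scanned character images are replaced by small grids filled with the pen id so the example runs without any image data.

```python
import random

def make_glyph(pen_id, char, height=4, width=3):
    """Stand-in for a scanned character image (hypothetical): a height x width
    grid whose pixels hold the pen id, so the ink source stays visible."""
    return [[pen_id] * width for _ in range(height)]

def compose_word(glyphs):
    """Concatenate glyph grids left to right into one word image."""
    height = len(glyphs[0])
    return [sum((g[r] for g in glyphs), []) for r in range(height)]

def synth_word(word, pens, altered, rng):
    """Return (image, label, pen_per_char); label 1 = altered, 0 = unaltered."""
    if len(word) < 2:
        raise ValueError("need at least two characters")
    base = rng.choice(pens)
    pen_per_char = [base] * len(word)
    if altered:
        # Assumption: the alteration rewrites a contiguous run of characters
        # with a different pen, leaving at least one original character.
        other = rng.choice([p for p in pens if p != base])
        run = rng.randrange(1, len(word))
        start = rng.randrange(0, len(word) - run + 1)
        for i in range(start, start + run):
            pen_per_char[i] = other
    image = compose_word([make_glyph(p, c) for p, c in zip(pen_per_char, word)])
    return image, int(altered), pen_per_char

rng = random.Random(0)
blue_pens = list(range(10))  # ten pens of one colour, as in the abstract
image, label, pens_used = synth_word("cheque", blue_pens, True, rng)
```

Under this assumed rule, a pool of 10 pens gives 10 × 9 = 90 ordered (original, altering) pen pairs per word, which suggests how a modest set of character images can expand into a large labeled word set.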