Perturbation Models for Generating Synthetic Training Data in Handwriting Recognition.

Authors: Kacprzyk, Janusz; Marinai, Simone; Fujisawa, Hiromichi; Varga, Tamás; Bunke, Horst
Source: Machine Learning in Document Analysis & Recognition; 2008, p333-360, 28p
Abstract: In this chapter, the use of synthetic training data for handwriting recognition is studied. After an overview of the previous works related to the field, the authors' main results regarding this research area are presented and discussed, including a perturbation model for the generation of synthetic text lines from existing cursively handwritten lines of text produced by human writers. The goal of synthetic text line generation is to improve the performance of an off-line cursive handwriting recognition system by providing it with additional training data. It can be expected that by adding synthetic training data the variability of the training set improves, which leads to a higher recognition rate. On the other hand, synthetic training data may bias a recognizer towards unnatural handwriting styles, which could lead to a deterioration of the recognition rate. The proposed perturbation model is evaluated under several experimental conditions, and it is shown that significant improvement of the recognition performance is possible even when the original training set is large and the text lines are provided by a large number of different writers. [ABSTRACT FROM AUTHOR]
Database: Complementary Index