A Comparison of Recognition Strategies for Printed/Handwritten Composite Documents

Authors: Christopher Kermorvant, Ronaldo Messina, Bastien Moysset
Year of publication: 2014
Subject:
Source: ICFHR
DOI: 10.1109/icfhr.2014.34
Description: Full-page segmentation and recognition of real-world documents is a challenging task that involves segmenting the images (graphics, text) and then recognizing the detected text zones. Such documents often contain zones with both write-types, printed and handwritten. So far these have been handled by classifying each zone according to its write-type and then applying type-specific recognition models. Here we present two recognition systems based on state-of-the-art recurrent neural networks that can recognize the text in zones with both write-types without explicit type identification; only segmentation into lines is needed. In one system, the network's output makes no distinction by write-type (one output label per character), while in the other there is one output label for each combination of character and write-type. Experiments were carried out on real-world documents from the Maurdor competition. Both systems perform at a level similar to that of systems using type-specific networks on the constrained task where each zone contains only one write-type, and they perform better when both handwritten and printed text are present in a zone. The results open the prospect of treating OCR and handwritten text recognition with a single optical model.
Database: OpenAIRE
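
Note: the two output-label designs mentioned in the description can be illustrated with a minimal sketch. This is not the authors' implementation; the function names, the toy character set, and the write-type strings are assumptions made for illustration only, and details such as any special blank label used in network training are left out.

```python
from itertools import product

# The two write-types named in the description (string values are an assumption).
WRITE_TYPES = ("printed", "handwritten")


def shared_labels(charset):
    """First design: one output label per character, regardless of write-type."""
    return sorted(set(charset))


def typed_labels(charset, write_types=WRITE_TYPES):
    """Second design: one output label per (character, write-type) pair."""
    return [(ch, wt) for ch, wt in product(sorted(set(charset)), write_types)]


def strip_write_type(typed_sequence):
    """Drop the write-type from a typed label sequence, leaving plain text."""
    return "".join(ch for ch, _ in typed_sequence)


if __name__ == "__main__":
    charset = "abc"  # toy character set, purely illustrative
    print(len(shared_labels(charset)))  # size of the shared label set
    print(len(typed_labels(charset)))   # size of the typed label set
    print(strip_write_type([("a", "printed"), ("b", "handwritten")]))
```

With two write-types the second design doubles the number of character labels, and stripping the write-type lets both designs be compared on the same plain-text transcription.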