A Pix2Pix Architecture for Complete Offline Handwritten Text Normalization.

Authors: Barreiro-Garrido A, Ruiz-Parrado V, Moreno AB, Velez JF; Higher Technical School of Computer Engineering, Universidad Rey Juan Carlos, c/Tulipan sn, Mostoles, 28922 Madrid, Spain.
Language: English
Source: Sensors (Basel, Switzerland) [Sensors (Basel)] 2024 Jun 16; Vol. 24 (12). Date of Electronic Publication: 2024 Jun 16.
DOI: 10.3390/s24123892
Abstract: In the realm of offline handwritten text recognition, numerous normalization algorithms have been developed over the years to serve as preprocessing steps before automatic recognition models are applied to scanned images of handwritten text. These algorithms have proven effective in enhancing the overall performance of recognition architectures. However, many of these methods rely heavily on heuristic strategies that are not seamlessly integrated with the recognition architecture itself. This paper introduces the use of a trainable Pix2Pix model, a specific type of conditional generative adversarial network, as the method to normalize handwritten text images. Moreover, this model can be seamlessly integrated as the initial stage of any deep learning architecture designed for handwritten recognition tasks. This makes it possible to train the normalization and recognition components as a unified whole, while still maintaining some interpretability of each module. Our proposed normalization approach learns from a blend of heuristic transformations applied to text images, aiming to mitigate the impact of intra-personal handwriting variability among different writers. As a result, it achieves slope and slant normalizations, alongside other conventional preprocessing objectives, such as normalizing the size of text ascenders and descenders. We demonstrate that the proposed architecture replicates, and in certain cases surpasses, the results of a widely used heuristic algorithm, both across two metrics and when integrated as the first step of a deep recognition architecture.
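For orientation, the generic Pix2Pix training objective (a conditional GAN loss combined with an L1 reconstruction term) can be sketched as below. This is the standard formulation of Pix2Pix, given here only as an illustration; the abstract does not state the exact losses or weighting used by the authors. The symbols are assumptions for this sketch: x is an input handwritten text image, y is its heuristically normalized target, z is a noise input, G is the generator, D is the discriminator, and λ is a weighting hyperparameter.

```latex
\begin{align}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big] \\
G^{*} &= \arg\min_{G}\,\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)
\end{align}
```

Under this reading, the L1 term would pull the generator output toward the heuristically normalized image, while the adversarial term would encourage outputs that the discriminator cannot distinguish from genuine normalized text.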
Database: MEDLINE