Applying End-to-end Trainable Approach on Stroke Extraction in Handwritten Math Expressions Images

Authors: Harold Mouchère, Elmokhtar Mohamed Moussa, Thibault Lelore
Contributors: Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Image Perception Interaction (IPI), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), MyScript SAS
Language: English
Year of publication: 2021
Subject:
Source: ICDAR 2021 : 16th International Conference on Document Analysis and Recognition
ICDAR 2021 : 16th International Conference on Document Analysis and Recognition, Sep 2021, Lausanne, Switzerland. ⟨10.1007/978-3-030-86334-0_29⟩
Document Analysis and Recognition – ICDAR 2021 ISBN: 9783030863333
ICDAR (3)
Description: In this paper, we propose a novel end-to-end system to extract strokes from offline math expressions. Using a multi-task neural network, we simultaneously predict the location of the pen and the pen state. Our approach is based on a recent state-of-the-art image-to-sequence method that is limited to small, fixed-size images. We generalize it to large, multi-symbol images without preprocessing steps such as skeletonization or binarization. This architecture allows end-to-end training. A curriculum learning strategy is used to address the complexity of the images. We achieve results comparable to the state of the art on the UNIPEN English character dataset for next-point prediction. We propose a stroke-level metric that allows us to measure the quality of stroke reconstruction. Experiments show the advantages and limitations of the adopted image-to-sequence method when scaling up to large and complex images such as math equations.
Database: OpenAIRE