Applying End-to-end Trainable Approach on Stroke Extraction in Handwritten Math Expressions Images
Authors: | Harold Mouchère, Elmokhtar Mohamed Moussa, Thibault Lelore |
---|---|
Contributors: | Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Image Perception Interaction (IPI), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), MyScript SAS |
Language: | English |
Publication year: | 2021 |
Subject: |
Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 02 engineering and technology; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; 01 natural sciences; Measure (mathematics); Skeletonization; End-to-end principle; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Preprocessor; 010306 general physics; Artificial neural network; Point (typography); Character (computing); business.industry; 020207 software engineering; Pattern recognition; Stroke Extraction; Handwritten Mathematical Expressions; [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV]; End-to-end Trainable System; State (computer science); Artificial intelligence; business |
Source: | ICDAR 2021: 16th International Conference on Document Analysis and Recognition, Sep 2021, Lausanne, Switzerland. ⟨10.1007/978-3-030-86334-0_29⟩ Document Analysis and Recognition – ICDAR 2021 ISBN: 9783030863333 ICDAR (3) |
Description: | International audience; In this paper, we propose a novel end-to-end system to extract strokes from offline math expressions. Using a multi-task neural network, we simultaneously predict the location of the pen and the pen state. Our approach is based on a recent state-of-the-art image-to-sequence method that is limited to small, fixed-size images. We generalize it to large, multi-symbol images without preprocessing steps such as skeletonization or binarization. This architecture allows end-to-end training. A curriculum learning strategy is used to address the complexity of the images. We achieve results comparable to the state of the art on the UNIPEN English character dataset for next-point prediction. We propose a stroke-level metric that allows us to measure stroke reconstruction. Experiments show the advantages and limitations of the adopted image-to-sequence method when scaling up to large and complex images such as math equations. |
Database: | OpenAIRE |
External link: |