Applying End-to-end Trainable Approach on Stroke Extraction in Handwritten Math Expressions Images

Authors:	Harold Mouchère, Elmokhtar Mohamed Moussa, Thibault Lelore
Contributors:	Laboratoire des Sciences du Numérique de Nantes (LS2N), IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), Institut Mines-Télécom [Paris] (IMT)-Institut Mines-Télécom [Paris] (IMT)-Université de Nantes - UFR des Sciences et des Techniques (UN UFR ST), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS), Image Perception Interaction (IPI), Université de Nantes (UN)-Université de Nantes (UN)-École Centrale de Nantes (ECN)-Centre National de la Recherche Scientifique (CNRS)-IMT Atlantique Bretagne-Pays de la Loire (IMT Atlantique), MyScript SAS
Language:	English
Publication year:	2021
Subject:	Computer science; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; 02 engineering and technology; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; 01 natural sciences; Measure (mathematics); Skeletonization; End-to-end principle; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; 0103 physical sciences; 0202 electrical engineering, electronic engineering, information engineering; Preprocessor; 010306 general physics; Artificial neural network; Point (typography); Character (computing); business.industry; 020207 software engineering; Pattern recognition; Stroke Extraction; Handwritten Mathematical Expressions; [INFO.INFO-TI]Computer Science [cs]/Image Processing [eess.IV]; End-to-end Trainable System; State (computer science); Artificial intelligence; business
Source:	ICDAR 2021 : 16th International Conference on Document Analysis and Recognition, Sep 2021, Lausanne, Switzerland. ⟨10.1007/978-3-030-86334-0_29⟩ Document Analysis and Recognition – ICDAR 2021 ISBN: 9783030863333 ICDAR (3)
Description:	In this paper, we propose a novel end-to-end system to extract strokes from offline math expressions. Using a multi-task neural network, we simultaneously predict the location of the pen and the pen state. Our approach is based on a recent state-of-the-art image-to-sequence method that is limited to small, fixed-size images. We generalize it to large, multi-symbol images without preprocessing steps such as skeletonization or binarization. This architecture allows end-to-end training. A curriculum learning strategy is used to address the complexity of the images. We achieve results comparable to the state of the art on the UNIPEN English character dataset for next-point prediction. We propose a stroke-level metric that allows us to measure stroke reconstruction. Experiments show the advantages and limitations of the adopted image-to-sequence method when scaling up to large and complex images such as math equations.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6b3b719ff498fbb2757e60593a92c5dd https://hal.archives-ouvertes.fr/hal-03236506