Learning text-line localization with shared and local regression neural networks
Authors: | Christian Wolf, Christopher Kermorvant, Bastien Moysset, Jérôme Louradour |
---|---|
Contributors: | Extraction de Caractéristiques et Identification (imagine); Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS); Institut National des Sciences Appliquées de Lyon (INSA Lyon); Institut National des Sciences Appliquées (INSA); Université de Lyon; Centre National de la Recherche Scientifique (CNRS); Université Claude Bernard Lyon 1 (UCBL); École Centrale de Lyon (ECL); Université Lumière - Lyon 2 (UL2); A2iA; Wolf, Christian |
Language: | English |
Publication year: | 2016 |
Subject: | Computer science; Image processing; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Convolutional neural network; [INFO.INFO-CV] Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; 0202 electrical engineering, electronic engineering, information engineering; 0105 earth and related environmental sciences; Artificial neural network; Pixel; Text detection; Deep learning; Local regression; Pattern recognition; Image segmentation; 020201 artificial intelligence & image processing; Data mining; Artificial intelligence; Line (text file) |
Source: | International Conference on Frontiers in Handwriting Recognition (ICFHR), Oct 2016, Shenzhen, China |
Description: | International audience. Text line detection and localisation is a crucial step for full-page document analysis, but it still suffers from the heterogeneity of real-life documents. In this paper, we present a novel approach to text line localisation that uses Convolutional Neural Networks and Multidimensional Long Short-Term Memory cells as a regressor to predict the coordinates of the text line bounding boxes directly from the pixel values. Because images in document image analysis are typically large, we propose a new model that shares weights over local blocks. We compare two strategies: directly predicting the four coordinates, or predicting lower-left and upper-right points separately and then matching them. We evaluate our work on the highly unconstrained Maurdor dataset and show that our method outperforms both other machine learning methods and image processing methods. |
Database: | OpenAIRE |
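
The description mentions a strategy in which lower-left and upper-right points are predicted separately and then matched into bounding boxes. The record does not say how the matching is done, so the following is only an illustrative sketch and not the authors' procedure. It assumes image coordinates with y growing downward, and it uses a hypothetical greedy pairing with a made-up cost (`height_weight` is an assumed parameter) that favours wide, thin boxes, as text lines usually are.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y), with y growing downward
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)


def match_corners(lower_lefts: List[Point],
                  upper_rights: List[Point],
                  height_weight: float = 10.0) -> List[Box]:
    """Greedily pair lower-left points with compatible upper-right points.

    Hypothetical heuristic: the cost and the greedy order are assumptions
    made for illustration, not taken from the paper.
    """
    candidates = []
    for i, (lx, ly) in enumerate(lower_lefts):
        for j, (rx, ry) in enumerate(upper_rights):
            # A valid pair has the upper-right point to the right of,
            # and above, the lower-left point.
            if rx > lx and ry < ly:
                # Assumed cost: penalise box height more than box width.
                cost = (rx - lx) + height_weight * (ly - ry)
                candidates.append((cost, i, j))

    candidates.sort()  # cheapest candidate pairs first
    used_ll, used_ur = set(), set()
    boxes: List[Box] = []
    for _, i, j in candidates:
        if i in used_ll or j in used_ur:
            continue  # each point is used in at most one box
        used_ll.add(i)
        used_ur.add(j)
        lx, ly = lower_lefts[i]
        rx, ry = upper_rights[j]
        boxes.append((lx, ry, rx, ly))
    return boxes
```

For two stacked lines, `match_corners([(10, 50), (10, 100)], [(200, 30), (180, 80)])` pairs each lower-left point with the upper-right point of its own line. Points with no compatible partner are simply left unmatched.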