Learning to detect, localize and recognize many text objects in document images from few examples

Authors:	Christopher Kermorvant, Christian Wolf, Bastien Moysset
Contributors:	Extraction de Caractéristiques et Identification (imagine), Laboratoire d'InfoRmatique en Image et Systèmes d'information (LIRIS), Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Institut National des Sciences Appliquées (INSA)-Université de Lyon-Institut National des Sciences Appliquées (INSA)-Centre National de la Recherche Scientifique (CNRS)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-École Centrale de Lyon (ECL), Université de Lyon-Université Lumière - Lyon 2 (UL2)-Institut National des Sciences Appliquées de Lyon (INSA Lyon), Université de Lyon-Université Lumière - Lyon 2 (UL2), A2iA (A2iA), A2iA
Publication year:	2018
Subject:	Computer science; 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Image (mathematics); Text line detection; Position (vector); Minimum bounding box; Document analysis; 0202 electrical engineering, electronic engineering, information engineering; 0105 earth and related environmental sciences; Artificial neural network; business.industry; [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV]; Pattern recognition; Object (computer science); Neural network; Regression; Object detection; Computer Science Applications; Variable (computer science); Local; Pattern recognition (psychology); 020201 artificial intelligence & image processing; Computer Vision and Pattern Recognition; Artificial intelligence; Recurrent; business; Software
Source:	International Journal on Document Analysis and Recognition, Springer Verlag, In press, ⟨10.1007/s10032-018-0305-2⟩
ISSN:	1433-2825 1433-2833
DOI:	10.1007/s10032-018-0305-2
Description:	The current trend in object detection and localization is to learn predictions with high-capacity deep neural networks trained on very large amounts of annotated data and with substantial processing power. In this work, we target the detection of text in document images and propose a new neural model that directly predicts object coordinates. The particularity of our contribution lies in computing predictions locally, with a new form of local parameter sharing that keeps the overall number of trainable parameters low. Key components of the model are spatial 2D-LSTM recurrent layers, which convey contextual information between the regions of the image. We show that this model is more powerful than the state of the art in applications where training data are not as abundant as in the classical configuration of natural images and ImageNet/Pascal VOC tasks. The proposed model also facilitates the detection of many objects in a single image and can deal with inputs of variable sizes without resizing. To enhance the localization precision of the coordinate regressor, we limit the amount of information produced by the local model components and propose two different regression strategies: (i) separately predict the lower-left and upper-right corners of each object bounding box, followed by combinatorial pairing; (ii) predict only the left side of the objects and estimate the right position jointly with text recognition. These strategies lead to good full-page text recognition results in heterogeneous documents. Experiments were performed on a document analysis task, the localization of text lines in the Maurdor dataset.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::fa543f6bd41126a6edaa4eaa8c33bf11 https://doi.org/10.1007/s10032-018-0305-2