Automatic generation of a custom corpora for invoice analysis and recognition

Authors: Abdel Belaïd, Yolande Belaïd, Jerome Blanchard
Contributors: Analyse et Traitement Informatique de la Langue Française (ATILF), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS), Recognition of writing and analysis of documents (READ), Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
Language: English
Publication year: 2019
Subject:
Source: ICDAR-WIADAR
ICDAR-WIADAR, Sep 2019, Sydney, Australia
WIADAR@ICDAR
Description: International audience; In this paper, we present a bill-type document generator that can supply, on demand, as many documents as a learning system needs. Because administrative documents are confidential, their scarcity has long been an obstacle. The generator also solves the annotation problem: annotations are produced automatically during generation and written directly in XML-GEDI format. To demonstrate the value of the generator, we propose an invoice recognition system based on a graph convolutional neural network. The experiments were run under favorable conditions, since we could freely vary the classes, the samples in each class, and their parameters.
Database: OpenAIRE