Semi-automated workflow for recognition of printed documents with heterogeneous content

Authors: Alexandru Colesnicov, Ludmila Malahov, Svetlana Cojocaru, Lyudmila Burtseva
Language: English
Year of publication: 2020
Subject:
Source: Computer Science Journal of Moldova, Vol 28, Iss 3(84), Pp 223-240 (2020)
Document type: article
ISSN: 1561-4042
Description: The paper discusses problems in digitizing heterogeneous texts. Archives of scanned printed documents are growing dramatically as a result of cultural heritage preservation projects. Manual annotation of scanned document images and page-by-page on-screen reading make these archives difficult, and sometimes impossible, to use. Existing document processing systems cannot automatically render the content correctly because it is heterogeneous. We propose a Web platform that maximizes support for the semi-automated operation of all tools used to recognize heterogeneous documents. Maximizing support means both providing convenient "single window" access to all tools and reducing the manual part of the process as much as possible. The implementation uses convergent technology, which assembles complex software systems from ready-made heterogeneous modules on a single platform.
Database: Directory of Open Access Journals