AN ALGORITHM FOR MATCHING OCR-GENERATED TEXT STRINGS

Author:	Junichi Kanai, Stephen V. Rice, Thomas A. Nartker
Year of publication:	1994
Subject:	Matching (graph theory); String (computer science); Image processing; Optical character recognition; String searching algorithm; Binary logarithm; Substring; Image (mathematics); Artificial intelligence; Computer Vision and Pattern Recognition; Algorithm; Software; Mathematics
Source:	Document Image Analysis
ISSN:	1793-6381, 0218-0014
DOI:	10.1142/s0218001494000632
Description:	When optical character recognition (OCR) devices process the same page image, they generate similar text strings; the differences between them are due to recognition errors. A page of text rarely contains long repeated substrings; therefore, N strings generated by OCR devices can be quickly matched by detecting long common substrings. An algorithm for matching an arbitrary number of strings based on this principle is presented. Although its worst-case performance is O(Nn^2), its performance in practice has been observed to be O(Nn log n), where n is the length of a string. This algorithm has been successfully used to study OCR errors, to determine the accuracy of OCR devices, and to implement a voting algorithm.
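The matching principle in the description (anchor on long common substrings, since a page rarely repeats them) can be illustrated with a minimal two-string sketch. This is NOT the authors' N-string algorithm: it assumes only two strings, uses a plain O(len(a) * len(b)) dynamic program to find each longest common substring (so it does not reproduce the O(Nn log n) behaviour reported above), and introduces a hypothetical `min_len` threshold below which leftover text is treated as unmatched. The functions `longest_common_substring` and `match_strings` are illustrative names, not from the paper.

```python
from typing import List, Tuple

# (start in a, start in b, length) of one matched segment
Match = Tuple[int, int, int]


def longest_common_substring(a: str, b: str) -> Match:
    """Return (i, j, k) with a[i:i+k] == b[j:j+k] and k maximal (k == 0 if none).

    Plain dynamic programming over two rows; keeps the first maximal match
    found while scanning a, then b, left to right.
    """
    best: Match = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-1], b[j-1].
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def match_strings(a: str, b: str, min_len: int = 3,
                  a_off: int = 0, b_off: int = 0) -> List[Match]:
    """Anchor on the longest common substring, then recurse on the
    unmatched text to its left and to its right.

    min_len is a hypothetical cut-off: shorter common runs are left
    unmatched (they are likely to be OCR errors or coincidences).
    Offsets translate positions back to the original strings.
    """
    i, j, k = longest_common_substring(a, b)
    if k < min_len:
        return []
    left = match_strings(a[:i], b[:j], min_len, a_off, b_off)
    right = match_strings(a[i + k:], b[j + k:], min_len,
                          a_off + i + k, b_off + j + k)
    return left + [(a_off + i, b_off + j, k)] + right
```

For example, `match_strings("The quick brown fox", "The qu1ck brown f0x")` returns `[(0, 0, 6), (7, 7, 10)]`: the segments "The qu" and "ck brown f" are matched, while the misrecognized characters ("i"/"1", "o"/"0") and the short trailing "x" fall into unmatched gaps. Recursing on the gaps keeps the matches in order, which is what makes the unmatched residue usable for counting OCR errors.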
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::45feb83ea9677a22b0ac6d5d45ce7564 https://doi.org/10.1142/s0218001494000632