AN ALGORITHM FOR MATCHING OCR-GENERATED TEXT STRINGS
Authors: | Junichi Kanai, Stephen V. Rice, Thomas A. Nartker |
---|---|
Year of publication: | 1994 |
Subjects: | Matching (graph theory), String (computer science), Image processing, Optical character recognition, String searching algorithm, Binary logarithm, Substring, Image (mathematics), Artificial intelligence, Computer Vision and Pattern Recognition, Algorithm, Software, Mathematics |
Source: | Document Image Analysis |
ISSN: | 1793-6381, 0218-0014 |
DOI: | 10.1142/s0218001494000632 |
Abstract: | When optical character recognition (OCR) devices process the same page image, they generate similar text strings. Differences are due to recognition errors. A page of text rarely contains long repeated substrings; therefore, N strings generated by OCR devices can be quickly matched by detecting long common substrings. An algorithm for matching an arbitrary number of strings based on this principle is presented. Although its worst-case performance is O(Nn^2), its performance in practice has been observed to be O(Nn log n), where n is the length of a string. This algorithm has been successfully used to study OCR errors, to determine the accuracy of OCR devices, and to implement a voting algorithm. |
Database: | OpenAIRE |
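
The abstract's central idea is that strings produced from the same page can be aligned by anchoring on long common substrings, because long substrings rarely repeat within a page. A short Python sketch can illustrate this idea. It is not the authors' algorithm and makes no attempt to meet the complexity bounds stated above. It finds the longest substring common to all N strings (by binary search on the length over sets of substrings), splits every string around that substring's first occurrence, and recurses on the left and right parts. The threshold `min_len`, the function names `longest_common_substring` and `match_strings`, and the segment format are hypothetical choices made only for this illustration. Regions with no common substring of at least `min_len` characters are reported as differences.

```python
from typing import List, Optional, Tuple

# Hypothetical segment format for this sketch:
# ("match", [text]) for text shared by every string, or
# ("diff", [piece_0, ..., piece_{N-1}]) for a region where the strings disagree.
Segment = Tuple[str, List[str]]


def longest_common_substring(strings: List[str], min_len: int) -> Optional[str]:
    """Return the longest substring occurring in every string, or None if
    no common substring has at least min_len characters (min_len >= 1)."""
    shortest = min(strings, key=len)

    def common_of_length(length: int) -> Optional[str]:
        # Substrings of the given length that occur in all strings.
        candidates = {shortest[i:i + length] for i in range(len(shortest) - length + 1)}
        for s in strings:
            candidates &= {s[i:i + length] for i in range(len(s) - length + 1)}
            if not candidates:
                return None
        # Deterministic pick: the candidate occurring earliest in the first string.
        return min(candidates, key=strings[0].find)

    # Binary search on length: a common substring of length L implies
    # common substrings of every shorter length.
    lo, hi, best = min_len, len(shortest), None
    while lo <= hi:
        mid = (lo + hi) // 2
        found = common_of_length(mid)
        if found is not None:
            best, lo = found, mid + 1
        else:
            hi = mid - 1
    return best


def match_strings(strings: List[str], min_len: int = 3) -> List[Segment]:
    """Recursively align N strings around their longest common substring."""
    if all(s == "" for s in strings):
        return []
    anchor = longest_common_substring(strings, min_len)
    if anchor is None:
        return [("diff", list(strings))]
    # Assumption (the abstract's premise): a long common substring rarely
    # repeats within a page, so its first occurrence is taken as the anchor.
    cuts = [s.find(anchor) for s in strings]
    left = [s[:c] for s, c in zip(strings, cuts)]
    right = [s[c + len(anchor):] for s, c in zip(strings, cuts)]
    return match_strings(left, min_len) + [("match", [anchor])] + match_strings(right, min_len)


if __name__ == "__main__":
    print(match_strings(["hello world", "hello w0rld"]))
    # [('match', ['hello w']), ('diff', ['o', '0']), ('match', ['rld'])]
```

In this representation, each input string is recovered by concatenating the matched substrings with its own piece of every difference segment. The difference segments are therefore where recognition errors would be counted or, in a voting scheme, resolved.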