Word-Based Adaptive OCR for Historical Books
Authors: | Eugene Walach, Vladimir Kluzner, Yuval Shimony, Asaf Tzadok, Apostolos Antonacopoulos |
---|---|
Publication year: | 2009 |
Subject: |
Computer science; business.industry; Speech recognition; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Optical flow; 02 engineering and technology; Optical character recognition; Document processing; computer.software_genre; Class (biology); Text mining; 020204 information systems; Distortion; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; business; Cluster analysis; computer; Word (computer architecture); Natural language processing |
Source: | ICDAR |
DOI: | 10.1109/icdar.2009.133 |
Description: | This work proposes a new approach to recognizing historical texts: an adaptive mechanism that automatically tunes itself to a specific book. The system clusters all similar words in a book or text and handles each entire class at once. The paper describes the architecture of such a system and the new algorithms developed for robust word-image comparison (including registration, optical-flow-based distortion compensation, and adaptive binarization). Results on a large dataset are also presented, demonstrating a recognition improvement of over 23%. |
Database: | OpenAIRE |
External link: |
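
The description mentions clustering similar word images and handling each class as a whole. The following is only a minimal sketch of that general idea, not the paper's algorithm: the pixel-agreement similarity, the small horizontal shift search (a crude stand-in for registration), the greedy clustering, the threshold of 0.9, and all function names are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Image = Sequence[Sequence[int]]  # binary word image: equal-length rows of 0/1 pixels


def similarity(a: Image, b: Image, max_shift: int = 1) -> float:
    """Best fraction of agreeing pixels over small horizontal shifts of b."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0  # sketch only: images of different size never match
    height, width = len(a), len(a[0])
    best = 0.0
    for shift in range(-max_shift, max_shift + 1):
        agree = 0
        for y in range(height):
            for x in range(width):
                xs = x + shift
                # Pixels shifted in from outside the image count as background.
                pixel_b = b[y][xs] if 0 <= xs < width else 0
                agree += int(a[y][x] == pixel_b)
        best = max(best, agree / (height * width))
    return best


def cluster_words(images: List[Image], threshold: float = 0.9) -> List[List[int]]:
    """Greedily group image indices; the first member represents its cluster."""
    clusters: List[List[int]] = []
    for idx, img in enumerate(images):
        for members in clusters:
            if similarity(images[members[0]], img) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])  # no cluster was close enough
    return clusters


def propagate_labels(clusters: List[List[int]], labels: List[str]) -> Dict[int, str]:
    """Assign one recognized word per cluster to every member of that cluster."""
    return {idx: label for members, label in zip(clusters, labels) for idx in members}


word_a = [[1, 1, 0, 0], [1, 1, 0, 0]]
word_a_shifted = [[0, 1, 1, 0], [0, 1, 1, 0]]  # same shape, moved one pixel right
word_b = [[0, 0, 1, 1], [0, 0, 1, 1]]

clusters = cluster_words([word_a, word_a_shifted, word_b])
transcription = propagate_labels(clusters, ["the", "and"])  # hypothetical labels
```

In this toy setup a word is recognized once per cluster and the result is shared by all members, which is the sense in which a class is handled as a whole; a real system would need a far more robust comparison than pixel agreement.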