The BBN Byblos Hindi OCR system

Authors:	Prem Natarajan, Michael Decerbo, Ehry MacRostie
Year of publication:	2005
Subject:	Hindi; Computer science; Character (computing); Speech recognition; Word error rate; Optical character recognition; computer.software_genre; language.human_language; Test set; Devanagari; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; language; Pashto; Hidden Markov model; computer
Source:	DRR
ISSN:	0277-786X
DOI:	10.1117/12.588810
Description:	The BBN Byblos OCR system implements a script-independent methodology for OCR using Hidden Markov Models (HMMs). We have successfully ported the system to Arabic, English, Chinese, Pashto, and Japanese. In this paper, we report on our recent effort in training the system to recognize Hindi (Devanagari) documents. The initial experiments reported in this paper used a corpus of synthetic (computer-generated) document images, along with slightly degraded versions of the same images produced by scanning printed copies and by scanning faxes of the printed copies. On a fair test set consisting of synthetic images alone, we measured a character error rate of 1.0%. The character error rate on a fair test set of scanned images (scans of printed versions of the synthetic images) was 1.40%, while the character error rate on a fair test set of fax images (scans of printed and faxed versions of the synthetic images) was 8.7%. © (2005) COPYRIGHT SPIE--The International Society for Optical Engineering. Downloading of the abstract is permitted for personal use only.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::5af5c6c44d84eaf8ac51038c02beaed5 https://doi.org/10.1117/12.588810