Categorizing Document Images into Script and Language Classes

Authors:	Sabine Bergler, A. Bloch, Nicola Nobile, C. P. Nadal, B. Waked, Ching Y. Suen
Year of publication:	1999
Subject:	Language identification; Computer science; Skew; Optical character recognition; Document processing; Index (publishing); Preprocessor; Shape coding; Segmentation; Artificial intelligence; Natural language processing
Source:	International Conference on Advances in Pattern Recognition ISBN: 9781447112143
DOI:	10.1007/978-1-4471-0833-7_30
Description:	To properly archive and index large numbers of international documents, several challenging processing steps must be completed before optical character recognition (OCR) can be applied. We present a system that preclassifies documents for further processing and OCR. The system operates in four phases: preprocessing (skew detection, segmentation, and noise removal), script classification (Latin, Arabic, Ideographic, or Cyrillic), shape coding, and language classification for seven European languages.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::0010a13f5e34e65b4543dddf8eb6cb4c https://doi.org/10.1007/978-1-4471-0833-7_30
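
The record does not say how the shape coding and language classification phases are implemented. As an illustration only, the general idea of shape coding can be sketched as follows: each character is reduced to a coarse shape class, and a document is assigned to the language whose frequent words produce the best-matching shape tokens. The shape classes, the word lists, and the names `shape_code`, `FREQUENT_WORDS`, and `classify_language` are all assumptions made for this sketch, not details from the paper. A real system would derive the codes from character images before OCR, whereas this sketch simulates them from plain text.

```python
from collections import Counter

# Assumed coarse shape classes (not taken from the paper):
#   A = tall glyph (ascender, capital, or digit)
#   g = glyph with a descender
#   i, j = dotted glyphs
#   x = plain x-height glyph
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gpqy")

# Hypothetical frequent-word lists for three languages. A real system
# would learn such profiles from training documents.
FREQUENT_WORDS = {
    "english": ["the", "and", "of", "to", "in", "is"],
    "french": ["le", "la", "les", "de", "et", "des", "est"],
    "german": ["der", "die", "und", "das", "ist", "den"],
}


def shape_code(word):
    """Reduce a word to a string of coarse shape classes."""
    codes = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            codes.append("A")
        elif ch in DESCENDERS:
            codes.append("g")
        elif ch in ("i", "j"):
            codes.append(ch)
        elif ch.isalpha():
            codes.append("x")
        # Punctuation and other symbols are ignored.
    return "".join(codes)


# One set of shape tokens per language, built from the word lists above.
PROFILES = {
    lang: {shape_code(word) for word in words}
    for lang, words in FREQUENT_WORDS.items()
}


def classify_language(text):
    """Return the language whose profile matches the most word tokens."""
    tokens = Counter(shape_code(word) for word in text.split())
    scores = {
        lang: sum(n for code, n in tokens.items() if code in profile)
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(shape_code("the"))
    print(classify_language("the cat is in the house of the king"))
```

Because several short function words collapse to the same shape token (for example, "to" and "le" both become `Ax`), tiny profiles like these are ambiguous on short inputs. Larger profiles and longer texts would be needed to separate seven languages reliably.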