A Robust and Automated Approach for Multilingual Indian Document Indexing

Authors: Parag S. Deshpande, Mayank Thakur, Meera Dhabu, Parnika Paranjape, Nitesh Funde
Year of publication: 2019
Subject:
Source: 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS).
Description: Several Indian government offices currently lack robust software for searching words in scanned multilingual Indian documents. Manually searching such documents is tedious and time-consuming, and the number of documents to be searched for the desired content is large. There is thus a pressing need for robust automatic search software for aged multilingual Indian documents, for which no single robust Optical Character Recognition (OCR) system exists that can recognize the complex Indian scripts. To this end, we propose a new geometrical approach for grouping the components that belong to a text line in documents with multiple orientations, together with an extended profile feature extraction technique for character recognition of printed Indian documents. The performance of the proposed approach is evaluated on a variety of Indian documents containing English characters and Devanagari script. Experimental results suggest that the proposed approach generates accurate index words for most of the document images used in this study. Moreover, the proposed technique saves both time and effort compared with manual indexing of document images.
Database: OpenAIRE