Generation of Search-able PDF of the Chemical Equations segmented from Document Images
Authors: | Prerana Jana, Anubhab Majumdar, Sekhar Mandal, Bhabatosh Chanda |
---|---|
Year of publication: | 2016 |
Subject: |
Information retrieval
Computer science; 020207 software engineering; 02 engineering and technology; Chemical equation; Contextual design; Mathematical equations; Pattern recognition (psychology); ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Segmentation; Data mining |
Source: | DocEng |
DOI: | 10.1145/2960811.2960822 |
Description: | Scanned document images stored in PDF format are not searchable. OCR tries to remedy this by converting document images into editable and searchable data, but it has its own limitations in the presence of equations, both mathematical and chemical. OCR for mathematical equations is already a major research area and has produced successful results. Chemical equation segmentation, however, has been explored far less. In this paper, we present a novel method for the automated generation of searchable PDFs of segmented chemical equations from scanned document images by performing chemical symbol recognition and auto-correction of OCR output. We use an existing OCR system, pattern recognition techniques, contextual data analysis, and a standard LaTeX package to generate the chemical equations in searchable PDF format. The effectiveness of the proposed method is verified through exhaustive testing on 234 document images. |
Database: | OpenAIRE |
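
The description mentions auto-correction of OCR output followed by typesetting with a standard LaTeX package. The record does not say how either step works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the `mhchem` package (`\ce{...}`) as the LaTeX target, and the confusion table `CONFUSIONS`, the lexicon `KNOWN_SPECIES`, and both function names are hypothetical.

```python
from itertools import product

# Hypothetical tables for illustration only; none of this comes from the paper.
# Characters an OCR engine commonly confuses, original character listed first.
CONFUSIONS = {"0": "0O", "O": "O0", "l": "lI1", "1": "1lI", "I": "Il1"}
# Tiny lexicon of species that serves as the "context" for correction.
KNOWN_SPECIES = {"H2", "O2", "H2O", "CO2", "NaCl", "HCl", "NaOH"}


def correct_species(token: str) -> str:
    """Return a lexicon-backed correction of one OCR token, else the token unchanged."""
    # Count leading digits: any prefix of them may be a stoichiometric coefficient.
    digits = 0
    while digits < len(token) and token[digits].isdigit():
        digits += 1
    for k in range(digits + 1):
        coeff, formula = token[:k], token[k:]
        # Enumerate every spelling reachable through the confusion table.
        options = [CONFUSIONS.get(ch, ch) for ch in formula]
        for variant in product(*options):
            candidate = "".join(variant)
            if candidate in KNOWN_SPECIES:
                return coeff + candidate
    return token  # operators such as "+" and "->" fall through untouched


def to_mhchem(ocr_line: str) -> str:
    """Correct each whitespace-separated token and wrap the result in \\ce{...}."""
    tokens = [correct_species(t) for t in ocr_line.split()]
    return "\\ce{" + " ".join(tokens) + "}"


if __name__ == "__main__":
    print(to_mhchem("2H2 + 02 -> 2H20"))
```

Given the OCR string `2H2 + 02 -> 2H20`, the sketch replaces the misread zeros and emits `\ce{2H2 + O2 -> 2H2O}`, which a LaTeX document loading `mhchem` could compile into a PDF with a searchable text layer. A lexicon lookup is used here because a purely local rule cannot tell `H20` from `H2O`; a real system would need a far larger vocabulary or a proper formula grammar.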