Utilising OpenCV with Tesseract to extract Bill of Materials (BOM) from Isometric Drawings

Authors: Jack McShane, Stephen McClay, Kevin Meehan
Year of publication: 2021
Subject:
Source: 2021 32nd Irish Signals and Systems Conference (ISSC).
DOI: 10.1109/issc52156.2021.9467854
Description: Quality assurance is often a time-consuming and error-prone process for organisations. However, it is increasingly important for companies that produce fabricated products for integration into safety-critical environments. For example, fabricating pipe systems for the pharmaceutical industry carries additional risks. The resulting increase in regulation has led to further paperwork and validation for companies operating in this sector. Many of the isometric drawings provided to companies for fabrication remain in paper format (or as scanned paper documents). This places an administrative burden on these companies, as the average project could generate up to 5,000 isometric drawings. This research explores techniques that could be used to automatically extract Bill of Materials (BOM) information from these isometric drawings. Tesseract failed to perform OCR accurately on the extracted Region of Interest (ROI) data containing the BOM information, achieving a mean accuracy of 43.8%. This paper explores different pre-processing techniques to increase recognition accuracy. Techniques such as binarisation, erosion, noise reduction and contouring were employed. In the study, mean accuracy increased to 81.2%. This demonstrates that effective use of pre-processing can improve character recognition accuracy.
Database: OpenAIRE