Author: |
Sriharsha, A. V.; Bhavana, Mekala; Tejaswee, Syamala; Ahmed, Mynasaheb Bushra; Jaipal Reddy, Peddi Reddi |
Subject: |
|
Source: |
Grenze International Journal of Engineering & Technology (GIJET); Jun2024, Vol. 10 Issue 2, Part 2, p1842-1848, 7p |
Abstract: |
Optical character recognition (OCR) extracts text from images, while large language models (LLMs) understand and generate human-like text. OCR engines like EasyOCR transform document images into machine-readable text, but this raw extracted text often contains artifacts. LLMs like GPT-3 operate on clean embedding vectors rather than raw text. We propose an integrated pipeline combining EasyOCR and GPT-3 for enhanced text comprehension from images. EasyOCR optically recognizes text from document images; the extracted text is cleaned and converted into embeddings that are fed to GPT-3. GPT-3's deep language model then provides contextual understanding of the extracted text, enabling it to interpret complex concepts, resolve ambiguities, summarize key ideas, and generate natural language descriptions. Our pipeline establishes a paradigm for augmenting text understanding through the synergistic combination of OCR and LLMs, with diverse applications in document analysis, information retrieval, question answering, and other NLP tasks involving documents. The integrated model outperforms previous approaches on benchmarks for comprehension of extracted text and demonstrates the benefits of complementing optical text extraction with LLMs' innate language abilities. This paves the way for advanced OCR systems that not only read but also understand text in documents. [ABSTRACT FROM AUTHOR] |
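The pipeline the abstract describes (OCR, then artifact cleaning, then LLM comprehension) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the cleaning heuristics are assumptions, and the OCR and LLM steps are injected as callables so the sketch stays library-agnostic (in practice they might wrap `easyocr.Reader(['en']).readtext(path, detail=0)` and an OpenAI API call).

```python
import re

def clean_ocr_text(raw: str) -> str:
    """Remove common OCR artifacts. The heuristics here are
    illustrative assumptions, not the paper's cleaning procedure."""
    text = raw.replace("|", " ")             # stray table-border characters
    text = re.sub(r"-\n(\w)", r"\1", text)   # rejoin words hyphenated at line breaks
    text = re.sub(r"\s+", " ", text)         # collapse runs of whitespace/newlines
    return text.strip()

def comprehend_image(image_path, ocr_fn, llm_fn):
    """Pipeline sketch: OCR -> clean -> LLM comprehension.

    ocr_fn(image_path) should return a list of recognized text lines;
    llm_fn(prompt) should return the model's response string."""
    raw = "\n".join(ocr_fn(image_path))
    cleaned = clean_ocr_text(raw)
    return llm_fn(f"Summarize the key ideas of this document:\n{cleaned}")
```

For example, `comprehend_image("doc.png", ocr_fn=my_reader, llm_fn=my_model)` would pass the cleaned extraction to the model; the injected callables make the cleaning step testable without OCR or API dependencies.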
Database: |
Complementary Index |
External link: |
|