Financial Table Extraction in Image Documents

Authors: Watson, William; Liu, Bo
Year of publication: 2024
Subject:
Document type: Working Paper
DOI: 10.1145/3383455.3422520
Description: Table extraction has long been a pervasive problem in financial services. It is more challenging in the image domain, where content is locked behind a cumbersome pixel format. Luckily, advances in deep learning for image segmentation, OCR, and sequence modeling provide the heavy lifting needed to achieve impressive results. This paper presents an end-to-end pipeline for identifying, extracting, and transcribing tabular content in image documents while retaining the original spatial relations with high fidelity.
Database: arXiv