Representation Learning for Information Extraction from Form-like Documents
Authors: | Marc Najork, Navneet Potti, Qi Zhao, James B. Wendt, Bodhisattwa Prasad Majumder, Sandeep Tata |
---|---|
Year of publication: | 2020 |
Subject: | Computer science; Representation (systemics); Software engineering; Engineering and technology; Task (project management); Information extraction; Electrical engineering, electronic engineering, information engineering; Artificial intelligence & image processing; Artificial intelligence; Feature learning; Natural language processing |
Source: | ACL |
DOI: | 10.18653/v1/2020.acl-main.580 |
Description: | We propose a novel approach using representation learning for tackling the problem of extracting structured information from form-like document images. We propose an extraction system that uses knowledge of the types of the target fields to generate extraction candidates and a neural network architecture that learns a dense representation of each candidate based on neighboring words in the document. These learned representations are not only useful in solving the extraction task for unseen document templates from two different domains but are also interpretable, as we show using loss cases. |
Database: | OpenAIRE |