An Augmentation Strategy for Visually Rich Documents
Author: | Xie, Jing; Wendt, James B.; Zhou, Yichao; Ebner, Seth; Tata, Sandeep |
---|---|
Year of publication: | 2022 |
Subject: | |
Document type: | Working Paper |
Description: | Many business workflows require extracting important fields from form-like documents (e.g., bank statements, bills of lading, purchase orders). Recent techniques for automating this task work well only when trained on large datasets. In this work, we propose a novel data augmentation technique to improve performance when training data is scarce (e.g., 10-250 documents). Our technique, which we call FieldSwap, works by swapping out the key phrases of a source field with the key phrases of a target field to generate new synthetic examples of the target field for use in training. We demonstrate that this approach can yield 1-7 F1 point improvements in extraction performance. Comment: 9 pages, 6 figures, 3 tables |
Database: | arXiv |
External link: |
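
The key-phrase swap summarized in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each training example is a plain text snippet paired with a field label, and that key phrases per field are already known. The names `KEY_PHRASES`, `field_swap`, and the field labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical key-phrase inventory (assumption: in practice these phrases
# would be derived from the documents rather than hard-coded).
KEY_PHRASES: Dict[str, List[str]] = {
    "invoice_date": ["Invoice Date"],
    "due_date": ["Due Date", "Payment Due"],
}

Example = Tuple[str, str]  # (text snippet, field label)


def field_swap(
    text: str,
    source_field: str,
    target_field: str,
    key_phrases: Dict[str, List[str]],
) -> List[Example]:
    """Create synthetic target-field examples from one source-field example.

    Every source key phrase found in the text is replaced by each target
    key phrase, and the result is labelled as the target field.
    """
    synthetic: List[Example] = []
    for src_phrase in key_phrases[source_field]:
        if src_phrase not in text:
            continue  # nothing to swap for this phrase
        for tgt_phrase in key_phrases[target_field]:
            synthetic.append((text.replace(src_phrase, tgt_phrase), target_field))
    return synthetic


examples = field_swap(
    "Invoice Date: 2022-01-05", "invoice_date", "due_date", KEY_PHRASES
)
for snippet, label in examples:
    print(label, "|", snippet)
```

In this toy setting, one labelled `invoice_date` snippet yields one synthetic `due_date` snippet per target key phrase. A snippet that contains no source key phrase yields nothing, so the sketch never invents examples without a phrase to swap.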