Assessing Large Language Models Used for Extracting Table Information from Annual Financial Reports

Authors: David Balsiger, Hans-Rudolf Dimmler, Samuel Egger-Horstmann, Thomas Hanne
Language: English
Year of publication: 2024
Subject:
Source: Computers, Vol 13, Iss 10, p 257 (2024)
Document type: article
ISSN: 2073-431X
DOI: 10.3390/computers13100257
Description: The extraction of data from tables in PDF documents has been a longstanding challenge in the field of data processing and analysis. While traditional methods have been explored in depth, the rise of Large Language Models (LLMs) offers new possibilities. This article addresses the knowledge gaps regarding the use of LLMs, specifically ChatGPT-4 and BARD, for extracting and interpreting data from financial tables in PDF format. This research is motivated by the real-world need to efficiently gather and analyze corporate financial information. The hypothesis is that these two LLMs can accurately extract key financial data, such as figures from balance sheets and income statements. The methodology involves selecting representative pages from 46 annual reports of large Swiss corporations listed in the SMI Expanded Index from 2022 and copy-pasting text from these pages into the LLMs. Eight analytical questions were posed to the LLMs, and their responses were assessed for accuracy and to identify potential sources of error in data extraction. The findings revealed significant variance in performance between ChatGPT-4 and BARD, with ChatGPT-4 generally exhibiting superior accuracy. This research contributes to understanding the capabilities and limitations of LLMs in processing and interpreting complex financial data from corporate documents.
Database: Directory of Open Access Journals