CFB: Financial Large Models Evaluation Methods

Author: LI Yi, LI Hao, XU Xiaozhe, YANG Yifan
Language: Chinese
Year of publication: 2024
Subject:
Source: Jisuanji kexue yu tansuo, Vol 18, Iss 12, pp. 3272-3287 (2024)
Document type: article
ISSN: 1673-9418
DOI: 10.3778/j.issn.1673-9418.2406055
Description: As potential applications of large language models (LLMs) in the financial sector continue to emerge, evaluating the performance of financial LLMs becomes increasingly important. However, current financial evaluation methods face limitations such as narrow evaluation tasks, insufficient coverage of evaluation datasets, and contamination of benchmark data. Consequently, the potential of LLMs in the financial domain has not been fully explored. To address these issues, this paper proposes the Chinese financial benchmark (CFB) for evaluating financial LLMs. The CFB encompasses 36 datasets, covers 24 financial tasks, and involves 7 evaluation tasks: question answering, terminology explanation, text generation, text translation, classification, speech recognition, and predictive decision-making. It also establishes corresponding benchmarks. The novelty of the CFB lies in its broader range of tasks and data, the introduction of an LLM-based benchmark decontamination method, and three evaluation methods: instruction fine-tuning, knowledge retrieval enhancement, and prompt engineering. The evaluation of 12 LLMs, including GPT-4o, ChatGPT, and Gemini, reveals that although LLMs excel at information extraction and text analysis, they struggle with advanced reasoning and complex tasks. GPT-4o performs exceptionally well in information extraction and stock trading, whereas Gemini excels in text generation and prediction. Instruction fine-tuning improves LLMs’ performance in text analysis but offers limited benefits for complex tasks.
Database: Directory of Open Access Journals