OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

Authors:	Iqbal, Hasan; Wang, Yuxia; Wang, Minghan; Georgiev, Georgi; Geng, Jiahui; Gurevych, Iryna; Nakov, Preslav
Year of publication:	2024
Subject:	Computer Science - Computation and Language; Computer Science - Artificial Intelligence; I.2.7
Document type:	Working Paper
Description:	The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucinate. This is difficult as it requires assessing the factuality of free-form open-domain responses. While there has been a lot of research on this topic, different papers use different evaluation benchmarks and measures, which makes them hard to compare and hampers future progress. To mitigate these issues, we developed OpenFactCheck, a unified framework, with three modules: (i) RESPONSEEVAL, which allows users to easily customize an automatic fact-checking system and to assess the factuality of all claims in an input document using that system, (ii) LLMEVAL, which assesses the overall factuality of an LLM, and (iii) CHECKEREVAL, a module to evaluate automatic fact-checking systems. OpenFactCheck is open-sourced (https://github.com/hasaniqbal777/openfactcheck) and publicly released as a Python library (https://pypi.org/project/openfactcheck/) and also as a web service (https://huggingface.co/spaces/hasaniqbal777/OpenFactCheck). A video describing the system is available at https://youtu.be/-i9VKL0HleI.
Comment:	10 pages, 4 Figures, 3 Tables, Submitted to EMNLP 2024 System Demonstration. arXiv admin note: substantial text overlap with arXiv:2405.05583
Database:	arXiv
External link:	http://arxiv.org/abs/2408.11832