Showing 1 - 10 of 20,862 results for search: '"human evaluation"'
Author:
Belz, Anya, Thomson, Craig
This paper presents version 3.0 of the Human Evaluation Datasheet (HEDS). This update is the result of our experience using HEDS in the context of numerous recent human evaluation experiments, including reproduction studies, and of feedback received.
External link:
http://arxiv.org/abs/2412.07940
Author:
Kroll, Margaret, Kraus, Kelsey
Published in:
Proc. Interspeech 2024, 1935-1939 (2024)
The emergence of powerful LLMs has led to a paradigm shift in abstractive summarization of spoken documents. The properties that make LLMs so valuable for this task -- creativity, ability to produce fluent speech, and ability to abstract information …
External link:
http://arxiv.org/abs/2410.18218
Author:
Sarmah, Bhaskarjit, Dutta, Kriti, Grigoryan, Anna, Tiwari, Sachin, Pasquali, Stefano, Mehta, Dhagash
We argue that Declarative Self-improving Python (DSPy) optimizers are a way to align large language model (LLM) prompts and their evaluations with human annotations. We present a comparative analysis of five teleprompter algorithms, namely …
External link:
http://arxiv.org/abs/2412.15298
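As a rough illustration of the approach the snippet describes, the sketch below aligns an LLM judge to human annotations with one publicly documented DSPy teleprompter (BootstrapFewShot). The signature, dataset, metric, and model name are all invented for this sketch; the paper's five optimizers and its actual setup are not reproduced here.

```python
# Hedged sketch only: aligning an LLM judge to human annotations with a
# DSPy teleprompter. Dataset, metric, and model name are invented.
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # hypothetical model choice

class JudgeQuality(dspy.Signature):
    """Rate the quality of a summary from 1 to 5."""
    document: str = dspy.InputField()
    summary: str = dspy.InputField()
    rating: int = dspy.OutputField()

judge = dspy.ChainOfThought(JudgeQuality)

# Human-annotated examples (invented) the optimizer should align to.
trainset = [
    dspy.Example(document="...", summary="...", rating=4).with_inputs(
        "document", "summary"),
]

def agrees_with_human(example, pred, trace=None):
    # Metric: exact agreement with the human rating.
    return int(pred.rating) == int(example.rating)

optimizer = BootstrapFewShot(metric=agrees_with_human)
aligned_judge = optimizer.compile(judge, trainset=trainset)
```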
Open community-driven platforms like Chatbot Arena, which collect user preference data from site visitors, have gained a reputation as among the most trustworthy publicly available benchmarks for LLM performance. While now standard, it is tricky to implement …
External link:
http://arxiv.org/abs/2412.04363
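Arena-style platforms of this kind typically turn pairwise preference votes into a leaderboard by fitting a Bradley-Terry model. A minimal sketch, with invented model names and votes (not the paper's data or code):

```python
# Minimal sketch: Bradley-Terry strengths from pairwise preference votes,
# fitted with minorization-maximization updates (Hunter, 2004).
from collections import defaultdict

# wins[(a, b)] = number of times users preferred model a over model b
wins = defaultdict(int)
votes = [("gpt", "llama"), ("gpt", "mistral"), ("llama", "mistral"),
         ("mistral", "llama"), ("gpt", "llama"), ("llama", "gpt")]
for winner, loser in votes:
    wins[(winner, loser)] += 1

models = sorted({m for pair in votes for m in pair})
strength = {m: 1.0 for m in models}  # Bradley-Terry strengths

for _ in range(200):  # iterate MM updates to convergence
    new = {}
    for m in models:
        w = sum(wins[(m, o)] for o in models)  # total wins of m
        denom = sum((wins[(m, o)] + wins[(o, m)]) / (strength[m] + strength[o])
                    for o in models if o != m)
        new[m] = w / denom if denom > 0 else strength[m]
    norm = sum(new.values())
    strength = {m: s / norm for m, s in new.items()}

for m in sorted(models, key=strength.get, reverse=True):
    print(f"{m}: {strength[m]:.3f}")
```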
Procedural Knowledge is the know-how expressed in the form of sequences of steps needed to perform a task. Procedures are usually described by means of natural language texts, such as recipes or maintenance manuals, possibly spread across different …
External link:
http://arxiv.org/abs/2412.03589
Published in:
AHRI 2024, Sep 2024, Glasgow, United Kingdom
Conversational systems are now capable of producing impressive and generally relevant responses. However, we have neither visibility into nor control over the socio-emotional strategies behind state-of-the-art Large Language Models (LLMs), which poses a problem …
External link:
http://arxiv.org/abs/2412.04492
Images are capable of conveying emotions, but emotional experience is highly subjective. Advances in artificial intelligence have enabled the generation of images based on emotional descriptions. However, the level of agreement between the generative …
External link:
http://arxiv.org/abs/2410.08332
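When a study like this measures agreement between subjective human emotion labels and a model's outputs, a chance-corrected statistic such as Cohen's kappa is a common choice. A toy sketch with invented labels (not the paper's data or protocol):

```python
# Toy sketch: Cohen's kappa between human and model emotion labels.
# All labels below are invented for illustration.
from collections import Counter

human = ["joy", "sad", "joy", "fear", "joy", "sad", "fear", "joy"]
model = ["joy", "joy", "joy", "fear", "sad", "sad", "fear", "joy"]

labels = sorted(set(human) | set(model))
n = len(human)
observed = sum(h == m for h, m in zip(human, model)) / n  # raw agreement
ph, pm = Counter(human), Counter(model)
expected = sum((ph[l] / n) * (pm[l] / n) for l in labels)  # chance agreement
kappa = (observed - expected) / (1 - expected)
print(f"observed agreement={observed:.2f}, kappa={kappa:.2f}")
```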
This paper introduces LalaEval, a holistic framework designed for the human evaluation of domain-specific large language models (LLMs). LalaEval proposes a comprehensive suite of end-to-end protocols that cover five main components, including domain specification …
External link:
http://arxiv.org/abs/2408.13338
In this position paper, we argue that human evaluation of generative large language models (LLMs) should be a multidisciplinary undertaking that draws upon insights from disciplines such as user experience research and human behavioral psychology to …
External link:
http://arxiv.org/abs/2405.18638
Selecting an automatic metric that best emulates human annotators is often non-trivial, because there is no clear definition of "best emulates." A meta-metric is required to compare the human judgments to the automatic metric scores, and metric rankings …
External link:
http://arxiv.org/abs/2409.09598
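The point this snippet makes, that which metric "best emulates" humans depends on the meta-metric, can be made concrete with a toy example: of two invented metrics, one wins under Pearson correlation and the other under Kendall's tau (requires scipy; none of the numbers are from the paper):

```python
# Toy sketch: the "best" automatic metric flips with the meta-metric
# used to compare it against human judgments. Invented numbers.
from scipy.stats import pearsonr, kendalltau

human    = [1, 2, 3, 4, 5]             # human judgments for 5 outputs
metric_a = [1, 2, 3, 5, 4]             # linear scale, but one rank swap
metric_b = [1.0, 1.1, 1.2, 1.3, 10.0]  # perfect ranking, distorted scale

for name, scores in (("A", metric_a), ("B", metric_b)):
    r, _ = pearsonr(human, scores)
    tau, _ = kendalltau(human, scores)
    print(f"metric {name}: pearson={r:.3f} kendall={tau:.3f}")
# Pearson prefers metric A (0.900 > 0.727); Kendall prefers B (1.000 > 0.800).
```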