Showing 1 - 10 of 8,691 results for search: '"meta-evaluation"'
Driven by the remarkable progress in diffusion models, text-to-image generation has made significant strides, creating a pressing demand for automatic quality evaluation of generated images. Current state-of-the-art automatic evaluation methods heavily …
External link:
http://arxiv.org/abs/2411.15488
Author:
Son, Guijin, Yoon, Dongkeun, Suk, Juyoung, Aula-Blasco, Javier, Aslan, Mano, Kim, Vu Trong, Islam, Shayekh Bin, Prats-Cristià, Jaume, Tormo-Bañuelos, Lucía, Kim, Seungone
Large language models (LLMs) are commonly used as evaluators in tasks (e.g., reward modeling, LLM-as-a-judge), where they act as proxies for human preferences or judgments. This leads to the need for meta-evaluation: evaluating the credibility of LLM evaluators …
External link:
http://arxiv.org/abs/2410.17578
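
The snippet above frames meta-evaluation as judging the judges. As a minimal illustration with entirely made-up data (not this paper's actual protocol or benchmark), the agreement between an LLM judge's pairwise preferences and human preference labels can be computed like this:

# Minimal sketch: meta-evaluating an LLM judge by its agreement with
# human preference labels. The data is hypothetical; a real benchmark
# would draw judgments from annotated pairwise comparisons.

# Each item records which of two candidate responses was preferred ("A"/"B").
human_prefs = ["A", "B", "A", "A", "B", "B", "A", "B"]
judge_prefs = ["A", "B", "B", "A", "B", "A", "A", "B"]

agreement = sum(h == j for h, j in zip(human_prefs, judge_prefs)) / len(human_prefs)
print(f"judge-human agreement: {agreement:.2%}")  # 75.00%

Higher agreement with human labels is then read as higher credibility of the judge; real benchmarks additionally slice this by task, language, or error type.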
The correlation between NLG automatic evaluation metrics and human evaluation is often regarded as a critical criterion for assessing the capability of an evaluation metric. However, different grouping methods and correlation coefficients result in …
External link:
http://arxiv.org/abs/2410.16834
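
To make the snippet's point concrete, here is a small Python sketch with invented scores showing how the grouping level (segment vs. system) and the choice of coefficient (Pearson vs. Kendall) can paint different pictures of the same metric. This is illustrative only, not the paper's experimental setup:

# Illustrative sketch (hypothetical scores): the grouping level and the
# correlation coefficient both affect how a metric's quality is judged.
import numpy as np
from scipy.stats import pearsonr, kendalltau

# Hypothetical per-segment scores for 3 systems x 4 segments.
human  = np.array([[0.9, 0.7, 0.8, 0.6],
                   [0.5, 0.4, 0.6, 0.5],
                   [0.3, 0.2, 0.4, 0.1]])
metric = np.array([[0.8, 0.9, 0.6, 0.7],
                   [0.6, 0.3, 0.7, 0.4],
                   [0.2, 0.3, 0.1, 0.2]])

# Segment-level grouping: pool every segment score together.
r_seg, _ = pearsonr(human.ravel(), metric.ravel())
tau_seg, _ = kendalltau(human.ravel(), metric.ravel())

# System-level grouping: average per system first, then correlate.
r_sys, _ = pearsonr(human.mean(axis=1), metric.mean(axis=1))

print(f"segment-level Pearson: {r_seg:.3f}, Kendall tau: {tau_seg:.3f}")
print(f"system-level Pearson:  {r_sys:.3f}")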
Annually, at the Conference on Machine Translation (WMT), the Metrics Shared Task organizers conduct the meta-evaluation of Machine Translation (MT) metrics, ranking them according to their correlation with human judgments. Their results guide researchers …
External link:
http://arxiv.org/abs/2408.13831
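
A hedged sketch of the kind of ranking the snippet describes, using invented system-level scores; metric_a, metric_b, and metric_c are placeholders, not real metrics, and the real shared task uses more elaborate statistics:

# Hypothetical WMT-style meta-evaluation: rank automatic metrics by
# their system-level Pearson correlation with human judgments.
from scipy.stats import pearsonr

human_scores = [82.1, 75.4, 68.9, 60.2, 55.7]  # made-up human ratings for 5 systems

metric_scores = {  # made-up scores from three placeholder metrics
    "metric_a": [0.81, 0.74, 0.70, 0.61, 0.58],
    "metric_b": [0.79, 0.80, 0.65, 0.66, 0.52],
    "metric_c": [0.60, 0.62, 0.59, 0.61, 0.60],
}

ranking = sorted(
    ((name, pearsonr(human_scores, scores)[0]) for name, scores in metric_scores.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")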
Metrics are the foundation of automatic evaluation in grammatical error correction (GEC); the evaluation of the metrics themselves (meta-evaluation) relies on their correlation with human judgments. However, conventional meta-evaluations in English GEC …
External link:
http://arxiv.org/abs/2403.02674
With the rising human-like precision of Large Language Models (LLMs) in numerous tasks, their utilization in a variety of real-world applications is becoming more prevalent. Several studies have shown that LLMs excel on many standard NLP benchmarks. …
External link:
http://arxiv.org/abs/2404.01667
Despite the utility of Large Language Models (LLMs) across a wide range of tasks and scenarios, developing a method for reliably evaluating LLMs across varied contexts continues to be challenging. Modern evaluation approaches often use LLMs to assess …
External link:
http://arxiv.org/abs/2401.16788
Author:
Moghe, Nikita, Fazla, Arnisa, Amrhein, Chantal, Kocmi, Tom, Steedman, Mark, Birch, Alexandra, Sennrich, Rico, Guillou, Liane
Recent machine translation (MT) metrics calibrate their effectiveness by correlating with human judgement, but without any insight into their behaviour across different error types. Challenge sets are used to probe specific dimensions of metric behaviour …
External link:
http://arxiv.org/abs/2401.16313
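
As an illustration of the challenge-set idea in the snippet above (hypothetical scores and error types, not the paper's data), a metric can be probed by checking how often it scores a correct translation above a minimally perturbed incorrect one, broken down by error type:

# Sketch of a challenge-set probe (hypothetical data): the metric
# "passes" an example when it ranks the good translation above the
# perturbed one; accuracy is reported per error type.
from collections import defaultdict

examples = [
    # (metric score for good translation, score for perturbed one, error type)
    (0.91, 0.45, "negation"),
    (0.88, 0.86, "number"),
    (0.70, 0.74, "named-entity"),
    (0.95, 0.40, "negation"),
]

passed = defaultdict(list)
for good, bad, error_type in examples:
    passed[error_type].append(good > bad)

for error_type, results in passed.items():
    acc = sum(results) / len(results)
    print(f"{error_type}: accuracy {acc:.0%}")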
Author:
Howes, Sophie K., van Burgel, Emma, Cubillo, Beau, Connally, Sarah, Ferguson, Megan, Brimblecombe, Julie
Published in:
BMC Public Health, Vol. 24, Issue 1 (16 September 2024), pp. 1-27
Published in:
BMC Medical Education, Vol. 24, Issue 1 (2024), pp. 1-3
Abstract: We have recently published the experience of the accreditation body for undergraduate medical education in Iran in developing and validating standards based on the WFME framework (Gandomkar et al., BMC Med Educ 23:379, 2023). Agabagheri et al. …
External link:
https://doaj.org/article/4fdd24dec296431aac4815ecc84402a2