Showing 1 - 10 of 1,155 for search: '"Cooper Martin"'
Author:
Wagner, Nico, Desmond, Michael, Nair, Rahul, Ashktorab, Zahra, Daly, Elizabeth M., Pan, Qian, Cooper, Martín Santillán, Johnson, James M., Geyer, Werner
LLM-as-a-Judge is a widely used method for evaluating the performance of Large Language Models (LLMs) across various tasks. We address the challenge of quantifying the uncertainty of LLM-as-a-Judge evaluations. While uncertainty quantification has be
External link:
http://arxiv.org/abs/2410.11594
Author:
Ashktorab, Zahra, Desmond, Michael, Pan, Qian, Johnson, James M., Cooper, Martin Santillan, Daly, Elizabeth M., Nair, Rahul, Pedapati, Tejaswini, Achintalwar, Swapnaja, Geyer, Werner
Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly used as eval
External link:
http://arxiv.org/abs/2410.00873
Author:
Cooper, Martin, Amgoud, Leila
Abductive explanations (AXp's) are widely used for understanding decisions of classifiers. Existing definitions are suitable when features are independent. However, we show that ignoring constraints when they exist between features may lead to an exp
External link:
http://arxiv.org/abs/2409.12154
Explaining decisions of black-box classifiers is both important and computationally challenging. In this paper, we scrutinize explainers that generate feature-based explanations from samples or datasets. We start by presenting a set of desirable prop
External link:
http://arxiv.org/abs/2408.04903
History eXplanation based on Predicates (HXP) studies the behavior of a Reinforcement Learning (RL) agent in a sequence of the agent's interactions with the environment (a history), through the prism of an arbitrary predicate. To this end, an action imp
External link:
http://arxiv.org/abs/2408.02606
Author:
Pan, Qian, Ashktorab, Zahra, Desmond, Michael, Cooper, Martin Santillan, Johnson, James, Nair, Rahul, Daly, Elizabeth, Geyer, Werner
Traditional reference-based metrics, such as BLEU and ROUGE, are less effective for assessing outputs from Large Language Models (LLMs) that produce highly creative or superior-quality text, or in situations where reference outputs are unavailable. W
External link:
http://arxiv.org/abs/2407.03479
Determining whether two STRIPS planning instances are isomorphic is the simplest form of comparison between planning instances. It is also a particular case of the problem concerned with finding an isomorphism between a planning instance $P$ and a su
External link:
http://arxiv.org/abs/2406.16555
Author:
Izza, Yacine, Huang, Xuanxiang, Ignatiev, Alexey, Narodytska, Nina, Cooper, Martin C., Marques-Silva, Joao
The most widely studied explainable AI (XAI) approaches are unsound. This is the case with well-known model-agnostic explanation approaches, and it is also the case with approaches based on saliency maps. One solution is to consider intrinsic interpr
External link:
http://arxiv.org/abs/2212.05990