Search results - "Cooper, Martín Santillán"

Report

Black-box Uncertainty Quantification Method for LLM-as-a-Judge

Authors: Wagner, Nico; Desmond, Michael; Nair, Rahul; Ashktorab, Zahra; Daly, Elizabeth M.; Pan, Qian; Cooper, Martín Santillán; Johnson, James M.; Geyer, Werner

LLM-as-a-Judge is a widely used method for evaluating the performance of Large Language Models (LLMs) across various tasks. We address the challenge of quantifying the uncertainty of LLM-as-a-Judge evaluations. While uncertainty quantification has be

External link: http://arxiv.org/abs/2410.11594

Report

Aligning Human and LLM Judgments: Insights from EvalAssist on Task-Specific Evaluations and AI-assisted Assessment Strategy Preferences

Authors: Ashktorab, Zahra; Desmond, Michael; Pan, Qian; Johnson, James M.; Cooper, Martin Santillan; Daly, Elizabeth M.; Nair, Rahul; Pedapati, Tejaswini; Achintalwar, Swapnaja; Geyer, Werner

Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly used as eval

External link: http://arxiv.org/abs/2410.00873

Report

Human-Centered Design Recommendations for LLM-as-a-Judge

Authors: Pan, Qian; Ashktorab, Zahra; Desmond, Michael; Cooper, Martin Santillan; Johnson, James; Nair, Rahul; Daly, Elizabeth; Geyer, Werner

Traditional reference-based metrics, such as BLEU and ROUGE, are less effective for assessing outputs from Large Language Models (LLMs) that produce highly creative or superior-quality text, or in situations where reference outputs are unavailable. W

External link: http://arxiv.org/abs/2407.03479

Report

Label Sleuth: From Unlabeled Text to a Classifier in a Few Hours

Text classification can be useful in many real-world scenarios, saving a lot of time for end users. However, building a custom classifier typically requires coding skills and ML knowledge, which poses a significant barrier for many potential users. T

External link: http://arxiv.org/abs/2208.01483
