Showing 1 - 5 of 5 for search: "Cooper, Martín Santillán"
Author:
Wagner, Nico, Desmond, Michael, Nair, Rahul, Ashktorab, Zahra, Daly, Elizabeth M., Pan, Qian, Cooper, Martín Santillán, Johnson, James M., Geyer, Werner
LLM-as-a-Judge is a widely used method for evaluating the performance of Large Language Models (LLMs) across various tasks. We address the challenge of quantifying the uncertainty of LLM-as-a-Judge evaluations. While uncertainty quantification has be…
External link:
http://arxiv.org/abs/2410.11594
Author:
Ashktorab, Zahra, Desmond, Michael, Pan, Qian, Johnson, James M., Cooper, Martin Santillan, Daly, Elizabeth M., Nair, Rahul, Pedapati, Tejaswini, Achintalwar, Swapnaja, Geyer, Werner
Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly used as eval…
External link:
http://arxiv.org/abs/2410.00873
Author:
Pan, Qian, Ashktorab, Zahra, Desmond, Michael, Cooper, Martin Santillan, Johnson, James, Nair, Rahul, Daly, Elizabeth, Geyer, Werner
Traditional reference-based metrics, such as BLEU and ROUGE, are less effective for assessing outputs from Large Language Models (LLMs) that produce highly creative or superior-quality text, or in situations where reference outputs are unavailable. W…
External link:
http://arxiv.org/abs/2407.03479
Author:
Shnarch, Eyal, Halfon, Alon, Gera, Ariel, Danilevsky, Marina, Katsis, Yannis, Choshen, Leshem, Cooper, Martin Santillan, Epelboim, Dina, Zhang, Zheng, Wang, Dakuo, Yip, Lucy, Ein-Dor, Liat, Dankin, Lena, Shnayderman, Ilya, Aharonov, Ranit, Li, Yunyao, Liberman, Naftali, Slesarev, Philip Levin, Newton, Gwilym, Ofek-Koifman, Shila, Slonim, Noam, Katz, Yoav
Text classification can be useful in many real-world scenarios, saving a lot of time for end users. However, building a custom classifier typically requires coding skills and ML knowledge, which poses a significant barrier for many potential users. T…
External link:
http://arxiv.org/abs/2208.01483
Academic article