Showing 1 - 10 of 406 for the search: '"Daly, Elizabeth"'
Large language models (LLMs) offer powerful capabilities but also introduce significant risks. One way to mitigate these risks is through comprehensive pre-deployment evaluations using benchmarks designed to test for specific vulnerabilities. However …
External link:
http://arxiv.org/abs/2410.12974
Author:
Wagner, Nico, Desmond, Michael, Nair, Rahul, Ashktorab, Zahra, Daly, Elizabeth M., Pan, Qian, Cooper, Martín Santillán, Johnson, James M., Geyer, Werner
LLM-as-a-Judge is a widely used method for evaluating the performance of Large Language Models (LLMs) across various tasks. We address the challenge of quantifying the uncertainty of LLM-as-a-Judge evaluations. While uncertainty quantification has …
External link:
http://arxiv.org/abs/2410.11594
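A minimal sketch of one generic way to quantify judge uncertainty: sample the verdict several times at nonzero temperature and score the disagreement with Shannon entropy. `query_judge` is a hypothetical callable, and this illustrates the general idea rather than the paper's method.

```python
import math
from collections import Counter

def verdict_entropy(verdicts):
    """Shannon entropy of repeated judge verdicts; 0 means full agreement."""
    counts = Counter(verdicts)
    total = len(verdicts)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def judge_uncertainty(query_judge, prompt, n_samples=10):
    """Sample the judge n_samples times; return the majority verdict
    and an entropy-based uncertainty score.

    query_judge: hypothetical callable that sends prompt to an LLM judge
    (at temperature > 0) and returns a discrete verdict such as "A" or "B".
    """
    verdicts = [query_judge(prompt) for _ in range(n_samples)]
    majority, _ = Counter(verdicts).most_common(1)[0]
    return majority, verdict_entropy(verdicts)
```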
Author:
Ashktorab, Zahra, Desmond, Michael, Pan, Qian, Johnson, James M., Cooper, Martin Santillan, Daly, Elizabeth M., Nair, Rahul, Pedapati, Tejaswini, Achintalwar, Swapnaja, Geyer, Werner
Evaluation of large language model (LLM) outputs requires users to make critical judgments about the best outputs across various configurations. This process is costly and takes time given the large amounts of data. LLMs are increasingly used as …
External link:
http://arxiv.org/abs/2410.00873
Author:
Rawat, Ambrish, Schoepf, Stefan, Zizzo, Giulio, Cornacchia, Giandomenico, Hameed, Muhammad Zaid, Fraser, Kieran, Miehling, Erik, Buesser, Beat, Daly, Elizabeth M., Purcell, Mark, Sattigeri, Prasanna, Chen, Pin-Yu, Varshney, Kush R.
As generative AI, particularly large language models (LLMs), become increasingly integrated into production applications, new attack surfaces and vulnerabilities emerge and put a focus on adversarial threats in natural language and multi-modal system …
External link:
http://arxiv.org/abs/2409.15398
Author:
Pan, Qian, Ashktorab, Zahra, Desmond, Michael, Cooper, Martin Santillan, Johnson, James, Nair, Rahul, Daly, Elizabeth, Geyer, Werner
Traditional reference-based metrics, such as BLEU and ROUGE, are less effective for assessing outputs from Large Language Models (LLMs) that produce highly creative or superior-quality text, or in situations where reference outputs are unavailable. …
External link:
http://arxiv.org/abs/2407.03479
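For context on what the abstract pushes against: when references do exist, BLEU and ROUGE are straightforward to compute. A minimal sketch, assuming the `nltk` and `rouge-score` packages and made-up example strings:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

reference = "the cat sat on the mat"
candidate = "a cat was sitting on the mat"

# BLEU compares n-gram overlap against tokenized references.
bleu = sentence_bleu(
    [reference.split()], candidate.split(),
    smoothing_function=SmoothingFunction().method1,
)

# ROUGE-L measures longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = scorer.score(reference, candidate)["rougeL"].fmeasure

print(f"BLEU: {bleu:.3f}  ROUGE-L F1: {rouge_l:.3f}")
```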
Author:
Hou, Yufang, Pascale, Alessandra, Carnerero-Cano, Javier, Tchrakian, Tigran, Marinescu, Radu, Daly, Elizabeth, Padhi, Inkit, Sattigeri, Prasanna
Retrieval-augmented generation (RAG) has emerged as a promising solution to mitigate the limitations of large language models (LLMs), such as hallucinations and outdated information. However, it remains unclear how LLMs handle knowledge conflicts …
External link:
http://arxiv.org/abs/2406.13805
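As background, RAG names the generic retrieve-then-generate loop sketched below. `embed` and `generate` are assumed callables standing in for an embedding model and an LLM; nothing here comes from the paper itself.

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k documents most similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)[:k]

def rag_answer(question, corpus, embed, generate, k=3):
    """Retrieve k passages and condition the generator on them.

    embed: assumed callable mapping text -> vector
    generate: assumed callable mapping prompt -> answer string
    """
    doc_vecs = np.stack([embed(doc) for doc in corpus])
    top = cosine_top_k(embed(question), doc_vecs, k)
    context = "\n".join(corpus[i] for i in top)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)
```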
Author:
Miehling, Erik, Nagireddy, Manish, Sattigeri, Prasanna, Daly, Elizabeth M., Piorkowski, David, Richards, John T.
Modern language models, while sophisticated, exhibit some inherent shortcomings, particularly in conversational settings. We claim that many of the observed shortcomings can be attributed to violation of one or more conversational principles. By …
External link:
http://arxiv.org/abs/2403.15115
Author:
Achintalwar, Swapnaja, Garcia, Adriana Alvarado, Anaby-Tavor, Ateret, Baldini, Ioana, Berger, Sara E., Bhattacharjee, Bishwaranjan, Bouneffouf, Djallel, Chaudhury, Subhajit, Chen, Pin-Yu, Chiazor, Lamogha, Daly, Elizabeth M., DB, Kirushikesh, de Paula, Rogério Abreu, Dognin, Pierre, Farchi, Eitan, Ghosh, Soumya, Hind, Michael, Horesh, Raya, Kour, George, Lee, Ja Young, Madaan, Nishtha, Mehta, Sameep, Miehling, Erik, Murugesan, Keerthiram, Nagireddy, Manish, Padhi, Inkit, Piorkowski, David, Rawat, Ambrish, Raz, Orna, Sattigeri, Prasanna, Strobelt, Hendrik, Swaminathan, Sarathkrishna, Tillmann, Christoph, Trivedi, Aashka, Varshney, Kush R., Wei, Dennis, Witherspoon, Shalisha, Zalmanovici, Marcel
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output to biased and toxic generations. Due to several limiting factors surrounding LLMs (training cost, API access, data availability, etc.), it may not always be …
External link:
http://arxiv.org/abs/2403.06009
Author:
Dhurandhar, Amit, Nair, Rahul, Singh, Moninder, Daly, Elizabeth, Ramamurthy, Karthikeyan Natesan
Evaluation and ranking of large language models (LLMs) has become an important problem with the proliferation of these models and their impact. Evaluation methods either require human responses which are expensive to acquire or use pairs of LLMs to …
External link:
http://arxiv.org/abs/2402.14860
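One standard way to turn pairwise model comparisons into a ranking is a Bradley-Terry model fit by the classic MM updates. The sketch below, on made-up win counts, is generic background rather than the paper's method:

```python
import numpy as np

def bradley_terry(wins, n_iters=200):
    """wins[i, j] = number of times model i beat model j.

    Returns a strength vector p; higher means stronger.
    Uses the classic MM (minorization-maximization) update.
    """
    m = wins.shape[0]
    p = np.ones(m)
    for _ in range(n_iters):
        for i in range(m):
            num = wins[i].sum()
            denom = sum(
                (wins[i, j] + wins[j, i]) / (p[i] + p[j])
                for j in range(m) if j != i
            )
            if denom > 0:
                p[i] = num / denom
        p /= p.sum()  # fix the scale; only ratios matter
    return p

# Toy example: model 0 usually beats 1, which usually beats 2.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
print(np.argsort(-bradley_terry(wins)))  # ranking, best first
```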
In machine learning systems, bias mitigation approaches aim to make outcomes fairer across privileged and unprivileged groups. Bias mitigation methods work in different ways and have known "waterfall" effects, e.g., mitigating bias at one place may …
External link:
http://arxiv.org/abs/2312.00765
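A common measurement behind such waterfall analyses is a group-fairness metric tracked before and after each mitigation step. A minimal sketch of statistical parity difference on made-up binary predictions:

```python
import numpy as np

def statistical_parity_difference(y_pred, group):
    """P(favorable | unprivileged) - P(favorable | privileged).

    y_pred: binary predictions (1 = favorable outcome)
    group:  1 for privileged, 0 for unprivileged
    """
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    return y_pred[group == 0].mean() - y_pred[group == 1].mean()

# Toy data: the privileged group receives the favorable outcome more often.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group  = np.array([1, 1, 1, 1, 0, 0, 0, 0])
print(statistical_parity_difference(y_pred, group))  # negative favors privileged
```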