Showing 1 - 10 of 3,519 for search: '"AS, Chandu"'
Author:
Rezaei, Keivan, Chandu, Khyathi, Feizi, Soheil, Choi, Yejin, Brahman, Faeze, Ravichander, Abhilasha
Large language models trained on web-scale corpora can memorize undesirable datapoints such as incorrect facts, copyrighted content or sensitive data. Recently, many machine unlearning methods have been proposed that aim to 'erase' these datapoints…
External link:
http://arxiv.org/abs/2411.00204
Author:
Lu, Ximing, Sclar, Melanie, Hallinan, Skyler, Mireshghallah, Niloofar, Liu, Jiacheng, Han, Seungju, Ettinger, Allyson, Jiang, Liwei, Chandu, Khyathi, Dziri, Nouha, Choi, Yejin
Creativity has long been considered one of the most difficult aspects of human intelligence for AI to mimic. However, the rise of Large Language Models (LLMs), like ChatGPT, has raised questions about whether AI can match or even surpass human creativity…
External link:
http://arxiv.org/abs/2410.04265
Author:
Zhao, Wenting, Goyal, Tanya, Chiu, Yu Ying, Jiang, Liwei, Newman, Benjamin, Ravichander, Abhilasha, Chandu, Khyathi, Bras, Ronan Le, Cardie, Claire, Deng, Yuntian, Choi, Yejin
While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap…
External link:
http://arxiv.org/abs/2407.17468
Author:
Brahman, Faeze, Kumar, Sachin, Balachandran, Vidhisha, Dasigi, Pradeep, Pyatkin, Valentina, Ravichander, Abhilasha, Wiegreffe, Sarah, Dziri, Nouha, Chandu, Khyathi, Hessel, Jack, Tsvetkov, Yulia, Smith, Noah A., Choi, Yejin, Hajishirzi, Hannaneh
Chat-based language models are designed to be helpful, yet they should not comply with every user request. While most existing work primarily focuses on refusal of "unsafe" queries, we posit that the scope of noncompliance should be broadened…
External link:
http://arxiv.org/abs/2407.12043
Author:
Chandu, Khyathi Raghavi, Li, Linjie, Awadalla, Anas, Lu, Ximing, Park, Jae Sung, Hessel, Jack, Wang, Lijuan, Choi, Yejin
The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems…
External link:
http://arxiv.org/abs/2407.01942
Author:
Li, Jeffrey, Fang, Alex, Smyrnis, Georgios, Ivgi, Maor, Jordan, Matt, Gadre, Samir, Bansal, Hritik, Guha, Etash, Keh, Sedrick, Arora, Kushal, Garg, Saurabh, Xin, Rui, Muennighoff, Niklas, Heckel, Reinhard, Mercat, Jean, Chen, Mayee, Gururangan, Suchin, Wortsman, Mitchell, Albalak, Alon, Bitton, Yonatan, Nezhurina, Marianna, Abbas, Amro, Hsieh, Cheng-Yu, Ghosh, Dhruba, Gardner, Josh, Kilian, Maciej, Zhang, Hanlin, Shao, Rulin, Pratt, Sarah, Sanyal, Sunny, Ilharco, Gabriel, Daras, Giannis, Marathe, Kalyani, Gokaslan, Aaron, Zhang, Jieyu, Chandu, Khyathi, Nguyen, Thao, Vasiljevic, Igor, Kakade, Sham, Song, Shuran, Sanghavi, Sujay, Faghri, Fartash, Oh, Sewoong, Zettlemoyer, Luke, Lo, Kyle, El-Nouby, Alaaeldin, Pouransari, Hadi, Toshev, Alexander, Wang, Stephanie, Groeneveld, Dirk, Soldaini, Luca, Koh, Pang Wei, Jitsev, Jenia, Kollar, Thomas, Dimakis, Alexandros G., Carmon, Yair, Dave, Achal, Schmidt, Ludwig, Shankar, Vaishaal
We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining…
External link:
http://arxiv.org/abs/2406.11794
Author:
Lin, Bill Yuchen, Deng, Yuntian, Chandu, Khyathi, Brahman, Faeze, Ravichander, Abhilasha, Pyatkin, Valentina, Dziri, Nouha, Bras, Ronan Le, Choi, Yejin
We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation…
External link:
http://arxiv.org/abs/2406.04770
Author:
Nawrath, Marcel, Nowak, Agnieszka, Ratz, Tristan, Walenta, Danilo C., Opitz, Juri, Ribeiro, Leonardo F. R., Sedoc, João, Deutsch, Daniel, Mille, Simon, Liu, Yixin, Zhang, Lining, Gehrmann, Sebastian, Mahamood, Saad, Clinciu, Miruna, Chandu, Khyathi, Hou, Yufang
At the heart of the Pyramid evaluation method for text summarization lie human-written summary content units (SCUs). These SCUs are concise sentences that decompose a summary into small facts. Such SCUs can be used to judge the quality of a candidate…
External link:
http://arxiv.org/abs/2404.01701
Author:
Lambert, Nathan, Pyatkin, Valentina, Morrison, Jacob, Miranda, LJ, Lin, Bill Yuchen, Chandu, Khyathi, Dziri, Nouha, Kumar, Sachin, Zick, Tom, Choi, Yejin, Smith, Noah A., Hajishirzi, Hannaneh
Reward models (RMs) are at the crux of successfully using RLHF to align pretrained models to human preferences, yet there has been relatively little work focused on the evaluation of those models. Evaluating reward models presents an opportunity to…
External link:
http://arxiv.org/abs/2403.13787
Author:
Srinivasan, Tejas, Hessel, Jack, Gupta, Tanmay, Lin, Bill Yuchen, Choi, Yejin, Thomason, Jesse, Chandu, Khyathi Raghavi
Selective prediction minimizes incorrect predictions from vision-language models (VLMs) by allowing them to abstain from answering when uncertain. However, when deploying a vision-language system with low tolerance for inaccurate predictions, selective…
External link:
http://arxiv.org/abs/2402.15610