Showing 1 - 10 of 7,426 results for search: '"A, Kveton"'
Author:
Thekumparampil, Kiran Koshy; Hiranandani, Gaurush; Kalantari, Kousha; Sabach, Shoham; Kveton, Branislav
We study learning of human preferences from limited comparison feedback. This task is ubiquitous in machine learning. Its applications, such as reinforcement learning from human feedback, have been transformational. We formulate this problem as lear
External link:
http://arxiv.org/abs/2412.19396
Author:
Nguyen, Dang; Chen, Jian; Wang, Yu; Wu, Gang; Park, Namyong; Hu, Zhengmian; Lyu, Hanjia; Wu, Junda; Aponte, Ryan; Xia, Yu; Li, Xintong; Shi, Jing; Chen, Hongjie; Lai, Viet Dac; Xie, Zhouhang; Kim, Sungchul; Zhang, Ruiyi; Yu, Tong; Tanjim, Mehrab; Ahmed, Nesreen K.; Mathur, Puneet; Yoon, Seunghyun; Yao, Lina; Kveton, Branislav; Nguyen, Thien Huu; Bui, Trung; Zhou, Tianyi; Rossi, Ryan A.; Dernoncourt, Franck
Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs,
External link:
http://arxiv.org/abs/2412.13501
Author:
Mukherjee, Subhojyoti; Lalitha, Anusha; Sengupta, Sailik; Deshmukh, Aniket; Kveton, Branislav
Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs) is a challenging problem as human preferences are complex, multifaceted, and often conflicting. Recent works on MOAHF considered a-priori multi-objective optimizati
External link:
http://arxiv.org/abs/2412.05469
The growth of recommender systems (RecSys) is driven by digitization and the need for personalized content in areas such as e-commerce and video streaming. The content in these systems often changes rapidly and therefore they constantly face the ongo
External link:
http://arxiv.org/abs/2411.09065
Author:
Wu, Junda; Li, Xintong; Wang, Ruoyu; Xia, Yu; Xiong, Yuxin; Wang, Jianing; Yu, Tong; Chen, Xiang; Kveton, Branislav; Yao, Lina; Shang, Jingbo; McAuley, Julian
Offline evaluation of LLMs is crucial in understanding their capacities, though current methods remain underexplored in existing research. In this work, we focus on the offline evaluation of the chain-of-thought capabilities and show how to optimize
External link:
http://arxiv.org/abs/2410.23703
Author:
Zhang, Zhehao; Rossi, Ryan A.; Kveton, Branislav; Shao, Yijia; Yang, Diyi; Zamani, Hamed; Dernoncourt, Franck; Barrow, Joe; Yu, Tong; Kim, Sungchul; Zhang, Ruiyi; Gu, Jiuxiang; Derr, Tyler; Chen, Hongjie; Wu, Junda; Chen, Xiang; Wang, Zichao; Mitra, Subrata; Lipka, Nedim; Ahmed, Nesreen; Wang, Yu
Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) per
External link:
http://arxiv.org/abs/2411.00027
Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we
External link:
http://arxiv.org/abs/2410.03919
Learning from human feedback has been central to recent advances in artificial intelligence and machine learning. Since the collection of human feedback is costly, a natural question to ask is if the new feedback always needs to be collected. Or could w
External link:
http://arxiv.org/abs/2406.10030
We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limi
External link:
http://arxiv.org/abs/2405.15332
Author:
Mukherjee, Subhojyoti; Lalitha, Anusha; Kalantari, Kousha; Deshmukh, Aniket; Liu, Ge; Ma, Yifei; Kveton, Branislav
Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study efficient human preference elicitation for learning preferen
External link:
http://arxiv.org/abs/2404.13895