Výsledky vyhledávání

Report

Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe

Autor: Thekumparampil, Kiran Koshy, Hiranandani, Gaurush, Kalantari, Kousha, Sabach, Shoham, Kveton, Branislav

We study learning of human preferences from a limited comparison feedback. This task is ubiquitous in machine learning. Its applications such as reinforcement learning from human feedback, have been transformational. We formulate this problem as lear

Externí odkaz: http://arxiv.org/abs/2412.19396

Zobrazit plný text záznamu

Report

GUI Agents: A Survey

Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs,

Externí odkaz: http://arxiv.org/abs/2412.13501

Zobrazit plný text záznamu

Report

Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization

Autor: Mukherjee, Subhojyoti, Lalitha, Anusha, Sengupta, Sailik, Deshmukh, Aniket, Kveton, Branislav

Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs) is a challenging problem as human preferences are complex, multifaceted, and often conflicting. Recent works on MOAHF considered a-priori multi-objective optimizati

Externí odkaz: http://arxiv.org/abs/2412.05469

Zobrazit plný text záznamu

Report

Language-Model Prior Overcomes Cold-Start Items

Autor: Wang, Shiyu, Ding, Hao, Gu, Yupeng, Aydore, Sergul, Kalantari, Kousha, Kveton, Branislav

The growth of recommender systems (RecSys) is driven by digitization and the need for personalized content in areas such as e-commerce and video streaming. The content in these systems often changes rapidly and therefore they constantly face the ongo

Externí odkaz: http://arxiv.org/abs/2411.09065

Zobrazit plný text záznamu

Report

OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models

Autor: Wu, Junda, Li, Xintong, Wang, Ruoyu, Xia, Yu, Xiong, Yuxin, Wang, Jianing, Yu, Tong, Chen, Xiang, Kveton, Branislav, Yao, Lina, Shang, Jingbo, McAuley, Julian

Offline evaluation of LLMs is crucial in understanding their capacities, though current methods remain underexplored in existing research. In this work, we focus on the offline evaluation of the chain-of-thought capabilities and show how to optimize

Externí odkaz: http://arxiv.org/abs/2410.23703

Zobrazit plný text záznamu

Report

Personalization of Large Language Models: A Survey

Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) per

Externí odkaz: http://arxiv.org/abs/2411.00027

Zobrazit plný text záznamu

Report

Online Posterior Sampling with a Diffusion Prior

Autor: Kveton, Branislav, Oreshkin, Boris, Park, Youngsuk, Deshmukh, Aniket, Song, Rui

Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we

Externí odkaz: http://arxiv.org/abs/2410.03919

Zobrazit plný text záznamu

Report

Off-Policy Evaluation from Logged Human Feedback

Autor: Bhargava, Aniruddha, Jain, Lalit, Kveton, Branislav, Liu, Ge, Mukherjee, Subhojyoti

Learning from human feedback has been central to recent advances in artificial intelligence and machine learning. Since the collection of human feedback is costly, a natural question to ask is if the new feedback always needs to collected. Or could w

Externí odkaz: http://arxiv.org/abs/2406.10030

Zobrazit plný text záznamu

Report

Cross-Validated Off-Policy Evaluation

Autor: Cief, Matej, Kveton, Branislav, Kompan, Michal

We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limi

Externí odkaz: http://arxiv.org/abs/2405.15332

Zobrazit plný text záznamu

Report

Optimal Design for Human Preference Elicitation

Autor: Mukherjee, Subhojyoti, Lalitha, Anusha, Kalantari, Kousha, Deshmukh, Aniket, Liu, Ge, Ma, Yifei, Kveton, Branislav

Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study efficient human preference elicitation for learning preferen

Externí odkaz: http://arxiv.org/abs/2404.13895

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání