Showing 1 - 10 of 126 for search: '"Jamieson, Kevin"'
Author:
Wagenmaker, Andrew, Huang, Kevin, Ke, Liyiming, Boots, Byron, Jamieson, Kevin, Gupta, Abhishek
In order to mitigate the sample complexity of real-world reinforcement learning, common practice is to first train a policy in a simulator where samples are cheap, and then deploy this policy in the real world, with the hope that it generalizes effectively…
External link:
http://arxiv.org/abs/2410.20254
Author:
Chen, Yifang, Wang, Shuohang, Yang, Ziyi, Sharma, Hiteshi, Karampatziakis, Nikos, Yu, Donghan, Jamieson, Kevin, Du, Simon Shaolei, Shen, Yelong
Reinforcement learning with human feedback (RLHF), as a widely adopted approach in current large language model pipelines, is \textit{bottlenecked by the size of human preference data}. While traditional methods rely on offline preference dataset construction…
External link:
http://arxiv.org/abs/2407.02119
Author:
Zhang, Jifan, Jain, Lalit, Guo, Yang, Chen, Jiayi, Zhou, Kuan Lok, Suresh, Siddharth, Wagenmaker, Andrew, Sievert, Scott, Rogers, Timothy, Jamieson, Kevin, Mankoff, Robert, Nowak, Robert
We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over…
External link:
http://arxiv.org/abs/2406.10522
In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an epsilon-optimal policy from a set of policies with high probability. Existing…
External link:
http://arxiv.org/abs/2406.06856
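For context, the identification criterion referenced here is the standard $(\epsilon,\delta)$-PAC guarantee; writing $\Pi$ for the policy set and $V^{\pi}$ for the value of policy $\pi$ (notation assumed, not taken from the snippet), the goal is to return $\hat{\pi} \in \Pi$ with
$$\Pr\big[\, V^{\hat{\pi}} \ge \max_{\pi \in \Pi} V^{\pi} - \epsilon \,\big] \ge 1 - \delta.$$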
Author:
Wang, Yiping, Chen, Yifang, Yan, Wendan, Fang, Alex, Zhou, Wenjing, Jamieson, Kevin, Du, Simon Shaolei
Data selection has emerged as a core issue for large-scale visual-language model pretraining (e.g., CLIP), particularly with noisy web-curated datasets. Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection…
External link:
http://arxiv.org/abs/2405.19547
In recent years, data selection has emerged as a core issue for large-scale visual-language model pretraining, especially on noisy web-curated datasets. One widely adopted strategy assigns quality scores such as CLIP similarity for each sample and retains…
External link:
http://arxiv.org/abs/2402.02055
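To make the score-and-retain strategy in the snippet above concrete, here is a minimal Python sketch of CLIP-similarity filtering. It illustrates the generic baseline only, not the method proposed in either paper; the model checkpoint and the keep_fraction parameter are arbitrary assumptions.

import torch
from transformers import CLIPModel, CLIPProcessor

# Arbitrary checkpoint choice; any CLIP variant would do for this sketch.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(images, captions):
    # Score each (image, caption) pair by the cosine similarity of its
    # CLIP image and text embeddings.
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1)  # per-pair cosine similarity

def select_top(images, captions, keep_fraction=0.3):
    # Retain the highest-scoring fraction of the pairs.
    scores = clip_scores(images, captions)
    k = max(1, int(keep_fraction * len(captions)))
    keep = scores.topk(k).indices.tolist()
    return [(images[i], captions[i]) for i in keep]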
Author:
Bhatt, Gantavya, Chen, Yifang, Das, Arnav M., Zhang, Jifan, Truong, Sang T., Mussmann, Stephen, Zhu, Yinglun, Bilmes, Jeffrey, Du, Simon S., Jamieson, Kevin, Ash, Jordan T., Nowak, Robert D.
Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high-quality…
External link:
http://arxiv.org/abs/2401.06692
In critical machine learning applications, ensuring fairness is essential to avoid perpetuating social inequities. In this work, we address the challenges of reducing bias and improving accuracy in data-scarce environments, where the cost of collecting…
External link:
http://arxiv.org/abs/2312.08559
We consider maximizing a monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$…
External link:
http://arxiv.org/abs/2310.18465
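As background for this problem setup, a natural baseline is greedy selection on averaged noisy evaluations: repeatedly add the element with the largest estimated marginal gain. The sketch below shows that baseline only; it is not the algorithm from the paper, and noisy_f, the cardinality budget k, and samples_per_eval are assumed inputs.

import random

def noisy_greedy(n, k, noisy_f, samples_per_eval=20):
    # Greedy maximization of a monotone submodular f over ground set [n]
    # from noisy evaluations. noisy_f(S) is assumed to return f(S) plus
    # zero-mean noise; repeated calls are averaged to estimate gains.
    S = set()
    for _ in range(k):
        base = sum(noisy_f(S) for _ in range(samples_per_eval)) / samples_per_eval
        best_gain, best_e = float("-inf"), None
        for e in range(n):
            if e in S:
                continue
            est = sum(noisy_f(S | {e}) for _ in range(samples_per_eval)) / samples_per_eval
            if est - base > best_gain:
                best_gain, best_e = est - base, e
        S.add(best_e)
    return S

# Example: a weighted coverage-style (here modular, hence submodular)
# function observed with Gaussian noise.
weights = [random.random() for _ in range(10)]
f = lambda S: sum(weights[i] for i in S) / sum(weights)
noisy = lambda S: f(S) + random.gauss(0, 0.05)
print(noisy_greedy(n=10, k=3, noisy_f=noisy))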
We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A$…
External link:
http://arxiv.org/abs/2310.16252
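For reference, a PSNE of a zero-sum matrix game (row player maximizing, column player minimizing) is exactly a saddle point of $A$: an entry that is maximal in its column and minimal in its row. The sketch below just checks this definition on a fully known matrix; the sampling question the paper studies, where entries are only observed with noise, is not addressed here.

def find_psne(A):
    # Return indices (i, j) of a pure strategy Nash equilibrium (saddle
    # point) of the zero-sum game with payoff matrix A, or None if no
    # PSNE exists. Exhaustive check on a fully known matrix.
    n, m = len(A), len(A[0])
    for i in range(n):
        for j in range(m):
            if (all(A[i][j] >= A[k][j] for k in range(n)) and
                    all(A[i][j] <= A[i][l] for l in range(m))):
                return i, j
    return None

# Entry (0, 1) = 2 is the max of its column [2, 0] and the min of its
# row [3, 2, 4], so find_psne returns (0, 1).
print(find_psne([[3, 2, 4],
                 [1, 0, 5]]))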