Showing 1 - 10 of 126 for search: '"Jamieson, Kevin"'
Author:
Wagenmaker, Andrew, Huang, Kevin, Ke, Liyiming, Boots, Byron, Jamieson, Kevin, Gupta, Abhishek
In order to mitigate the sample complexity of real-world reinforcement learning, common practice is to first train a policy in a simulator where samples are cheap, and then deploy this policy in the real world, with the hope that it generalizes effectively…
External link:
http://arxiv.org/abs/2410.20254
Author:
Chen, Yifang, Wang, Shuohang, Yang, Ziyi, Sharma, Hiteshi, Karampatziakis, Nikos, Yu, Donghan, Jamieson, Kevin, Du, Simon Shaolei, Shen, Yelong
Reinforcement learning with human feedback (RLHF), as a widely adopted approach in current large language model pipelines, is \textit{bottlenecked by the size of human preference data}. While traditional methods rely on offline preference dataset construction…
External link:
http://arxiv.org/abs/2407.02119
Author:
Zhang, Jifan, Jain, Lalit, Guo, Yang, Chen, Jiayi, Zhou, Kuan Lok, Suresh, Siddharth, Wagenmaker, Andrew, Sievert, Scott, Rogers, Timothy, Jamieson, Kevin, Mankoff, Robert, Nowak, Robert
We present a novel multimodal preference dataset for creative tasks, consisting of over 250 million human ratings on more than 2.2 million captions, collected through crowdsourcing rating data for The New Yorker's weekly cartoon caption contest over…
External link:
http://arxiv.org/abs/2406.10522
In this paper, we study the non-asymptotic sample complexity for the pure exploration problem in contextual bandits and tabular reinforcement learning (RL): identifying an epsilon-optimal policy from a set of policies with high probability. Existing…
External link:
http://arxiv.org/abs/2406.06856
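For context, the identification criterion referenced here is the standard $(\epsilon,\delta)$-PAC guarantee; writing $\Pi$ for the policy set and $V^{\pi}$ for the value of policy $\pi$ (notation assumed, not taken from the snippet), the goal is to return $\hat{\pi} \in \Pi$ with
$$\Pr\big[\, V^{\hat{\pi}} \ge \max_{\pi \in \Pi} V^{\pi} - \epsilon \,\big] \ge 1 - \delta.$$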
Author:
Wang, Yiping, Chen, Yifang, Yan, Wendan, Fang, Alex, Zhou, Wenjing, Jamieson, Kevin, Du, Simon Shaolei
Data selection has emerged as a core issue for large-scale visual-language model pretraining (e.g., CLIP), particularly with noisy web-curated datasets. Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection…
External link:
http://arxiv.org/abs/2405.19547
In recent years, data selection has emerged as a core issue for large-scale visual-language model pretraining, especially on noisy web-curated datasets. One widely adopted strategy assigns quality scores such as CLIP similarity for each sample and retains…
External link:
http://arxiv.org/abs/2402.02055
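To make the score-and-retain strategy in the snippet above concrete, here is a minimal Python sketch of CLIP-similarity filtering. It illustrates the generic baseline only, not the method proposed in either paper; the model checkpoint and the keep_fraction parameter are arbitrary assumptions.

import torch
from transformers import CLIPModel, CLIPProcessor

# Arbitrary checkpoint choice; any CLIP variant would do for this sketch.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_scores(images, captions):
    # Score each (image, caption) pair by the cosine similarity of its
    # CLIP image and text embeddings.
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1)  # per-pair cosine similarity

def select_top(images, captions, keep_fraction=0.3):
    # Retain the highest-scoring fraction of the pairs.
    scores = clip_scores(images, captions)
    k = max(1, int(keep_fraction * len(captions)))
    keep = scores.topk(k).indices.tolist()
    return [(images[i], captions[i]) for i in keep]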
Author:
Bhatt, Gantavya, Chen, Yifang, Das, Arnav M., Zhang, Jifan, Truong, Sang T., Mussmann, Stephen, Zhu, Yinglun, Bilmes, Jeffrey, Du, Simon S., Jamieson, Kevin, Ash, Jordan T., Nowak, Robert D.
Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation efforts required to produce high-quality…
External link:
http://arxiv.org/abs/2401.06692
In critical machine learning applications, ensuring fairness is essential to avoid perpetuating social inequities. In this work, we address the challenges of reducing bias and improving accuracy in data-scarce environments, where the cost of collecting…
External link:
http://arxiv.org/abs/2312.08559
We consider maximizing a monotonic, submodular set function $f: 2^{[n]} \rightarrow [0,1]$ under stochastic bandit feedback. Specifically, $f$ is unknown to the learner but at each time $t=1,\dots,T$ the learner chooses a set $S_t \subset [n]$…
External link:
http://arxiv.org/abs/2310.18465
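As background for this problem setup, a natural baseline is greedy selection on averaged noisy evaluations: repeatedly add the element with the largest estimated marginal gain. The sketch below shows that baseline only; it is not the algorithm from the paper, and noisy_f, the cardinality budget k, and samples_per_eval are assumed inputs.

import random

def noisy_greedy(n, k, noisy_f, samples_per_eval=20):
    # Greedy maximization of a monotone submodular f over ground set [n]
    # from noisy evaluations. noisy_f(S) is assumed to return f(S) plus
    # zero-mean noise; repeated calls are averaged to estimate gains.
    S = set()
    for _ in range(k):
        base = sum(noisy_f(S) for _ in range(samples_per_eval)) / samples_per_eval
        best_gain, best_e = float("-inf"), None
        for e in range(n):
            if e in S:
                continue
            est = sum(noisy_f(S | {e}) for _ in range(samples_per_eval)) / samples_per_eval
            if est - base > best_gain:
                best_gain, best_e = est - base, e
        S.add(best_e)
    return S

# Example: a weighted coverage-style (here modular, hence submodular)
# function observed with Gaussian noise.
weights = [random.random() for _ in range(10)]
f = lambda S: sum(weights[i] for i in S) / sum(weights)
noisy = lambda S: f(S) + random.gauss(0, 0.05)
print(noisy_greedy(n=10, k=3, noisy_f=noisy))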
We study the sample complexity of identifying the pure strategy Nash equilibrium (PSNE) in a two-player zero-sum matrix game with noise. Formally, we are given a stochastic model where any learner can sample an entry $(i,j)$ of the input matrix $A$…
External link:
http://arxiv.org/abs/2310.16252
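For reference, a PSNE of a zero-sum matrix game (row player maximizing, column player minimizing) is exactly a saddle point of $A$: an entry that is maximal in its column and minimal in its row. The sketch below just checks this definition on a fully known matrix; the sampling question the paper studies, where entries are only observed with noise, is not addressed here.

def find_psne(A):
    # Return indices (i, j) of a pure strategy Nash equilibrium (saddle
    # point) of the zero-sum game with payoff matrix A, or None if no
    # PSNE exists. Exhaustive check on a fully known matrix.
    n, m = len(A), len(A[0])
    for i in range(n):
        for j in range(m):
            if (all(A[i][j] >= A[k][j] for k in range(n)) and
                    all(A[i][j] <= A[i][l] for l in range(m))):
                return i, j
    return None

# Entry (0, 1) = 2 is the max of its column [2, 0] and the min of its
# row [3, 2, 4], so find_psne returns (0, 1).
print(find_psne([[3, 2, 4],
                 [1, 0, 5]]))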