Showing 1 - 10 of 23 results for the search '"Bukharin, Alexander"'
Author:
Wang, Zhilin, Bukharin, Alexander, Delalleau, Olivier, Egert, Daniel, Shen, Gerald, Zeng, Jiaqi, Kuchaiev, Oleksii, Dong, Yi
Reward models are critical for aligning models to follow instructions, and are typically trained following one of two popular paradigms: Bradley-Terry style or Regression style. However, there is a lack of evidence that either approach is better than… (an illustrative sketch of both loss styles follows this entry)
External link:
http://arxiv.org/abs/2410.01257
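As a reader's aid only (not code from the linked paper), here is a minimal PyTorch sketch of the two reward-model training styles the snippet names, using made-up scalar values: a Bradley-Terry style pairwise loss and a Regression style rating loss.

import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen, reward_rejected):
    # Bradley-Terry style: model P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    # and minimize the negative log-likelihood of the observed preference.
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

def regression_loss(predicted_reward, human_rating):
    # Regression style: fit the scalar reward directly to a human-assigned rating.
    return F.mse_loss(predicted_reward, human_rating)

# Toy usage: made-up rewards for two preference pairs and two rated responses.
bt = bradley_terry_loss(torch.tensor([1.3, 0.2]), torch.tensor([0.4, -0.1]))
mse = regression_loss(torch.tensor([3.1, 4.0]), torch.tensor([3.0, 5.0]))

Both losses act on scalar rewards from the same backbone; the snippet's point is that there is little evidence about which training signal yields the better reward model.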
Author:
Wang, Kuan, Bukharin, Alexander, Jiang, Haoming, Yin, Qingyu, Wang, Zhengyang, Zhao, Tuo, Shang, Jingbo, Zhang, Chao, Yin, Bing, Li, Xian, Chen, Jianshu, Li, Shiyang
Instruction fine-tuning (IFT) elicits instruction following capabilities and steers the behavior of large language models (LLMs) via supervised learning. However, existing models trained on open-source IFT datasets only have the ability to follow instructions… (a toy supervised-loss sketch follows this entry)
External link:
http://arxiv.org/abs/2409.13733
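Only to make the phrase "via supervised learning" concrete, the toy PyTorch lines below (made-up shapes and mask, not the paper's training code) show an IFT-style objective: next-token cross-entropy computed on the response tokens of a prompt-response pair, with the prompt positions masked out.

import torch
import torch.nn.functional as F

vocab_size, seq_len = 100, 8
logits = torch.randn(seq_len, vocab_size)           # LLM outputs for each position
targets = torch.randint(0, vocab_size, (seq_len,))  # next-token labels
is_response = torch.tensor([False, False, False, True, True, True, True, True])

# Supervised IFT loss: cross-entropy on response tokens only; prompt tokens are ignored.
loss = F.cross_entropy(logits[is_response], targets[is_response])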
Author:
Bukharin, Alexander, Hong, Ilgee, Jiang, Haoming, Li, Zichong, Zhang, Qingru, Zhang, Zixuan, Zhao, Tuo
Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, e.g., personal bias, context ambiguity, lack of training, etc., human annotators may give incorrect…
External link:
http://arxiv.org/abs/2406.15568
Author:
Hong, Ilgee, Li, Zichong, Bukharin, Alexander, Li, Yixiao, Jiang, Haoming, Yang, Tianbao, Zhao, Tuo
Reinforcement learning from human feedback (RLHF) is a prevalent approach to align AI systems with human values by learning rewards from human preference data. For various reasons, however, such data typically takes the form of rankings over pairs… (a sketch expanding a ranking into pairwise comparisons follows this entry)
External link:
http://arxiv.org/abs/2406.02764
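Purely to illustrate the data format this snippet mentions (rankings over pairs or groups of responses), the following sketch, with hypothetical helper and variable names not taken from the paper, expands one ranked group into the pairwise comparisons a reward model is usually trained on.

from itertools import combinations

def ranking_to_pairs(responses_ranked_best_to_worst):
    # Every higher-ranked response is treated as preferred over every lower-ranked one.
    idx = range(len(responses_ranked_best_to_worst))
    return [(responses_ranked_best_to_worst[i], responses_ranked_best_to_worst[j])
            for i, j in combinations(idx, 2)]

# Toy usage: a human ranked three candidate answers from best to worst.
pairs = ranking_to_pairs(["answer A", "answer B", "answer C"])
# pairs == [("answer A", "answer B"), ("answer A", "answer C"), ("answer B", "answer C")]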
Author:
Bukharin, Alexander, Li, Shiyang, Wang, Zhengyang, Yang, Jingfeng, Yin, Bing, Li, Xian, Zhang, Chao, Zhao, Tuo, Jiang, Haoming
Recent works have shown that by curating high-quality and diverse instruction tuning datasets, we can significantly improve instruction-following capabilities. However, creating such datasets is difficult and most works rely on manual curation or proprietary…
External link:
http://arxiv.org/abs/2311.14736
Author:
Bukharin, Alexander, Li, Yan, Yu, Yue, Zhang, Qingru, Chen, Zhehui, Zuo, Simiao, Zhang, Chao, Zhang, Songan, Zhao, Tuo
Multi-Agent Reinforcement Learning (MARL) has shown promising results across several domains. Despite this promise, MARL policies often lack robustness and are therefore sensitive to small changes in their environment. This presents a serious concern…
External link:
http://arxiv.org/abs/2310.10810
Reward design is a fundamental, yet challenging aspect of reinforcement learning (RL). Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale… (a toy handcrafted-reward sketch follows this entry)
External link:
http://arxiv.org/abs/2309.02632
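To illustrate what "handcrafting a reward function from environment feedback signals" typically looks like (hypothetical signal names and weights, not the paper's design), consider a plain weighted sum; the difficulty the snippet points to is that the raw signals live on very different scales, so the weights are delicate to choose.

def handcrafted_reward(signals, weights):
    # A handcrafted reward: a weighted sum of raw environment feedback signals.
    return sum(weights[name] * value for name, value in signals.items())

# Toy usage: signals on very different scales make the weights hard to tune.
step_signals = {"distance_to_goal": -3.2, "energy_used": -150.0, "reached_goal": 0.0}
weights = {"distance_to_goal": 1.0, "energy_used": 0.01, "reached_goal": 10.0}
r = handcrafted_reward(step_signals, weights)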
Author:
Bukharin, Alexander, Liu, Tianyi, Wang, Shengjie, Zuo, Simiao, Gao, Weihao, Yan, Wen, Zhao, Tuo
Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can…
External link:
http://arxiv.org/abs/2306.03109
Author:
Zhang, Qingru, Chen, Minshuo, Bukharin, Alexander, Karampatziakis, Nikos, He, Pengcheng, Cheng, Yu, Chen, Weizhu, Zhao, Tuo
Fine-tuning large pre-trained language models on downstream tasks has become an important paradigm in NLP. However, common practice fine-tunes all of the parameters in a pre-trained model, which becomes prohibitive when a large number of downstream tasks… (a generic low-rank adapter sketch follows this entry)
External link:
http://arxiv.org/abs/2303.10512
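Since the snippet motivates parameter-efficient fine-tuning, here is a generic low-rank adapter sketch in PyTorch (class and parameter names are made up, and this is not the adaptive budget-allocation method of the linked paper): only the small matrices A and B are trained while the pre-trained weights stay frozen.

import torch
import torch.nn as nn

class LowRankAdapter(nn.Module):
    # Wraps a frozen pre-trained linear layer and adds a trainable low-rank update.
    def __init__(self, base_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights are not updated
        self.A = nn.Parameter(torch.randn(rank, base_linear.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base_linear.out_features, rank))

    def forward(self, x):
        # Output = frozen W x plus the low-rank update (B A) x; only A and B get gradients.
        return self.base(x) + x @ self.A.T @ self.B.T

# Toy usage: adapt a single 64-dimensional linear layer with rank 4.
adapted = LowRankAdapter(nn.Linear(64, 64), rank=4)
out = adapted(torch.randn(2, 64))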
Author:
Zhang, Qingru, Zuo, Simiao, Liang, Chen, Bukharin, Alexander, He, Pengcheng, Chen, Weizhu, Zhao, Tuo
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks. However, these models contain enormous amounts of parameters, which restrict their deployment to real-world applications…
External link:
http://arxiv.org/abs/2206.12562