Showing 1 - 10 of 53 for the search: "Zhou, Enyu"
Author:
Dou, Shihan, Jia, Haoxiang, Wu, Shenxi, Zheng, Huiyuan, Zhou, Weikang, Wu, Muling, Chai, Mingxu, Fan, Jessica, Huang, Caishuang, Tao, Yunbo, Liu, Yan, Zhou, Enyu, Zhang, Ming, Zhou, Yuhao, Wu, Yueming, Zheng, Rui, Wen, Ming, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang, Gui, Tao, Qiu, Xipeng, Zhang, Qi, Huang, Xuanjing
The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality…
External link:
http://arxiv.org/abs/2407.06153
Author:
Huang, Caishuang, Zhao, Wanxu, Zheng, Rui, Lv, Huijie, Dou, Shihan, Li, Sixian, Wang, Xiao, Zhou, Enyu, Ye, Junjie, Yang, Yuming, Gui, Tao, Zhang, Qi, Huang, Xuanjing
As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks…
External link:
http://arxiv.org/abs/2406.18118
Author:
Bao, Rong, Zheng, Rui, Dou, Shihan, Wang, Xiao, Zhou, Enyu, Wang, Bo, Zhang, Qi, Ding, Liang, Tao, Dacheng
In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values…
External link:
http://arxiv.org/abs/2406.11190
Author:
Dou, Shihan, Liu, Yan, Zhou, Enyu, Li, Tianlong, Jia, Haoxiang, Xiong, Limao, Zhao, Xin, Ye, Junjie, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing
The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model…
External link:
http://arxiv.org/abs/2405.00438
Author:
Dou, Shihan, Liu, Yan, Jia, Haoxiang, Xiong, Limao, Zhou, Enyu, Shen, Wei, Shan, Junjie, Huang, Caishuang, Wang, Xiao, Fan, Xiaoran, Xi, Zhiheng, Zhou, Yuhao, Ji, Tao, Zheng, Rui, Zhang, Qi, Huang, Xuanjing, Gui, Tao
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation quality…
External link:
http://arxiv.org/abs/2402.01391
Author:
Wang, Binghai, Zheng, Rui, Chen, Lu, Liu, Yan, Dou, Shihan, Huang, Caishuang, Shen, Wei, Jin, Senjie, Zhou, Enyu, Shi, Chenyu, Gao, Songyang, Xu, Nuo, Zhou, Yuhao, Fan, Xiaoran, Xi, Zhiheng, Zhao, Jun, Wang, Xiao, Ji, Tao, Yan, Hang, Shen, Lixing, Chen, Zhan, Gui, Tao, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, Wu, Zuxuan, Jiang, Yu-Gang
Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for…
External link:
http://arxiv.org/abs/2401.06080
Author:
Dou, Shihan, Zhou, Enyu, Liu, Yan, Gao, Songyang, Zhao, Jun, Shen, Wei, Zhou, Yuhao, Xi, Zhiheng, Wang, Xiao, Fan, Xiaoran, Pu, Shiliang, Zhu, Jiang, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to…
External link:
http://arxiv.org/abs/2312.09979
Author:
Zhou, Enyu, Zheng, Rui, Xi, Zhiheng, Gao, Songyang, Fan, Xiaoran, Fei, Zichu, Ye, Jingting, Gui, Tao, Zhang, Qi, Huang, Xuanjing
Reports of human-like behaviors in foundation models are growing, with psychological theories providing enduring tools to investigate these behaviors. However, current research tends to directly apply these human-oriented tools without verifying the…
External link:
http://arxiv.org/abs/2310.11227
Author:
Xi, Zhiheng, Chen, Wenxiang, Guo, Xin, He, Wei, Ding, Yiwen, Hong, Boyang, Zhang, Ming, Wang, Junzhe, Jin, Senjie, Zhou, Enyu, Zheng, Rui, Fan, Xiaoran, Wang, Xiao, Xiong, Limao, Zhou, Yuhao, Wang, Weiran, Jiang, Changhao, Zou, Yicheng, Liu, Xiangyang, Yin, Zhangyue, Dou, Shihan, Weng, Rongxiang, Cheng, Wensen, Zhang, Qi, Qin, Wenjuan, Zheng, Yongyan, Qiu, Xipeng, Huang, Xuanjing, Gui, Tao
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions…
External link:
http://arxiv.org/abs/2309.07864
Optical flow estimation is a fundamental task in computer vision. Recent direct-regression methods using deep neural networks achieve remarkable performance improvement. However, they do not explicitly capture long-term motion correspondences and thus…
External link:
http://arxiv.org/abs/2203.11335