Showing 1 - 10 of 53 for the search: "Zhou, Enyu"
Author:
Dou, Shihan, Jia, Haoxiang, Wu, Shenxi, Zheng, Huiyuan, Zhou, Weikang, Wu, Muling, Chai, Mingxu, Fan, Jessica, Huang, Caishuang, Tao, Yunbo, Liu, Yan, Zhou, Enyu, Zhang, Ming, Zhou, Yuhao, Wu, Yueming, Zheng, Rui, Wen, Ming, Weng, Rongxiang, Wang, Jingang, Cai, Xunliang, Gui, Tao, Qiu, Xipeng, Zhang, Qi, Huang, Xuanjing
The increasing development of large language models (LLMs) in code generation has drawn significant attention among researchers. To enhance LLM-based code generation ability, current efforts are predominantly directed towards collecting high-quality…
External link:
http://arxiv.org/abs/2407.06153
Author:
Huang, Caishuang, Zhao, Wanxu, Zheng, Rui, Lv, Huijie, Dou, Shihan, Li, Sixian, Wang, Xiao, Zhou, Enyu, Ye, Junjie, Yang, Yuming, Gui, Tao, Zhang, Qi, Huang, Xuanjing
As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks…
External link:
http://arxiv.org/abs/2406.18118
Author:
Bao, Rong, Zheng, Rui, Dou, Shihan, Wang, Xiao, Zhou, Enyu, Wang, Bo, Zhang, Qi, Ding, Liang, Tao, Dacheng
In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values…
External link:
http://arxiv.org/abs/2406.11190
Author:
Dou, Shihan, Liu, Yan, Zhou, Enyu, Li, Tianlong, Jia, Haoxiang, Xiong, Limao, Zhao, Xin, Ye, Junjie, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing
The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model…
External link:
http://arxiv.org/abs/2405.00438
Author:
Dou, Shihan, Liu, Yan, Jia, Haoxiang, Xiong, Limao, Zhou, Enyu, Shen, Wei, Shan, Junjie, Huang, Caishuang, Wang, Xiao, Fan, Xiaoran, Xi, Zhiheng, Zhou, Yuhao, Ji, Tao, Zheng, Rui, Zhang, Qi, Huang, Xuanjing, Gui, Tao
The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation quality…
External link:
http://arxiv.org/abs/2402.01391
Author:
Wang, Binghai, Zheng, Rui, Chen, Lu, Liu, Yan, Dou, Shihan, Huang, Caishuang, Shen, Wei, Jin, Senjie, Zhou, Enyu, Shi, Chenyu, Gao, Songyang, Xu, Nuo, Zhou, Yuhao, Fan, Xiaoran, Xi, Zhiheng, Zhao, Jun, Wang, Xiao, Ji, Tao, Yan, Hang, Shen, Lixing, Chen, Zhan, Gui, Tao, Zhang, Qi, Qiu, Xipeng, Huang, Xuanjing, Wu, Zuxuan, Jiang, Yu-Gang
Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for…
External link:
http://arxiv.org/abs/2401.06080
Author:
Dou, Shihan, Zhou, Enyu, Liu, Yan, Gao, Songyang, Zhao, Jun, Shen, Wei, Zhou, Yuhao, Xi, Zhiheng, Wang, Xiao, Fan, Xiaoran, Pu, Shiliang, Zhu, Jiang, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing
Supervised fine-tuning (SFT) is a crucial step for large language models (LLMs), enabling them to align with human instructions and enhance their capabilities in downstream tasks. Increasing instruction data substantially is a direct solution to…
External link:
http://arxiv.org/abs/2312.09979
Author:
Zhou, Enyu, Zheng, Rui, Xi, Zhiheng, Gao, Songyang, Fan, Xiaoran, Fei, Zichu, Ye, Jingting, Gui, Tao, Zhang, Qi, Huang, Xuanjing
Reports of human-like behaviors in foundation models are growing, with psychological theories providing enduring tools to investigate these behaviors. However, current research tends to directly apply these human-oriented tools without verifying the…
External link:
http://arxiv.org/abs/2310.11227
Author:
Xi, Zhiheng, Chen, Wenxiang, Guo, Xin, He, Wei, Ding, Yiwen, Hong, Boyang, Zhang, Ming, Wang, Junzhe, Jin, Senjie, Zhou, Enyu, Zheng, Rui, Fan, Xiaoran, Wang, Xiao, Xiong, Limao, Zhou, Yuhao, Wang, Weiran, Jiang, Changhao, Zou, Yicheng, Liu, Xiangyang, Yin, Zhangyue, Dou, Shihan, Weng, Rongxiang, Cheng, Wensen, Zhang, Qi, Qin, Wenjuan, Zheng, Yongyan, Qiu, Xipeng, Huang, Xuanjing, Gui, Tao
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions…
External link:
http://arxiv.org/abs/2309.07864
Optical flow estimation is a fundamental task in computer vision. Recent direct-regression methods using deep neural networks achieve remarkable performance improvement. However, they do not explicitly capture long-term motion correspondences and thus…
External link:
http://arxiv.org/abs/2203.11335