Výsledky vyhledávání

Report

SafeAligner: Safety Alignment against Jailbreak Attacks via Response Disparity Guidance

Autor: Huang, Caishuang, Zhao, Wanxu, Zheng, Rui, Lv, Huijie, Dou, Shihan, Li, Sixian, Wang, Xiao, Zhou, Enyu, Ye, Junjie, Yang, Yuming, Gui, Tao, Zhang, Qi, Huang, Xuanjing

As the development of large language models (LLMs) rapidly advances, securing these models effectively without compromising their utility has become a pivotal area of research. However, current defense strategies against jailbreak attacks (i.e., effo

Externí odkaz: http://arxiv.org/abs/2406.18118

Zobrazit plný text záznamu

Report

Aligning Large Language Models from Self-Reference AI Feedback with one General Principle

Autor: Bao, Rong, Zheng, Rui, Dou, Shihan, Wang, Xiao, Zhou, Enyu, Wang, Bo, Zhang, Qi, Ding, Liang, Tao, Dacheng

In aligning large language models (LLMs), utilizing feedback from existing advanced AI rather than humans is an important method to scale supervisory signals. However, it is highly challenging for AI to understand human intentions and societal values

Externí odkaz: http://arxiv.org/abs/2406.11190

Zobrazit plný text záznamu

Report

MetaRM: Shifted Distributions Alignment via Meta-Learning

Autor: Dou, Shihan, Liu, Yan, Zhou, Enyu, Li, Tianlong, Jia, Haoxiang, Xiong, Limao, Zhao, Xin, Ye, Junjie, Zheng, Rui, Gui, Tao, Zhang, Qi, Huang, Xuanjing

The success of Reinforcement Learning from Human Feedback (RLHF) in language model alignment is critically dependent on the capability of the reward model (RM). However, as the training process progresses, the output distribution of the policy model

Externí odkaz: http://arxiv.org/abs/2405.00438

Zobrazit plný text záznamu

Report

CC2Vec: Combining Typed Tokens with Contrastive Learning for Effective Code Clone Detection

Autor: Dou, Shihan, Wu, Yueming, Jia, Haoxiang, Zhou, Yuhao, Liu, Yan, Liu, Yang

With the development of the open source community, the code is often copied, spread, and evolved in multiple software systems, which brings uncertainty and risk to the software system (e.g., bug propagation and copyright infringement). Therefore, it

Externí odkaz: http://arxiv.org/abs/2405.00428

Zobrazit plný text záznamu

Report

EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models

Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jai

Externí odkaz: http://arxiv.org/abs/2403.12171

Zobrazit plný text záznamu

Report

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

Autor: Lv, Huijie, Wang, Xiao, Zhang, Yuansen, Huang, Caishuang, Dou, Shihan, Ye, Junjie, Gui, Tao, Zhang, Qi, Huang, Xuanjing

Adversarial misuse, particularly through `jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful attacks, int

Externí odkaz: http://arxiv.org/abs/2402.16717

Zobrazit plný text záznamu

Report

Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution

Autor: Xu, Nuo, Zhao, Jun, Zu, Can, Li, Sixian, Chen, Lu, Zhang, Zhihao, Zheng, Rui, Dou, Shihan, Qin, Wenjuan, Gui, Tao, Zhang, Qi, Huang, Xuanjing

Faithfulness, expressiveness, and elegance is the constant pursuit in machine translation. However, traditional metrics like \textit{BLEU} do not strictly align with human preference of translation quality. In this paper, we explore leveraging reinfo

Externí odkaz: http://arxiv.org/abs/2402.11525

Zobrazit plný text záznamu

Report

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning

In this paper, we propose R$^3$: Learning Reasoning through Reverse Curriculum Reinforcement Learning (RL), a novel method that employs only outcome supervision to achieve the benefits of process supervision for large language models. The core challe

Externí odkaz: http://arxiv.org/abs/2402.05808

Zobrazit plný text záznamu

Report

StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback

Autor: Dou, Shihan, Liu, Yan, Jia, Haoxiang, Xiong, Limao, Zhou, Enyu, Shen, Wei, Shan, Junjie, Huang, Caishuang, Wang, Xiao, Fan, Xiaoran, Xi, Zhiheng, Zhou, Yuhao, Ji, Tao, Zheng, Rui, Zhang, Qi, Huang, Xuanjing, Gui, Tao

The advancement of large language models (LLMs) has significantly propelled the field of code generation. Previous work integrated reinforcement learning (RL) with compiler feedback for exploring the output space of LLMs to enhance code generation qu

Externí odkaz: http://arxiv.org/abs/2402.01391

Zobrazit plný text záznamu

Vyhledávací nástroje:

Upřesnit hledání