Showing 1 - 10 of 116 results for the search: "Huang, Tiansheng"
The harmful fine-tuning attack poses a serious threat to online fine-tuning services. Vaccine, a recent alignment-stage defense, applies a uniform perturbation to the embeddings of all layers to make the model robust to simulated embedding drift. However …
External link:
http://arxiv.org/abs/2410.09760
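As a rough illustration of the perturbation-based alignment idea this snippet describes (not the paper's actual algorithm, which simulates adversarial embedding drift via a min-max objective), a minimal PyTorch sketch of injecting a fixed-magnitude perturbation into hidden embeddings during alignment training might look like the following; the function name and `rho` are hypothetical:

```python
import torch

def perturb_hidden(hidden_states: torch.Tensor, rho: float = 0.1) -> torch.Tensor:
    """Add a fixed-magnitude random perturbation to hidden embeddings.

    Illustrative sketch only: we add norm-rho Gaussian noise so the
    model is trained to stay aligned even under perturbed embeddings.
    """
    noise = torch.randn_like(hidden_states)
    # Normalize per token, then scale to magnitude rho, so every layer
    # receives a perturbation of the same (uniform) size.
    noise = rho * noise / noise.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    return hidden_states + noise
```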
Combining large language models during training or at inference time has shown substantial performance gains over the component LLMs. This paper presents LLM-TOPLA, a diversity-optimized LLM ensemble method with three unique properties: (i) we introduce the …
External link:
http://arxiv.org/abs/2410.03953
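LLM-TOPLA's combiner is learned and diversity-optimized; as a toy stand-in for the general idea of combining component LLM outputs, a minimal majority-vote ensemble (all names here are illustrative assumptions, not the paper's method) could look like:

```python
from collections import Counter

def majority_vote(component_answers: list[str]) -> str:
    """Pick the answer most component LLMs agree on.

    Toy stand-in: LLM-TOPLA instead learns a diversity-optimized
    combiner over component outputs, which this simple vote is not.
    """
    counts = Counter(a.strip().lower() for a in component_answers)
    return counts.most_common(1)[0][0]

# Example: three component models answer the same question.
print(majority_vote(["Paris", "paris", "Lyon"]))  # -> "paris"
```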
Recent research demonstrates that the nascent fine-tuning-as-a-service business model exposes serious safety concerns: fine-tuning on a few harmful data points uploaded by users can compromise the safety alignment of the model. The attack, known as …
External link:
http://arxiv.org/abs/2409.18169
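The threat model this snippet describes, blending a small fraction of harmful records into an otherwise benign user-uploaded fine-tuning set, can be sketched as follows (the function name and poison ratio are illustrative assumptions, not taken from the paper):

```python
import random

def mix_harmful(benign: list, harmful: list,
                poison_ratio: float = 0.05, seed: int = 0) -> list:
    """Blend a few harmful examples into an otherwise benign
    fine-tuning dataset, as in the harmful fine-tuning threat model."""
    rng = random.Random(seed)
    # A small fraction of the benign set size, but at least one record.
    n_poison = min(len(harmful), max(1, int(poison_ratio * len(benign))))
    mixed = benign + rng.sample(harmful, n_poison)
    rng.shuffle(mixed)
    return mixed
```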
Data poisoning and leakage risks impede the massive deployment of federated learning in the real world. This chapter reveals the truths and pitfalls of understanding two dominating threats: training data privacy intrusion and training data poisoning …
External link:
http://arxiv.org/abs/2409.13004
Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation
The harmful fine-tuning issue (Qi et al., 2023) poses serious safety concerns for large language models' fine-tuning-as-a-service. While existing defenses (Huang et al., 2024; Rosati et al., 2024) have been proposed to mitigate the issue, their …
External link:
http://arxiv.org/abs/2409.01586
Safety-aligned Large Language Models (LLMs) are vulnerable to harmful fine-tuning attacks (Qi et al., 2023): a few harmful data points mixed into the fine-tuning dataset can break an LLM's safety alignment. Existing mitigation strategies include alignment …
External link:
http://arxiv.org/abs/2408.09600
Face recognition (FR) can be abused for privacy intrusion. Governments, private companies, or even individual attackers can collect facial images by web scraping to build an FR system that identifies human faces without their consent. This paper introduces …
External link:
http://arxiv.org/abs/2407.13975
Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-broken effect can be mitigated by separating states …
External link:
http://arxiv.org/abs/2405.18641
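A toy sketch of the "separating states" idea, alternating optimization steps between an alignment batch and a user fine-tuning batch so the alignment objective keeps being revisited during fine-tuning, might look like this (illustrative only; the model, data, and schedule here are assumptions, and the paper's actual method further constrains drift between the two states):

```python
import torch

# Toy model and synthetic alignment/user data for illustration.
model = torch.nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
align_x, align_y = torch.randn(8, 4), torch.randint(0, 2, (8,))
user_x, user_y = torch.randn(8, 4), torch.randint(0, 2, (8,))

for step in range(10):
    # Alternate between the alignment state and the user-data state.
    x, y = (align_x, align_y) if step % 2 == 0 else (user_x, user_y)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```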
Authors:
Tekin, Selim Furkan, Ilhan, Fatih, Huang, Tiansheng, Hu, Sihao, Chow, Ka-Ho, Loper, Margaret L., Liu, Ling
This paper presents FusionShot, a focal-diversity-optimized few-shot ensemble learning approach for boosting the robustness and generalization performance of pre-trained few-shot models. The paper makes three original contributions. First, we explore …
External link:
http://arxiv.org/abs/2404.04434
Authors:
Hu, Sihao, Huang, Tiansheng, Ilhan, Fatih, Tekin, Selim, Liu, Gaowen, Kompella, Ramana, Liu, Ling
The development of game agents plays a critical role in advancing towards Artificial General Intelligence (AGI). The progress of LLMs and their multimodal counterparts (MLLMs) offers an unprecedented opportunity to evolve and empower game agents with …
External link:
http://arxiv.org/abs/2404.02039