Showing 1 - 10 of 306 for the search: '"MITTAL, PRATEEK"'
Traditional data influence estimation methods, like influence functions, assume that learning algorithms are permutation-invariant with respect to training data. However, modern training paradigms, especially for foundation models using stochastic alg…
External link:
http://arxiv.org/abs/2412.09538
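[Note: a minimal sketch of the influence-function baseline mentioned in the record above, stated from its standard textbook form rather than taken from this paper. For a model with parameters \hat{\theta} fit by empirical risk minimization, the influence of a training point z on the loss at a test point z_test is commonly written as

    I(z, z_test) = -\nabla_\theta L(z_test, \hat{\theta})^\top \, H_{\hat{\theta}}^{-1} \, \nabla_\theta L(z, \hat{\theta}),

where H_{\hat{\theta}} is the Hessian of the training loss at \hat{\theta}. The expression depends only on the final parameters, not on the order in which training points were visited, which is the permutation-invariance assumption the abstract refers to.]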
Author:
Qi, Xiangyu, Wei, Boyi, Carlini, Nicholas, Huang, Yangsibo, Xie, Tinghao, He, Luxi, Jagielski, Matthew, Nasr, Milad, Mittal, Prateek, Henderson, Peter
Stakeholders -- from model developers to policymakers -- seek to minimize the dual-use risks of large language models (LLMs). An open challenge to this goal is whether technical safeguards can impede the misuse of LLMs, even when models are customiza…
External link:
http://arxiv.org/abs/2412.07097
This paper addresses the challenge of estimating high-dimensional parameters in non-standard data environments, where traditional methods often falter due to issues such as heavy-tailed distributions, data contamination, and dependent observations. W…
External link:
http://arxiv.org/abs/2410.12367
Author:
Wu, Tong, Zhang, Shujian, Song, Kaiqiang, Xu, Silei, Zhao, Sanqiang, Agrawal, Ravi, Indurthi, Sathish Reddy, Xiang, Chong, Mittal, Prateek, Zhou, Wenxuan
Large Language Models (LLMs) are susceptible to security and safety threats, such as prompt injection, prompt extraction, and harmful requests. One major cause of these vulnerabilities is the lack of an instruction hierarchy. Modern LLM architectures…
External link:
http://arxiv.org/abs/2410.09102
Author:
Panda, Ashwinee, Isik, Berivan, Qi, Xiangyu, Koyejo, Sanmi, Weissman, Tsachy, Mittal, Prateek
Existing methods for adapting large language models (LLMs) to new tasks are not suited to multi-task adaptation because they modify all the model weights -- causing destructive interference between tasks. The resulting effects, such as catastrophic f…
External link:
http://arxiv.org/abs/2406.16797
Author:
Nair, Vineet J., Venkataramanan, Venkatesh, Srivastava, Priyank, Sarker, Partha S., Srivastava, Anurag, Marinovici, Laurentiu D., Zha, Jun, Irwin, Christopher, Mittal, Prateek, Williams, John, Kumar, Jayant, Poor, H. Vincent, Annaswamy, Anuradha M.
The electricity grid has evolved from a physical system to a cyber-physical system with digital devices that perform measurement, control, communication, computation, and actuation. The increased penetration of distributed energy resources (DERs) inc…
External link:
http://arxiv.org/abs/2406.14861
Author:
Xie, Tinghao, Qi, Xiangyu, Zeng, Yi, Huang, Yangsibo, Sehwag, Udari Madhushani, Huang, Kaixuan, He, Luxi, Wei, Boyi, Li, Dacheng, Sheng, Ying, Jia, Ruoxi, Li, Bo, Li, Kai, Chen, Danqi, Henderson, Peter, Mittal, Prateek
Evaluating aligned large language models' (LLMs) ability to recognize and reject unsafe user requests is crucial for safe, policy-compliant deployments. Existing evaluation efforts, however, face three limitations that we address with SORRY-Bench, ou…
External link:
http://arxiv.org/abs/2406.14598
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts. However, existing approaches require re-training models on different data subsets, which is computationally intensive, foreclosing thei…
External link:
http://arxiv.org/abs/2406.11011
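[Note: a minimal sketch of the standard Data Shapley definition, included for context and not drawn from this record. With a utility U(S) measuring the performance of a model trained on a subset S of the n training points D, the Shapley value of point i is

    \phi_i = \frac{1}{n} \sum_{S \subseteq D \setminus \{i\}} \binom{n-1}{|S|}^{-1} \left( U(S \cup \{i\}) - U(S) \right),

so each marginal term U(S \cup \{i\}) - U(S) in principle requires training a model on a different data subset, which is the re-training cost the abstract highlights.]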
Author:
Qi, Xiangyu, Panda, Ashwinee, Lyu, Kaifeng, Ma, Xiao, Roy, Subhrajit, Beirami, Ahmad, Mittal, Prateek, Henderson, Peter
The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: saf…
External link:
http://arxiv.org/abs/2406.05946
Author:
Qi, Xiangyu, Huang, Yangsibo, Zeng, Yi, Debenedetti, Edoardo, Geiping, Jonas, He, Luxi, Huang, Kaixuan, Madhushani, Udari, Sehwag, Vikash, Shi, Weijia, Wei, Boyi, Xie, Tinghao, Chen, Danqi, Chen, Pin-Yu, Ding, Jeffrey, Jia, Ruoxi, Ma, Jiaqi, Narayanan, Arvind, Su, Weijie J, Wang, Mengdi, Xiao, Chaowei, Li, Bo, Song, Dawn, Henderson, Peter, Mittal, Prateek
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under…
External link:
http://arxiv.org/abs/2405.19524