Showing 1 - 5 of 5 for search: '"Sinha, Aradhana"'
Safety classifiers are critical in mitigating toxicity on online forums such as social media and in chatbots. Still, they continue to be vulnerable to emergent, and often innumerable, adversarial attacks. Traditional automated adversarial data genera
External link:
http://arxiv.org/abs/2406.17104
Author:
Srinivasan, Hansa; Schumann, Candice; Sinha, Aradhana; Madras, David; Olanubi, Gbolahan Oluwafemi; Beutel, Alex; Ricco, Susanna; Chen, Jilin
Capturing the diversity of people in images is challenging: recent literature tends to focus on diversifying one or two attributes, requiring expensive attribute labels or building classifiers. We introduce a diverse people image ranking method which
External link:
http://arxiv.org/abs/2401.14322
Author:
Balashankar, Ananth; Ma, Xiao; Sinha, Aradhana; Beirami, Ahmad; Qin, Yao; Chen, Jilin; Beutel, Alex
As large language models (LLMs) are widely adopted, new safety issues and policies emerge, to which existing safety classifiers do not generalize well. If we have only observed a few examples of violations of a new safety rule, how can we build a cla
External link:
http://arxiv.org/abs/2310.16959
Author:
Sinha, Aradhana; Balashankar, Ananth; Beirami, Ahmad; Avrahami, Thi; Chen, Jilin; Beutel, Alex
Published in:
Transactions on Machine Learning Research (2024)
Real-world natural language processing systems need to be robust to human adversaries. Collecting examples of human adversaries for training is an effective but expensive solution. On the other hand, training on synthetic attacks with small perturbat
External link:
http://arxiv.org/abs/2310.16955
Author:
Sinha, Aradhana
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and
External link:
http://hdl.handle.net/1721.1/113109