Showing 1 - 10 of 1,090 results for the search '"P. Himabindu"'
Data Attribution (DA) methods quantify the influence of individual training data points on model outputs and have broad applications such as explainability, data selection, and noisy label identification. However, existing DA methods are often computationally…
External link:
http://arxiv.org/abs/2410.09940
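As a rough, generic illustration of the idea in the entry above (not the method of the linked paper), the sketch below scores each training point by leave-one-out influence: retrain the model without the point and measure how the test loss changes. The toy dataset, model, and loss are all hypothetical choices.

    # Leave-one-out sketch of data attribution: a training point's "influence"
    # is the change in test loss when the model is retrained without it.
    # Everything here (data, model, loss) is an illustrative toy choice.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import log_loss

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 5))
    y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] > 0).astype(int)
    X_test = rng.normal(size=(50, 5))
    y_test = (X_test[:, 0] + 0.5 * X_test[:, 1] > 0).astype(int)

    def test_loss(X, y):
        """Fit a fresh model on (X, y) and return its log loss on the test set."""
        model = LogisticRegression(max_iter=1000).fit(X, y)
        return log_loss(y_test, model.predict_proba(X_test))

    base = test_loss(X_train, y_train)

    # Influence of point i = (loss without i) - (loss with i);
    # positive influence means the point was helping the model.
    influence = np.array([
        test_loss(np.delete(X_train, i, axis=0), np.delete(y_train, i)) - base
        for i in range(len(X_train))
    ])

    print("most helpful points:", np.argsort(influence)[-5:])
    print("most harmful points:", np.argsort(influence)[:5])

Retraining once per training point is what makes exact leave-one-out expensive; practical DA methods typically approximate this quantity rather than recompute it.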
Author:
Qi, Zhenting; Luo, Hongyin; Huang, Xuliang; Zhao, Zhuokai; Jiang, Yibo; Fan, Xiangjun; Lakkaraju, Himabindu; Glass, James
While large language models (LLMs) have shown exceptional capabilities in understanding complex queries and performing sophisticated tasks, their generalization abilities are often deeply entangled with memorization, necessitating more precise evaluation…
External link:
http://arxiv.org/abs/2410.01769
Author:
Rawal, Kaivalya; Lakkaraju, Himabindu
This paper presents a novel technique for incorporating user input when learning and inferring user preferences. When trying to provide users of black-box machine learning models with actionable recourse, we often wish to incorporate their personal preferences…
External link:
http://arxiv.org/abs/2409.13940
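The entry above concerns actionable recourse under user preferences. As a loose sketch of the general setting only (not the paper's technique), the snippet below greedily searches for a small, cost-weighted feature change that flips a black-box classifier's decision; score_fn, user_costs, and the step size are all assumptions.

    # Generic greedy recourse search (illustrative only, not the linked paper's method):
    # nudge one feature at a time, preferring moves that raise the favorable-outcome
    # score the most per unit of user-specified effort, until the decision flips.
    import numpy as np

    def greedy_recourse(score_fn, x0, user_costs, step=0.1, max_steps=200):
        """score_fn(x) -> P(favorable outcome); user_costs[j] = effort of changing feature j."""
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(max_steps):
            current = score_fn(x)
            if current >= 0.5:
                return x                          # recourse found
            best_gain, best_x = 0.0, None
            for j in range(len(x)):
                for delta in (step, -step):
                    cand = x.copy()
                    cand[j] += delta
                    gain = (score_fn(cand) - current) / user_costs[j]
                    if gain > best_gain:
                        best_gain, best_x = gain, cand
            if best_x is None:
                return None                       # no improving single-feature move
            x = best_x
        return None

    # Hypothetical usage with any probabilistic classifier:
    #   score_fn = lambda v: clf.predict_proba(v.reshape(1, -1))[0, 1]
    #   plan = greedy_recourse(score_fn, rejected_applicant, user_costs=np.array([1, 5, 2]))

Dividing the score gain by a per-feature cost is one simple way to let user preferences steer which features the recourse is allowed to touch.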
Predictive machine learning models are increasingly being deployed in high-stakes contexts involving sensitive personal data; in these contexts, there is a trade-off between model explainability and data privacy. In this work, we push the boundaries…
External link:
http://arxiv.org/abs/2407.17663
Do different generative image models secretly learn similar underlying representations? We investigate this by measuring the latent space similarity of four different models: VAEs, GANs, Normalizing Flows (NFs), and Diffusion Models (DMs). Our method…
External link:
http://arxiv.org/abs/2407.13449
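One common way to ask whether two models' latent spaces are similar is to encode the same inputs with both and compare the resulting representations. The sketch below uses linear Centered Kernel Alignment (CKA) purely as an assumed, illustrative similarity measure, not necessarily the one used in the linked paper; the latent arrays are synthetic stand-ins.

    # Compare two sets of latent codes computed on the same inputs with linear CKA.
    # The latents here are synthetic stand-ins for, e.g., VAE and diffusion-model features.
    import numpy as np

    def linear_cka(X, Y):
        """Linear CKA between representations X (n x d1) and Y (n x d2); 1.0 = fully aligned."""
        X = X - X.mean(axis=0)
        Y = Y - Y.mean(axis=0)
        cross = np.linalg.norm(X.T @ Y, "fro") ** 2
        return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

    rng = np.random.default_rng(0)
    latents_a = rng.normal(size=(512, 64))                    # e.g. VAE codes for 512 images
    latents_b = (latents_a @ rng.normal(size=(64, 128))
                 + 0.1 * rng.normal(size=(512, 128)))         # e.g. DM features for the same images
    print(f"linear CKA = {linear_cka(latents_a, latents_b):.3f}")

CKA is invariant to rotations and isotropic scaling of either representation, which is why it is a popular choice for comparing models with differently sized latent spaces.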
As Artificial Intelligence (AI) tools are increasingly employed in diverse real-world applications, there has been significant interest in regulating these tools. To this end, several regulatory frameworks have been introduced by different countries…
External link:
http://arxiv.org/abs/2407.08689
As Large Language Models (LLMs) are increasingly being employed in real-world applications in critical domains such as healthcare, it is important to ensure that the Chain-of-Thought (CoT) reasoning generated by these models faithfully captures their…
External link:
http://arxiv.org/abs/2406.10625
Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post hoc paradigm…
External link:
http://arxiv.org/abs/2405.05386
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
The surge in the development of Large Language Models (LLMs) has led to improved performance on cognitive tasks as well as an urgent need to align these models with human values in order to safely exploit their power. Despite the effectiveness of preference alignment…
External link:
http://arxiv.org/abs/2404.18870
Author:
Kumar, Aounon; Lakkaraju, Himabindu
Large language models (LLMs) are increasingly being integrated into search engines to provide natural language responses tailored to user queries. Customers and end-users are also becoming more dependent on these models for quick and easy purchase decisions…
External link:
http://arxiv.org/abs/2404.07981