Showing 1 - 8 of 8 for search: '"Madhavan, Rahul"'
We study a variant of causal contextual bandits where the context is chosen based on an initial intervention chosen by the learner. At the beginning of each round, the learner selects an initial action, depending on which a stochastic context is reve
External link:
http://arxiv.org/abs/2405.18626
Author:
Madhavan, Rahul; Wadhawan, Kahini
We study attribute control in language models through the method of Causal Average Treatment Effect (Causal ATE). Existing methods for the attribute control task in Language Models (LMs) check for the co-occurrence of words in a sentence with the att
External link:
http://arxiv.org/abs/2311.11229
Published in:
Findings of the Association for Computational Linguistics: ACL 2023
We propose a method to control the attributes of Language Models (LMs) for the text generation task using Causal Average Treatment Effect (ATE) scores and counterfactual augmentation. We explore this method, in the context of LM detoxification, and p
External link:
http://arxiv.org/abs/2306.00374
We study the causal bandit problem that entails identifying a near-optimal intervention from a specified set $A$ of (possibly non-atomic) interventions over a given causal graph. Here, an optimal intervention in $A$ is one that maximizes the expect
External link:
http://arxiv.org/abs/2305.04638
We study Markov Decision Processes (MDP) wherein states correspond to causal graphs that stochastically generate rewards. In this setup, the learner's goal is to identify atomic interventions that lead to high rewards by intervening on variables at e
External link:
http://arxiv.org/abs/2111.00886
Author:
Madhavan, Rahul; Makwana, Hemanta
We study the feature-scaled version of the Monte Carlo algorithm with linear function approximation. This algorithm converges to a scale-invariant solution, which is not unduly affected by states having feature vectors with large norms. The usual ver
External link:
http://arxiv.org/abs/2104.07361
Author:
Madhavan, Rahul; Baraskar, Ankit
We have created a framework for analyzing subscription based businesses in terms of a unified metric which we call SCV (single customer value). The major advance in this paper is to model customer churn as an exponential decay variable, which directl
External link:
http://arxiv.org/abs/1704.05729
Overdetermined linear systems are common in reinforcement learning, e.g., in Q and value function estimation with function approximation. The standard least-squares criterion, however, leads to a solution that is unduly influenced by rows with large
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::73db666deb7363666166a5f4dcf4f106
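The last abstract above says that the standard least-squares solution is unduly influenced by certain rows. The sketch below shows the general effect of a large-magnitude row on a one-unknown system. It is illustrative only: the data, the helper names least_squares_1d and normalize_rows, and the use of row normalization as the comparison are assumptions made for this example, not the method of the listed paper, whose abstract is truncated here.

```python
def least_squares_1d(a, b):
    """Least-squares x minimizing sum((a_i * x - b_i)**2) for a single unknown."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)


def normalize_rows(a, b):
    """Divide each equation a_i * x = b_i by |a_i| (assumes every a_i is nonzero)."""
    scaled = [(ai / abs(ai), bi / abs(ai)) for ai, bi in zip(a, b)]
    return [row[0] for row in scaled], [row[1] for row in scaled]


# Hypothetical overdetermined system: three equations, one unknown.
# Two rows say x = 1; one row with a large coefficient says x = 0.
a = [1.0, 1.0, 10.0]
b = [1.0, 1.0, 0.0]

x_plain = least_squares_1d(a, b)                     # dominated by the large row
x_scaled = least_squares_1d(*normalize_rows(a, b))   # each row weighted equally

print(x_plain, x_scaled)
```

In this toy system the plain solution is 2/102, pulled almost entirely toward the large-coefficient row, while the row-normalized solution is 2/3, which reflects all three equations equally.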