Search results - "Prashanth, L.A."

A Survey of Risk-Aware Multi-Armed Bandits

Author: Vincent Y. F. Tan, Prashanth L.A., Krishna Jagannathan

In several applications such as clinical trials and financial portfolio optimization, the expected value (or the average reward) does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3b8c23da7ff9dd97cfd04d64a7c16a2b

Weighted bandits or: How bandits learn distorted values that are not expected

Author: Gopalan, A., Prashanth L.A., Fu, M., Marcus, S.

Published in: Scopus-Elsevier

Motivated by models of human decision making proposed to explain commonly observed deviations from conventional expected value preferences, we formulate two stochastic multi-armed bandit problems with distorted probabilities on the cost distributions

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e0f6956c2e6df98c7b88b5c01e483349

E-book

Stochastic Recursive Algorithms for Optimization : Simultaneous Perturbation Methods. [electronic resource]

Author: Bhatnagar, S.

External link: KNAV e-book collection. Registered users: full text online for 5 minutes, further access on request.

Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control

Author: Prashanth L.A., Jie, C., Fu, M., Marcus, S., Szepesvari, C.

Published in: Scopus-Elsevier

Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::21855451a1cda85ce1d5abc7ec6b9115
http://arxiv.org/abs/1506.02632

On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence

Author: Korda, N., Prashanth L.A.

Published in: Scopus-Elsevier

We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. These include high-probability bounds as well as bounds in expectation. Our analysis suggests that a step-size invers

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::89003d9ef54f17e13b75368ca6114a64
http://arxiv.org/abs/1411.3224

Actor-Critic Algorithms for Risk-Sensitive MDPs

Author: Prashanth L.A., Ghavamzadeh, M.

Published in: [Technical Report] 2013
Scopus-Elsevier

In many sequential decision-making problems we may want to manage risk by minimizing some measure of variability in rewards in addition to maximizing a standard criterion. Variance-related risk measures are among the most common risk-sensitive criter

External link: https://explore.openaire.eu/search/publication?articleId=dedup_wf_001::5e1bac6f54063d6f1d1ccb12b713bf04
https://inria.hal.science/hal-00794721v2/document
