Human Belief State-Based Exploration and Exploitation in an Information-Selective Symmetric Reversal Bandit Task

Author:	Dirk Ostwald, Michael P. Milham, Shruti Ray, Philipp Schwartenbeck, Lilla Horvath, Stanley J. Colcombe
Year of publication:	2021
Subject:	Computer science; Bayesian probability; Control (management); Exploratory research; Exploitation; Machine learning; Task (project management); Developmental and Educational Psychology; Agent-based behavioral modeling; Set (psychology); 100 Philosophie und Psychologie::150 Psychologie::150 Psychologie; Structure (mathematical logic); Original Paper; Heuristic; Perspective (graphical); Probabilistic logic; Personal effectiveness; Bandit problem; Neuropsychology and Physiological Psychology; Artificial intelligence; Exploration; Cognitive psychology
Source:	Computational Brain & Behavior
DOI:	10.17169/refubium-38367
Description:	Humans often face sequential decision-making problems in which information about the environmental reward structure is detached from rewards for a subset of actions. For example, a medicated patient may consider partaking in a clinical trial on the effectiveness of a new drug. Taking part in the trial can provide the patient with information about the personal effectiveness of the new drug and the potential reward of a better treatment. Not taking part in the trial does not provide the patient with this information, but is associated with the reward of a (potentially less) effective treatment. In the current study, we introduce a novel information-selective reversal bandit task to model such situations and obtain choice data on this task from 24 participants. To arbitrate between different decision-making strategies that participants may use on this task, we developed a set of probabilistic agent-based behavioural models, including exploitative and explorative Bayesian agents, as well as heuristic control agents. Upon validating the model and parameter recovery properties of our model set and summarizing the participants’ choice data in a descriptive way, we used a maximum likelihood approach to evaluate the participants’ choice data from the perspective of our model set. In brief, we provide evidence that participants employ a belief state-based hybrid explorative-exploitative strategy on the information-selective reversal bandit task, lending further support to the finding that humans are guided by their subjective uncertainty when solving exploration-exploitation dilemmas.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::385af2ccf08327e2e60311a7992d98f2