Description: |
Reinforcement learning models of the basal ganglia map the phasic dopamine signal onto reward prediction errors (RPEs). Conventional models assert that, when a stimulus reliably predicts a reward at a fixed delay, dopamine activity during the delay period and at reward time should converge to baseline through learning. However, recent studies have found that in certain conditions dopamine exhibits a gradual ramp before reward even after extensive learning, such as when animals are trained to run to obtain the reward, challenging conventional RPE models. In this work, we begin from the limitation of temporal uncertainty (animals cannot perfectly estimate the time to reward) and show that sensory feedback, which reduces this uncertainty, causes an unbiased learner to produce RPE ramps. In the absence of feedback, by contrast, RPEs are flat after learning. These results reconcile the seemingly conflicting dopamine data under the RPE hypothesis.
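A minimal sketch of the conventional picture referenced above (an illustration, not code from this work): tabular TD(0) with a perfectly known time state and a fixed cue-reward delay. After learning, the TD errors, the model's stand-in for phasic dopamine, are approximately zero throughout the delay and at reward time, i.e. flat with no ramp. Reproducing the ramping result would additionally require the temporal-uncertainty and sensory-feedback machinery described here, which this sketch does not model.

```python
# Illustrative only: parameters (T, gamma, alpha, n_trials) are arbitrary choices,
# not values from this work.
import numpy as np

T = 20            # time steps from cue (t = 0) to reward (t = T - 1)
gamma = 0.98      # discount factor
alpha = 0.1       # learning rate
n_trials = 5000   # training trials

V = np.zeros(T + 1)      # value per time state; V[T] is the terminal, post-reward state
reward = np.zeros(T)
reward[-1] = 1.0         # reward delivered at the fixed delay

for _ in range(n_trials):
    for t in range(T):
        # TD(0) error: the model's analogue of the phasic dopamine response
        delta = reward[t] + gamma * V[t + 1] - V[t]
        V[t] += alpha * delta

# Recompute TD errors after learning (no updates): all ~0, i.e. flat RPEs,
# matching the conventional prediction for a fully predictable reward without feedback.
rpe = [reward[t] + gamma * V[t + 1] - V[t] for t in range(T)]
print(np.round(rpe, 4))
```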