Beyond reward prediction: validating an integrative reinforcement learning model using human behavioral data.

Alejandro Sospedra Orellano
2024

Abstract

Reinforcement learning (RL) models that cast phasic striatal dopamine as a signal of reward prediction errors (RPEs) are central to many learning theories. Recent rodent studies, however, suggest that striatal dopaminergic neurons also support reward-independent learning processes, and various computational models have been proposed to account for them. We aim to validate a novel RL model that integrates this disparate evidence and these proposals into two learning systems: (1) a value-based module that learns how to behave (a policy) given what an agent wants (a value function) and updates through RPEs; and (2) an inverse RL module that learns what an agent wants given how it behaves and updates through action prediction errors (APEs). If the two algorithms are indeed driven by different types of information, certain statistical properties of the evidence may differentially bias which system engages more optimally with the ongoing task. Several studies have proposed that different types of noise in the evidence can cause opposite cognitive biases in decision making and learning. We therefore intend to use a task with two trial types, manipulating two types of evidence noise, to probe the properties of both systems.
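The abstract describes the two update rules only at a high level. The sketch below is a minimal, hypothetical illustration of how such a two-module agent might look on a two-armed bandit; it is not the authors' model. The additive arbitration between modules (Q + H), the softmax policy, the learning rates, and the reward probabilities are all illustrative assumptions, and the APE is implemented here as the discrepancy between the action taken and the policy's predicted action probabilities.

import numpy as np

# Minimal sketch of a two-module RL agent on a two-armed bandit.
# Module 1 (value-based): updates action values Q via reward prediction errors (RPEs).
# Module 2 (inverse-RL-like): updates action preferences H via action prediction
# errors (APEs), i.e., taken action minus predicted action probabilities.
# All parameter values and the additive Q + H arbitration are assumptions.

rng = np.random.default_rng(0)
n_actions = 2
alpha_v, alpha_a = 0.1, 0.1     # learning rates (hypothetical)
beta = 3.0                      # softmax inverse temperature (hypothetical)

Q = np.zeros(n_actions)         # value module: expected reward per action
H = np.zeros(n_actions)         # inverse-RL module: inferred action preferences

def softmax(x):
    e = np.exp(beta * (x - x.max()))
    return e / e.sum()

reward_probs = [0.8, 0.2]       # hypothetical bandit payoffs

for t in range(1000):
    p = softmax(Q + H)          # assumed arbitration: additive mixture of modules
    a = rng.choice(n_actions, p=p)
    r = float(rng.random() < reward_probs[a])

    rpe = r - Q[a]              # reward prediction error
    Q[a] += alpha_v * rpe       # value module update

    chosen = np.eye(n_actions)[a]
    ape = chosen - p            # action prediction error: taken vs. predicted action
    H += alpha_a * ape          # inverse-RL module update

print(Q, H)

Under this sketch, the value module tracks reward statistics while the preference module drifts toward whatever the agent actually does, which is one way the two systems could be differentially sensitive to distinct noise types in the evidence.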

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/559328