Different learning strategies used during Pavlovian conditioning

In Pavlovian conditioning, people form associations between a neutral stimulus (e.g. a bell) and an upcoming unconditioned stimulus (e.g. food). The neutral stimulus becomes a conditioned stimulus once it elicits, on its own, the response originally produced by the unconditioned stimulus. People can learn these associations using a value-based or an uncertainty-based strategy. In value-based learning, learning is driven by the difference between the expected reward and the reward actually received, known as the reward prediction error. In uncertainty-based learning, people learn the probability that a conditioned stimulus will be followed by a specific unconditioned stimulus, and the mismatch between the expected and the observed outcome is the state prediction error. There are individual differences in whether people pay more attention to the conditioned stimulus (sign-trackers) or the unconditioned stimulus (goal-trackers). The neural basis of these learning strategies is not yet well understood. This week in Nature Human Behaviour, Schad and colleagues used eye-tracking and functional magnetic resonance imaging (fMRI) to investigate the neural substrates of the learning strategies used by sign-trackers and goal-trackers.

How did they do it?

Participants were 129 male adults who completed a Pavlovian conditioning task in the fMRI scanner while their eye movements were recorded. They learned associations between visual-auditory cues that predicted monetary reward (appetitive conditioned stimulus; $1, $2), no reward (neutral conditioned stimulus; $0), or loss (aversive conditioned stimulus; -$1, -$2). The authors computed a gaze index to categorize participants as sign-trackers or goal-trackers. The gaze index is the proportion of fixations made to the conditioned stimulus minus the proportion of fixations made to the unconditioned stimulus. A value of 0 indicates that participants made an equal proportion of fixations to the conditioned and unconditioned stimuli, whereas positive and negative values indicate that they made more fixations to the conditioned and the unconditioned stimulus, respectively. To identify sign-trackers, the authors examined the relationship between the gaze index and the value of the conditioned stimulus. The top third of participants, ranked by how much more frequently they looked at the conditioned stimulus predicting monetary rewards than at the conditioned stimulus predicting losses, were deemed to be sign-trackers. A similar analysis was conducted with the value of the unconditioned stimulus to identify goal-trackers. Eye movement behavior during the conditioning task, including pupil dilation and the number of fixations, was compared across the two groups for the different stimuli.
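To make the gaze index concrete, here is a minimal sketch in Python. It follows the sign convention described above, where positive values mean more fixations on the conditioned stimulus. The function name `gaze_index` and the choice to take proportions over the combined fixation count on the two stimuli are assumptions for illustration; the authors' exact computation may differ.

```python
def gaze_index(cs_fixations: int, us_fixations: int) -> float:
    """Illustrative gaze index: proportion of fixations on the conditioned
    stimulus (CS) minus proportion on the unconditioned stimulus (US).

    Assumption: proportions are taken over the total number of fixations
    that landed on either stimulus.
    """
    total = cs_fixations + us_fixations
    if total == 0:
        raise ValueError("no fixations recorded on either stimulus")
    return (cs_fixations - us_fixations) / total


# Equal looking at both stimuli gives 0.
balanced = gaze_index(5, 5)
# More looking at the CS gives a positive index.
cs_biased = gaze_index(8, 2)
# More looking at the US gives a negative index.
us_biased = gaze_index(2, 8)
```

With these made-up counts, the index ranges from -1 (all fixations on the unconditioned stimulus) to 1 (all fixations on the conditioned stimulus).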


The authors used computational modeling to determine whether the eye movement patterns of sign-trackers and goal-trackers during the conditioning task reflected value-based or uncertainty-based learning strategies. Value-based learning was assessed with a reinforcement learning model that computes a reward prediction error. Uncertainty-based learning, on the other hand, was assessed with a model that computes a state prediction error. Finally, the authors examined the neural substrates of the different learning strategies. They used the reinforcement learning model to compute reward prediction errors and tested for the corresponding signals in reward-processing regions like the nucleus accumbens in response to the stimuli. Uncertainty-based learning was investigated by computing the state prediction error at the onset of the unconditioned stimulus and testing for the corresponding signal in regions associated with the state prediction error effect, such as the intraparietal sulcus and the lateral prefrontal cortex. The reward prediction error and state prediction error effects in the brain were compared between sign-trackers and goal-trackers.
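The two kinds of prediction error can be sketched in a few lines of Python. This is a common textbook form of each update (a Rescorla-Wagner style value update and a transition-probability update), not necessarily the models the authors fitted. The function names, the learning rate of 0.1, and the example outcomes are all assumptions for illustration.

```python
def reward_prediction_error_update(value, reward, learning_rate=0.1):
    """Value-based learning: update the expected reward of a cue.

    The reward prediction error is the received reward minus the
    expected reward.
    """
    rpe = reward - value
    return value + learning_rate * rpe, rpe


def state_prediction_error_update(outcome_probs, observed, learning_rate=0.1):
    """Uncertainty-based learning: update the estimated probability of
    each outcome that can follow a cue.

    The state prediction error is 1 minus the estimated probability of
    the outcome that actually occurred, so it is large for surprising
    outcomes regardless of whether they are good or bad.
    """
    spe = 1.0 - outcome_probs[observed]
    # Shrink every probability, then add mass to the observed outcome,
    # so the estimates still sum to 1.
    updated = {o: p * (1.0 - learning_rate) for o, p in outcome_probs.items()}
    updated[observed] += learning_rate
    return updated, spe


# A cue with no expected value is followed by a $2 reward.
new_value, rpe = reward_prediction_error_update(value=0.0, reward=2.0)

# A cue initially thought equally likely to lead to reward or loss
# is followed by reward.
new_probs, spe = state_prediction_error_update(
    {"reward": 0.5, "loss": 0.5}, observed="reward"
)
```

In this sketch the reward prediction error carries a sign (better or worse than expected), while the state prediction error only measures how unexpected the outcome was, which is the distinction the two learning strategies rest on.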

What did they find?

Sign-trackers made more fixations to the appetitive conditioned stimulus associated with a monetary reward than to the aversive conditioned stimulus associated with monetary loss, which is in line with value-based learning. Conversely, goal-trackers made more fixations to the appetitive unconditioned stimulus than to the aversive unconditioned stimulus, and over time they looked less at the conditioned stimuli and more at the unconditioned stimuli, which is in line with uncertainty-based learning. Pupil dilation in response to both conditioned and unconditioned stimuli also differed between sign-trackers and goal-trackers. Among goal-trackers, pupil size decreased over the course of learning but did not differ between the appetitive and aversive conditioned stimuli. Among sign-trackers, there was no change in pupil size over time, but the pupils dilated more in response to the appetitive conditioned stimuli than to the aversive conditioned stimuli. Computational modeling indicated that the value-based model best captured pupillary changes in sign-trackers, whereas the uncertainty-based model best explained pupil dilation in goal-trackers. Overall, these results suggest that eye movement behavior tracks the value of the stimulus among sign-trackers and the expectation of the upcoming state among goal-trackers.


Distinct patterns of brain activity were associated with the learning strategies used by sign-trackers and goal-trackers. In reward-processing brain regions such as the nucleus accumbens, ventromedial prefrontal cortex, and amygdala, the value-based model explained more variance in brain activity for sign-trackers than for goal-trackers. In contrast, the uncertainty-based model explained activity in the intraparietal sulcus better for goal-trackers than for sign-trackers. In sum, the eye-tracking and neural data indicate that sign-trackers used value-based learning strategies, while goal-trackers relied more on an uncertainty-based strategy.

What’s the impact?

This study is the first to demonstrate the distinct behavioral and neural profiles of learning in sign-trackers, who primarily use value-based learning strategies, and goal-trackers, who rely on uncertainty-based learning. By providing a deeper understanding of the different learning systems in humans, these findings have important implications for the treatment of disorders that involve aberrant reward learning, like addiction.