Model-free or muddled models in the two-stage task?

The simple dichotomy of model-free versus model-based learning is inadequate to explain behaviour in the two-stage task.

Once upon a time, we set out to investigate how stimulus presentation influences habitual learning in humans. Habits are thought to be learned via model-free learning, a strategy that strengthens or weakens associations between stimuli and actions depending on whether or not an action is followed by reward. Conversely, another strategy, known as model-based learning, generates goal-directed behaviour by computing action values at decision time from a model of the environment. Two-stage learning tasks have frequently been used to dissociate model-free and model-based influences on behaviour (Daw et al. 2011), so to address our questions about habits, we designed a two-stage learning task. However, when we examined our results, we were confused: we observed negative effects of reward that could not be explained by either model-free or model-based learning.
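To make the dissociation concrete, here is a minimal sketch of how the two strategies diverge after a single trial. It assumes the common/rare transition probabilities of 0.7/0.3 often used in this task, a single learned value per second-stage state, and a simplified model-free learner that reinforces the chosen first-stage action directly with the reward. The names (`one_trial`, `ALPHA`, `TRANSITIONS`) and the learning rate are hypothetical, and this is not the code used in the paper.

```python
# Simplified, hypothetical two-stage task: not the authors' implementation.
# Assumed transitions: action 0 usually leads to state 0, action 1 usually
# leads to state 1 (common = 0.7, rare = 0.3).
TRANSITIONS = [[0.7, 0.3],
               [0.3, 0.7]]
ALPHA = 0.5  # hypothetical learning rate


def update(value, reward, alpha=ALPHA):
    """Delta-rule update: move a value toward the obtained reward."""
    return value + alpha * (reward - value)


def model_based_values(stage2_values, transitions=TRANSITIONS):
    """Compute first-stage values at decision time from the task model:
    the expected second-stage value under the transition probabilities."""
    return [sum(p * v for p, v in zip(row, stage2_values)) for row in transitions]


def preferred(values):
    """Index of the highest value (ties go to the lower index)."""
    return max(range(len(values)), key=lambda i: values[i])


def one_trial(action, state, reward):
    """Start from neutral values, run one trial, and return the next
    first-stage choice of a model-free and a model-based learner."""
    mf_stage1 = [0.5, 0.5]  # model-free cached first-stage values
    stage2 = [0.5, 0.5]     # learned value of each second-stage state
    # Both learners update the value of the second-stage state they reached.
    stage2[state] = update(stage2[state], reward)
    # Model-free: reinforce the first-stage action that was taken,
    # ignoring whether the transition was common or rare.
    mf_stage1[action] = update(mf_stage1[action], reward)
    # Model-based: recompute first-stage values from the transition model.
    mb_stage1 = model_based_values(stage2)
    return preferred(mf_stage1), preferred(mb_stage1)


if __name__ == "__main__":
    # Chose action 0, took the RARE transition to state 1, and was rewarded.
    mf_next, mb_next = one_trial(action=0, state=1, reward=1.0)
    print("model-free next choice:", mf_next)   # 0: repeats the action
    print("model-based next choice:", mb_next)  # 1: switches actions
```

After a rewarded rare transition, the model-free learner repeats its first-stage choice, while the model-based learner switches to the action that more commonly leads to the rewarded state. Aggregated over many trials, this reward-by-transition interaction in stay probabilities is the signature analyses of the two-stage task use to separate the two strategies.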

After a series of analyses, we concluded that our participants must have misunderstood our task. Inspired by a version of the two-stage task designed for children, we modified our instructions to tell participants a story involving causes and effects within a physical system, rather than just giving them abstract symbols and numerical probabilities. Story-like instructions had seemed to work well in previous studies, so we predicted that our improved instructions would alleviate participants' confusion. What we did not predict was that the new instructions would eliminate all evidence of model-free learning. This left us even more confused than before.

We were puzzled because so many previous studies had reported that both model-based and model-free learning are employed by humans and other animals. In particular, past experiments using the original form of the two-stage task have always found that healthy adult human participants use a mixture of model-free and model-based learning. Moreover, most studies that modified the two-stage task to promote model-based over model-free learning still found a reduced but substantial influence of model-free learning on behaviour. Thus, the influence of model-free learning on behaviour seemed ubiquitous and robust.

Our results differed from this well-established pattern. When we gave our participants story-based instructions, we found that, on average, they behaved much more like pure model-based learners. We therefore wondered whether behaviour identified as (partially) model-free could in fact result from participants performing model-based learning with incorrect task models. In other words, could misconceptions about the task lead participants to muddle up the task model in their minds, and thus falsely appear to be influenced by model-free learning?

In our paper ‘Humans primarily use model-based inference in the two-stage task’, we used a combination of new experiments, computer simulations, and re-analyses of previously published data to show that someone using an incorrect model-based strategy on the two-stage task could easily be misclassified as a hybrid model-based/model-free learner. Moreover, we found ample evidence that human participants often form incorrect mental models of the two-stage task. Our paper thus indicates a need to re-evaluate common assumptions about model-free versus model-based learning and the roles they play in both typical and atypical human behaviour.


Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P. & Dolan, R. J. Model-based influences on humans’ choices and striatal prediction errors. Neuron 69, 1204–1215 (2011).

Carolina Feher da Silva

Postdoctoral researcher, University of Zurich