Vision is arguably the sense that we humans rely on most to navigate the world. The deluge of information that enters the brain via the eyes has to be filtered and processed so that the relevant aspects can be understood. This filtering process is what we call visual attention. Only the center of the retina (the fovea) relays high-resolution information to the brain, so we have to point our eyes at whatever we are currently attending to. Predictably, eye position and attention align most of the time, and in vision and visual attention research the two concepts are often treated as equivalent.
This is a plausible and often useful simplification. Visual attention itself is a latent state, i.e., it is not directly measurable. Eye movements, on the other hand, are straightforward to track, and the resulting data contain many systematic effects that reflect the workings of visual attention.
In tightly controlled experiments, visual attention can also be measured through performance metrics. The rationale is that if covert attention is deployed to a location, reports about information at that location should improve. This line of research finds performance improvements just before and just after eye movements, which suggests that attention is deployed to the upcoming location before the eye actually moves.
Thus, around each eye movement there are brief shifts in attention during which visual attention and eye position are decoupled. We wondered whether these micro-level shifts contribute to macro-level eye movement statistics.
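To make the decoupling concrete, here is a minimal Python sketch of where attention might be centered during a single fixation. It assumes a pre-saccadic window of `tau_pre` ms in which attention already sits at the upcoming saccade target, and a post-saccadic window of `tau_post` ms in which it sits at some `post_target`. The function name, its parameters and the piecewise-constant timing are hypothetical illustrations, not the model's actual parameterization.

```python
from typing import Tuple

Point = Tuple[float, float]


def attention_locus(t: float, fix_dur: float, current: Point, target: Point,
                    tau_pre: float, post_target: Point, tau_post: float) -> Point:
    """Hypothetical attention center at time t (ms) after fixation onset.

    If the two windows overlap (tau_post + tau_pre > fix_dur), the
    post-saccadic window takes precedence in this sketch.
    """
    # Post-saccadic window: attention is centered at post_target, not the fixation.
    if t < tau_post:
        return post_target
    # Pre-saccadic window: attention already sits at the upcoming saccade target.
    if t >= fix_dur - tau_pre:
        return target
    # Otherwise attention and eye position coincide.
    return current
```

For example, `attention_locus(260, 300, (0, 0), (5, 5), 50, (1, 1), 30)` returns `(5, 5)`, because 260 ms falls inside the last 50 ms of a 300 ms fixation.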
The paper describes how we extended SceneWalk, an existing biologically inspired model of eye movements during scene viewing, by adding attention shifts before and after each eye movement. The model computes moment-to-moment probabilistic priority maps from which we sample eye movements.
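Sampling an eye movement from a priority map amounts to drawing one location with probability proportional to its priority. A minimal sketch using only Python's standard library, assuming the map is a small grid of non-negative floats; `sample_fixation` is a hypothetical helper, not part of the SceneWalk code.

```python
import random
from typing import List, Tuple


def sample_fixation(priority: List[List[float]], rng: random.Random) -> Tuple[int, int]:
    """Draw one (row, col) cell with probability proportional to its priority."""
    n_cols = len(priority[0])
    weights = [p for row in priority for p in row]  # flatten in row-major order
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("priority map must be non-negative with positive mass")
    idx = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return divmod(idx, n_cols)  # convert flat index back to (row, col)


rng = random.Random(0)
priority = [[0.1, 0.2, 0.1],
            [0.2, 0.3, 0.1]]
print(sample_fixation(priority, rng))
```

Passing an explicit `random.Random` instance keeps simulated scanpaths reproducible for a given seed.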
The priority map is the result of combining two activity streams. The activation stream drives attention based on visual saliency and assumptions about attentional deployment. The inhibition stream tags previously fixated locations to drive exploration behavior.
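One way to combine the two streams is to normalize each, subtract a weighted inhibition from the activation and clip at zero, similar in spirit to the subtractive combination used in the original SceneWalk formulation. In this sketch the exponents `lam` and `gam`, the inhibition weight `c_f` and the uniform noise floor `eta` are hypothetical parameters with illustrative defaults, not fitted values from the paper.

```python
from typing import List

Grid = List[List[float]]


def _normalize_power(m: Grid, exponent: float) -> Grid:
    """Raise every cell to `exponent` and rescale so the grid sums to 1."""
    powered = [[v ** exponent for v in row] for row in m]
    total = sum(map(sum, powered))
    return [[v / total for v in row] for row in powered]


def priority_map(activation: Grid, inhibition: Grid, lam: float = 1.0,
                 gam: float = 1.0, c_f: float = 0.3, eta: float = 1e-3) -> Grid:
    """Combine activation and inhibition into a normalized priority map.

    Assumes both input grids are non-negative with positive total mass.
    """
    a = _normalize_power(activation, lam)
    f = _normalize_power(inhibition, gam)
    # Subtract weighted inhibition from activation; negative values become 0.
    u = [[max(ai - c_f * fi, 0.0) for ai, fi in zip(ra, rf)]
         for ra, rf in zip(a, f)]
    n = sum(len(row) for row in u)
    total_u = sum(map(sum, u))
    if total_u == 0:  # everything inhibited: fall back to a uniform map
        return [[1.0 / n for _ in row] for row in u]
    # Mix with a small uniform floor so no location has zero probability.
    return [[(1 - eta) * v / total_u + eta / n for v in row] for row in u]
```

With uniform activation and inhibition concentrated on the top-left cell of a 2 x 2 grid, that cell drops to the noise floor while the other three cells share the remaining probability equally.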
The activation stream is analogous to latent visual attention. In the GIF below we show how both streams evolve over the course of each fixation, and how the priority map changes over time.
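The temporal evolution of the two streams can be sketched as leaky integration: within a fixation, each stream relaxes exponentially toward a Gaussian-shaped input. In this assumed formulation the activation input is saliency weighted by a Gaussian around the current attention locus (which may differ from the fixation during a perisaccadic shift), while the inhibition input is a Gaussian around the fixation itself. All names, widths (`sigma_a`, `sigma_f`) and rates (`omega_a`, `omega_f`) are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]


def gaussian_grid(shape: Tuple[int, int], center: Tuple[float, float],
                  sigma: float) -> Grid:
    """Unnormalized isotropic Gaussian over a rows x cols grid."""
    rows, cols = shape
    cy, cx = center
    return [[math.exp(-((r - cy) ** 2 + (c - cx) ** 2) / (2 * sigma ** 2))
             for c in range(cols)] for r in range(rows)]


def normalize(m: Grid) -> Grid:
    total = sum(map(sum, m))
    return [[v / total for v in row] for row in m]


def relax(state: Grid, target: Grid, omega: float, dt: float) -> Grid:
    """Exact solution of dx/dt = omega * (target - x) over dt for a constant target."""
    decay = math.exp(-omega * dt)
    return [[t + (s - t) * decay for s, t in zip(rs, rt)]
            for rs, rt in zip(state, target)]


def step_streams(activation: Grid, inhibition: Grid, saliency: Grid,
                 attn_center: Tuple[float, float], fix_center: Tuple[float, float],
                 sigma_a: float, sigma_f: float, omega_a: float, omega_f: float,
                 dt: float) -> Tuple[Grid, Grid]:
    shape = (len(saliency), len(saliency[0]))
    # Activation input: saliency weighted around the (possibly shifted) attention locus.
    g_a = gaussian_grid(shape, attn_center, sigma_a)
    a_target = normalize([[s * g for s, g in zip(rs, rg)]
                          for rs, rg in zip(saliency, g_a)])
    # Inhibition input: Gaussian around the current fixation.
    f_target = normalize(gaussian_grid(shape, fix_center, sigma_f))
    return (relax(activation, a_target, omega_a, dt),
            relax(inhibition, f_target, omega_f, dt))
```

Using the closed-form exponential update rather than an Euler step keeps each update exact for any `dt`, as long as the input stays constant over that interval. Moving `attn_center` to the saccade target a few steps before the saccade is one way to express a pre-saccadic shift in this sketch.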
We can use this extended model to generate eye movement patterns and then compare the simulated eye movements with empirical data on a variety of statistics. The simulated data from the new model reproduce the empirical data better than those from the model without attention shifts, and the new model also achieves a higher likelihood.
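Comparing models by likelihood means asking how much probability each model's priority maps assign to the fixations participants actually made. A minimal sketch, assuming one normalized priority map per observed fixation; `sequence_log_likelihood` and `gain_bits_per_fixation` are hypothetical helpers, and expressing the result as a gain in bits per fixation over a uniform map is one common convention, not necessarily the one used in the paper.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def sequence_log_likelihood(maps: Sequence[Grid],
                            fixations: Sequence[Tuple[int, int]]) -> float:
    """Sum of natural-log probabilities of the observed fixation cells."""
    if len(maps) != len(fixations):
        raise ValueError("need exactly one priority map per fixation")
    return sum(math.log(pmap[r][c]) for pmap, (r, c) in zip(maps, fixations))


def gain_bits_per_fixation(maps: Sequence[Grid],
                           fixations: Sequence[Tuple[int, int]]) -> float:
    """Average log-likelihood gain over a uniform map, in bits per fixation."""
    ll = sequence_log_likelihood(maps, fixations)
    # A uniform map over n cells assigns log(1/n) to every fixation.
    ll_uniform = sum(-math.log(sum(len(row) for row in pmap)) for pmap in maps)
    return (ll - ll_uniform) / (len(fixations) * math.log(2))
```

Because a single zero-probability fixation would make the log-likelihood minus infinity (here, `math.log(0)` raises a `ValueError`), priority maps are typically mixed with a small uniform component before evaluation.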
Our knowledge of how the brain builds an understanding of the visual world is still very limited. It is a highly complex problem, as computer vision researchers are also discovering when trying to create artificial vision. The biological visual system solves the task by directing the center of the visual field at different aspects of a scene in turn. The choice of locations depends on various factors, from visual saliency to the task at hand. Our paper shows the importance of considering the properties of latent states in the conceptual system: systematic eye movement statistics reveal how these states shape the observed behavior.