MIT Modeling search for people

Author: MIT

Partner: No

Contact: Aude Oliva (




Total: 912

Ratings: 14


How predictable are human eye movements as they search real-world scenes? Here, we recorded 14 observers’ eye movements as they performed a search task (person detection) on 912 outdoor scenes. Searchers demonstrated high consistency of fixation locations, even when the target was absent from the scene. Furthermore, observers tended to fixate consistent regions even when those regions were not visually salient. We modeled three sources of guidance: saliency, target features, and scene context. Each of these sources independently outperformed a smart chance level at predicting human fixations. Models that combine sources of guidance predicted 94% of human agreement, with the scene context module providing the most explanatory power. Critically, none of the models could reach the precision and fidelity of a human-based attentional map. This work establishes a benchmark for computational models of search in real-world scenes. Further improvements in modeling should capture mechanisms underlying the selectivity of observers’ fixations during search.
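As a rough illustration of what "combining sources of guidance" and "predicting human fixations" can mean in practice, the sketch below merges three per-image maps into one prediction map and scores it against fixation locations. It is a minimal sketch under stated assumptions, not the authors' implementation: the weighted-product combination rule, the top-20%-of-image-area hit criterion, the weights, the toy maps, and the function names (normalize, combine, hit_rate) are all hypothetical choices made for this example.

```python
def normalize(m):
    """Rescale a 2D map (list of rows) linearly to the range [0, 1]."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no spatial information.
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / span for v in row] for row in m]


def combine(maps, weights, eps=1e-6):
    """Combine maps as a weighted product: prod_k (map_k + eps) ** w_k.

    ASSUMPTION: this combination rule and `eps` are illustrative choices,
    not taken from the dataset description.
    """
    normed = [normalize(m) for m in maps]
    rows, cols = len(normed[0]), len(normed[0][0])
    out = [[1.0] * cols for _ in range(rows)]
    for m, w in zip(normed, weights):
        for r in range(rows):
            for c in range(cols):
                out[r][c] *= (m[r][c] + eps) ** w
    return out


def hit_rate(pred, fixations, area_fraction=0.2):
    """Fraction of fixations (row, col) that fall inside the highest-valued
    `area_fraction` of the prediction map.

    Ties at the threshold are all counted as inside, so the selected region
    can be slightly larger than `area_fraction` of the image.
    """
    flat = sorted((v for row in pred for v in row), reverse=True)
    k = max(1, int(round(area_fraction * len(flat))))
    threshold = flat[k - 1]
    hits = sum(1 for r, c in fixations if pred[r][c] >= threshold)
    return hits / len(fixations)


# Toy 2x3 maps standing in for saliency, target-feature and context maps.
saliency = [[0.2, 0.8, 0.1], [0.3, 0.4, 0.9]]
target = [[0.1, 0.2, 0.1], [0.6, 0.7, 0.8]]
context = [[0.0, 0.1, 0.0], [0.9, 1.0, 0.9]]

# Hypothetical weights; in practice they would be fit on held-out images.
combined = combine([saliency, target, context], weights=[0.1, 0.2, 0.7])
print(hit_rate(combined, fixations=[(1, 2), (1, 1)], area_fraction=0.2))
```

The same hit_rate function can be applied to each single-source map and to a map built from other observers' fixations, which gives a simple way to compare model performance with human agreement on the same scale.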


Data set:

  • Image stimuli: Link:
  • Eye data: Link:
  • Context oracle maps: Link:

Pre-generated maps:

  • Target features maps: Link:
  • Saliency maps:

References and Citation

  • EHT09: Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba, Aude Oliva. Modelling search for people in 900 scenes: A combined source model of eye guidance. Visual Cognition, 2009, 17(6/7), 945-978.