How predictable are human eye movements as they search real-world scenes? Here, we recorded 14 observers’ eye movements as they performed a search task (person detection) on 912 outdoor scenes. Searchers demonstrated high consistency of fixation locations, even when the target was absent from the scene. Furthermore, observers tended to fixate consistent regions even when those regions were not visually salient. We modeled three sources of guidance: saliency, target features, and scene context. Each of these sources independently outperformed a smart chance level at predicting human fixations. Models that combined sources of guidance predicted 94% of human agreement, with the scene context module providing the most explanatory power. Critically, none of the models could reach the precision and fidelity of a human-based attentional map. This work establishes a benchmark for computational models of search in real-world scenes. Further improvements in modeling should capture mechanisms underlying the selectivity of observers’ fixations during search.
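As a rough illustration of how several guidance maps might be combined and scored against fixations, here is a minimal sketch. It is not the published model: the weighted-product combination rule, the top-20% hit-rate score, and the names `normalize`, `combine_maps`, and `fixation_hit_rate` are all assumptions made for this example, and the maps are toy arrays rather than the files in the data set.

```python
import numpy as np

def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all ones."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    if span == 0:
        return np.ones_like(m)
    return (m - m.min()) / span

def combine_maps(maps, weights):
    """Weighted product of normalized maps (an assumed combination rule)."""
    combined = np.ones_like(np.asarray(maps[0], dtype=float))
    for m, w in zip(maps, weights):
        combined *= normalize(m) ** w
    return combined

def fixation_hit_rate(pred, fixations, top_fraction=0.2):
    """Fraction of (row, col) fixations that land in the top-scoring pixels.

    `top_fraction` is the share of the image counted as "predicted";
    0.2 is a hypothetical default chosen for this sketch.
    """
    threshold = np.quantile(pred, 1.0 - top_fraction)
    hits = [pred[r, c] >= threshold for r, c in fixations]
    return float(np.mean(hits))

# Toy example: a "context" map that favors the bottom rows of a 10x10 image,
# combined with a flat "saliency" map that contributes nothing.
context = np.tile(np.arange(10)[:, None], (1, 10))
saliency = np.ones((10, 10))
combined = combine_maps([context, saliency], weights=[1.0, 1.0])

# Two fixations fall in the bottom two rows, one in the top row.
rate = fixation_hit_rate(combined, [(9, 0), (8, 5), (0, 0)])
```

With real data, the toy arrays would be replaced by maps loaded from the pre-generated archives and fixation coordinates from the eye data, in whatever format those files use.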
Data set: Image stimuli: Link: http://cvcl.mit.edu/searchmodels/Dataset_STIMULI.zip
Data set: Eye data: Link: http://cvcl.mit.edu/searchmodels/Dataset_EyeData.zip
Data set: Context oracle maps: Link: http://cvcl.mit.edu/searchmodels/Dataset_ContextOracle.zip
Pre-generated maps: Target feature maps: Link: http://cvcl.mit.edu/searchmodels/targetFeatureMaps.zip
Pre-generated maps: Saliency maps: Link: http://cvcl.mit.edu/searchmodels/saliencyMaps.zip
References and Citation
EHT09: Krista A. Ehinger, Barbara Hidalgo-Sotelo, Antonio Torralba, Aude Oliva, Modelling search for people in 900 scenes: A combined source model of eye guidance, Visual Cognition, 2009, 17 (6/7), 945-978.