This dataset contains eye fixations from 3 participants viewing 1,000 images from the Pascal VOC dataset and from 8 participants viewing 104 images from the SUN09 dataset. It also includes image descriptions, pre-trained object detectors, and associated bounding boxes.

We posit that user behavior during natural viewing of images contains an abundance of information about the content of those images, as well as information related to user intent and user-defined content importance. In this paper, we conduct experiments to better understand the relationships between images, the eye movements people make while viewing images, and how people construct natural language to describe images. We explore these relationships in the context of two commonly used computer vision datasets. We then relate these human cues to the outputs of current visual recognition systems and demonstrate prototype applications for gaze-enabled detection and annotation.
Data and code are available in a single archive (183.2 MB): http://www.cs.stonybrook.edu/~kyun/research/gaze/dataset/SBUGazeDetectionDescriptionDataset_v0.2.tgz
References and Citation
YPA13: Kiwon Yun, Yifan Peng, Hossein Adeli, Tamara L. Berg, Dimitris Samaras, and Gregory J. Zelinsky, Specifying the Relationships Between Objects, Gaze, and Descriptions for Scene Understanding, Vision Sciences Society (VSS) 2013 (Florida/USA)
YPS13a: Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, and Tamara L. Berg, Studying Relationships Between Human Gaze, Description, and Computer Vision, Computer Vision and Pattern Recognition (CVPR) 2013 (Oregon/USA)
YPS13b: Kiwon Yun, Yifan Peng, Dimitris Samaras, Gregory J. Zelinsky, and Tamara L. Berg, Exploring the Role of Gaze Behavior and Object Detection in Scene Understanding, Frontiers in Psychology, December 2013, 4(917): 1-14