California-ND: An Annotated Dataset for Near-Duplicate Detection in Personal Photo Collections

Author: University of Illinois at Urbana-Champaign

Partner: No

Contact: Dr. Stefan Winkler, Vision & InterAction Group (Vintage), Advanced Digital Sciences Center (ADSC), University of Illinois at Urbana-Champaign (UIUC)




Total: 701

SRC: 701

HRC: 1

Ratings: 10

Resolution: 1024x768

Method: Custom


Managing photo collections involves a variety of image quality assessment tasks, e.g. the selection of the “best” photos. Detecting near-duplicates is a prerequisite for automating these tasks. The California-ND dataset was created to assist researchers in testing algorithms for the detection of near-duplicate images. Unlike other existing datasets in this domain, California-ND contains 701 photos taken directly from a real user’s personal photo collection. As a result, it includes many challenging non-identical near-duplicate cases without the use of artificial image transformations. The original image sequence was maintained as much as possible. More importantly, in order to deal with the inevitable subjectivity and ambiguity that near-duplicate cases exhibit, the dataset is annotated by 10 different subjects, including the photographer himself. These annotations can be combined into a non-binary ground truth, representing the probability that a pair of images is considered a near-duplicate.
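As an illustration of how per-subject annotations might be combined into a non-binary ground truth, the sketch below averages binary near-duplicate votes over all subjects for each image pair. The in-memory format (one set of image-index pairs per subject) and the function name `soft_ground_truth` are assumptions made for this example; they do not describe the actual file layout of the dataset.

```python
from itertools import combinations


def soft_ground_truth(annotations, num_images):
    """Average binary per-subject labels into a probability per image pair.

    annotations: one entry per subject; each entry is a set of
    frozenset({i, j}) pairs that the subject marked as near-duplicates.
    This format is hypothetical, chosen only for the illustration.
    """
    num_subjects = len(annotations)
    if num_subjects == 0:
        raise ValueError("at least one subject's annotations are required")

    probabilities = {}
    for i, j in combinations(range(num_images), 2):
        pair = frozenset((i, j))
        # Count how many subjects labelled this pair as a near-duplicate.
        votes = sum(1 for subject_pairs in annotations if pair in subject_pairs)
        probabilities[(i, j)] = votes / num_subjects
    return probabilities


# Toy example with 4 images and 3 subjects (not real dataset values).
toy_annotations = [
    {frozenset((0, 1)), frozenset((2, 3))},
    {frozenset((0, 1))},
    {frozenset((0, 1)), frozenset((1, 2))},
]
ground_truth = soft_ground_truth(toy_annotations, num_images=4)
print(ground_truth[(0, 1)])  # all three subjects agree
print(ground_truth[(2, 3)])  # only one of three subjects agrees
```

With 10 subjects, each pair would receive a value in steps of 0.1, which an evaluation could either threshold or use directly as a soft label.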


The dataset is released under a Creative Commons license and is available for download. The zip file is encrypted; please email the contact listed above for the password.



References and Citation

Please cite the paper [JVW13] if you use the California-ND dataset.


  • JVW13: A. Jinda-Apiraksa, V. Vonikakis, S. Winkler. California-ND: An annotated dataset for near-duplicate detection in personal photo collections. Proc. 5th International Workshop on Quality of Multimedia Experience (QoMEX), Klagenfurt, Austria, July 3-5, 2013.