Managing photo collections involves a variety of image quality assessment tasks, e.g. the selection of the “best” photos. Detecting near-duplicates is a prerequisite for automating these tasks. The California-ND dataset was created to assist researchers in testing algorithms for the detection of near-duplicate images. Unlike other existing datasets in this domain, California-ND contains 701 photos taken directly from a real user’s personal photo collection. As a result, it includes many challenging non-identical near-duplicate cases without the use of artificial image transformations. The original image sequence was maintained as much as possible. More importantly, to deal with the inevitable subjectivity and ambiguity that near-duplicate cases exhibit, the dataset is annotated by 10 different subjects, including the photographer himself. These annotations can be combined into a non-binary ground truth, representing the probability that a pair of images is considered a near-duplicate.
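As an illustration of how per-subject annotations might be combined into a non-binary ground truth, the following minimal sketch averages binary votes per image pair. The input layout (one set of image-index pairs per subject), the function name `near_duplicate_probability`, and the toy data are all assumptions made for this example; they do not describe the actual file format of the dataset or the exact procedure used in [JVW13].

```python
from collections import Counter


def near_duplicate_probability(annotations):
    """Return {pair: fraction of subjects who marked the pair as near-duplicate}.

    `annotations` is a list with one entry per subject; each entry is a set of
    (image_a, image_b) pairs that the subject labelled as near-duplicates.
    This layout is hypothetical, chosen only for illustration.
    """
    num_subjects = len(annotations)
    votes = Counter()
    for subject_pairs in annotations:
        # Sort each pair so (2, 1) and (1, 2) count as the same pair; the set
        # ensures a subject contributes at most one vote per pair.
        votes.update({tuple(sorted(pair)) for pair in subject_pairs})
    return {pair: count / num_subjects for pair, count in votes.items()}


# Hypothetical toy annotations from 4 subjects (not real dataset values).
toy_annotations = [
    {(1, 2), (2, 3)},
    {(1, 2)},
    {(2, 1), (4, 5)},
    {(1, 2), (2, 3)},
]
probabilities = near_duplicate_probability(toy_annotations)
```

Pairs that no subject marked are absent from the result and can be treated as having probability 0. With the 10 annotators of California-ND, this kind of averaging would yield values in steps of 0.1.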
The dataset is released under a Creative Commons license and can be downloaded from http://vintage.winklerbros.net/californiaND.zip. The zip file is encrypted; please email email@example.com for the password.
License
The dataset is released under a Creative Commons license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
References and Citation
Please cite the paper [JVW13] if you use the California-ND dataset.
JVW13: A. Jinda-Apiraksa, V. Vonikakis, S. Winkler. California-ND: An annotated dataset for near-duplicate detection in personal photo collections. Proc. 5th International Workshop on Quality of Multimedia Experience (QoMEX), Klagenfurt, Austria, July 3-5, 2013.