Div400: A Social Image Retrieval Result Diversification Dataset

Author: University Politehnica of Bucharest

Partner: No

Contact: Bogdan Ionescu (bionescu@imag.pub.ro)




Total: 43418


Div400 is a new dataset designed to support shared evaluation in different areas of social media photo retrieval, e.g., machine analysis (re-ranking, machine learning), human-based computation (crowdsourcing), or hybrid approaches (relevance feedback, machine-crowd integration). It comes with relevance and diversity assessments performed by human annotators. The dataset covers 396 landmark locations, represented by 43,418 Flickr photos with their metadata, Wikipedia pages, and content descriptors for the text and visual modalities. To facilitate distribution, only Creative Commons content was included. The dataset was validated during the 2013 Retrieving Diverse Social Images Task at the MediaEval Benchmarking Initiative for Multimedia Evaluation.


The files are available for download via HTTP. Link: http://traces.cs.umass.edu/index.php/Mmsys/Mmsys

The full dataset is provided as a single archive (7.6GB): http://skuld.cs.umass.edu/traces/mmsys/2014/user01.tar
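Because the archive is large (7.6GB), it may be convenient to stream it to disk rather than load it into memory. The following is a minimal sketch using only the Python standard library; the helper names `download` and `top_level_entries` and the local file name `user01.tar` are illustrative assumptions, and nothing is assumed here about the directory layout inside the archive.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

# Archive URL as given on this page.
ARCHIVE_URL = "http://skuld.cs.umass.edu/traces/mmsys/2014/user01.tar"


def download(url: str, dest: Path, chunk_size: int = 1 << 20) -> Path:
    """Stream `url` to `dest` in chunks so the archive never sits in memory.

    Hypothetical helper, not part of the dataset distribution.
    """
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest


def top_level_entries(archive: Path) -> list[str]:
    """Return the sorted top-level names stored in a tar archive.

    Useful for inspecting the archive before extracting it; the actual
    layout of the Div400 archive is not assumed here.
    """
    entries = set()
    with tarfile.open(archive) as tar:
        for name in tar.getnames():
            # Ignore empty segments and a leading "./" prefix.
            parts = [p for p in name.split("/") if p not in ("", ".")]
            if parts:
                entries.add(parts[0])
    return sorted(entries)


if __name__ == "__main__":
    local_copy = download(ARCHIVE_URL, Path("user01.tar"))
    print(top_level_entries(local_copy))
```

Listing the top-level entries first makes it possible to check the contents and the available disk space before extracting the full archive.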

References and Citation

Use of the dataset in published work should be acknowledged by a full citation to the authors' paper [IRM14], presented at the MMSys conference (Proceedings of ACM MMSys 2014, March 19 - March 21, 2014, Singapore, Singapore).


  • IRM14: B. Ionescu, A.-L. Radu, M. Menéndez, H. Müller, A. Popescu, B. Loni, Div400: a social image retrieval result diversification dataset, Proceedings of ACM MMSys 2014, March 19 - March 21, 2014, Singapore, Singapore.