Each year, Google examines billions of queries that people around the world have typed into Google Search to discover the Zeitgeist, which is saved in the Google Zeitgeist Archives. The Google Zeitgeist Archives from 2004 to 2009 were collected, and the 400 most popular queries were chosen to search for videos on YouTube. Up to 1,000 videos were downloaded for each query. More than 200K YouTube videos were crawled between July 2010 and September 2010. After filtering out videos larger than 10 MB, the Combined Dataset contains 169,952 videos in total. In addition, 3,305,525 keyframes were extracted from these videos. The dataset is publicly released so that other researchers can use it as a test bed.
The entire dataset is approximately 2.8 GB and is available for download as a single archive: http://dropbox.eait.uq.edu.au/uqhshen/uq_video/
The original link was http://itee.uq.edu.au/~shenht/UQ_VIDEO/
References and Citation
If you use this dataset in your work, please cite the following paper [SYH11].
SYH11: Jingkuan Song, Yi Yang, Zi Huang, Heng Tao Shen, Richang Hong: Multiple feature hashing for real-time large scale near-duplicate video retrieval. ACM Multimedia, pages 423-432, 2011.