The CoPhIR (Content-based Photo Image Retrieval) test collection was developed to run meaningful scalability tests of the similarity-search infrastructure of the SAPIR project (SAPIR: Search In Audio Visual Content Using Peer-to-peer IR).

CoPhIR is the result of a joint effort of the NEMIS Lab and the HPC Lab of ISTI-CNR in Pisa, Italy. We extracted metadata from the Flickr archive using the EGEE European GRID, through the DILIGENT project. For each image, the standard MPEG-7 image features were extracted. Each entry of the test bed contains:

  • The link to the corresponding entry on the Flickr website

  • The photo image thumbnail

  • An XML structure with the Flickr user information in the corresponding Flickr entry: title, location, GPS, tags, comments, etc.

  • An XML structure with 5 extracted standard MPEG-7 image features: Scalable Colour, Colour Structure, Colour Layout, Edge Histogram, Homogeneous Texture.
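
The entry layout listed above can be pictured as a simple record type. The following is a minimal sketch only: the class name `CophirEntry`, its field names, and the `distance` helper are hypothetical and do not reflect the collection's actual XML schema. The per-descriptor L1 distance is a stand-in chosen for brevity, not the metric defined by MPEG-7 or used by SAPIR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# The five MPEG-7 descriptors named in the collection description.
DESCRIPTORS = (
    "ScalableColour",
    "ColourStructure",
    "ColourLayout",
    "EdgeHistogram",
    "HomogeneousTexture",
)


@dataclass
class CophirEntry:
    """Hypothetical in-memory view of one entry (not the real XML schema)."""
    flickr_url: str                           # link to the Flickr entry
    thumbnail_path: str                       # location of the photo thumbnail
    title: str = ""                           # user-supplied metadata
    tags: List[str] = field(default_factory=list)
    gps: Optional[Tuple[float, float]] = None
    # Descriptor name -> feature vector (vector lengths are an assumption).
    features: Dict[str, List[float]] = field(default_factory=dict)


def l1(a: List[float], b: List[float]) -> float:
    """Plain L1 distance; a stand-in for the descriptor-specific metrics."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance(p: CophirEntry, q: CophirEntry,
             weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of per-descriptor distances (weights default to 1.0)."""
    total = 0.0
    for name in DESCRIPTORS:
        w = 1.0 if weights is None else weights.get(name, 1.0)
        total += w * l1(p.features[name], q.features[name])
    return total
```

A similarity query over such records would rank entries by `distance` to a query image; combining the five descriptors through a weighted sum is one common design choice, assumed here for illustration.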

The data collected consist of 106 million processed images.

Additional Info
Field | Value
Accessibility | Virtual Access
AccessibilityMode | Download
Attribution requirements |
Availability | On-Site
Basic rights | Temporary download of a single copy only
ChildrenData | No
Consent obtained also covers the envisaged transfer of the personal data outside the EU | No
Consent of the data subject | No
CreationDate | 2009-12-31
Creator | Paolo Bolettieri, Andrea Esuli, Fabrizio Falchi, Claudio Lucchese, Raffaele Perego, Tommaso Piccioli, Fausto Rabitti
DataProtectionDirective | Not applicable
Display requirements |
Distribution requirements |
External Identifier |
Field/Scope of use | Research only
License term |
ManifestationType | Virtual
Personal data was manifestly made public by the data subject | No
PersonalData | No
ProcessingDegree | Secondary
RelatedPaper | Paolo Bolettieri, Andrea Esuli, Fabrizio Falchi, Claudio Lucchese, Raffaele Perego, Tommaso Piccioli, Fausto Rabitti. CoPhIR: a Test Collection for Content-Based Image Retrieval. CoRR abs/0905.4627 (2009)
Requirement of non-disclosure (confidentiality mark) |
Restrictions on use |
Semantic Coverage |
Size | 106 million processed images
Sublicense rights | No
Territory of use | World Wide
ThematicCluster | Web Analytics
TimeCoverage | 2009
system:type | Dataset
Management Info
Field | Value
Author | Muntean Cristina
Maintainer | Muntean Cristina
Version | 1
Last Updated | 26 September 2019, 12:28 (CEST)
Created | 26 September 2019, 12:28 (CEST)