The ClueWeb12 dataset consists of 733,019,372 English web pages, collected between February 10, 2012 and May 10, 2012. It was created to support research on information retrieval and related human language technologies. ClueWeb12 is a companion or successor to the ClueWeb09 web dataset. Distribution of ClueWeb12 began in January 2013.

Additional Info
Field: Value
Accessibility: Trans National Access
AccessibilityMode: OnLine Access
Attribution requirements:
Availability: On-Site
Basic rights: Temporary download of a single copy only
ChildrenData: No
Consent obtained also covers the envisaged transfer of the personal data outside the EU: No
Consent of the data subject: No
CreationDate: 2013-01-17
Creator: Carnegie Mellon University
DataProtectionDirective: Not applicable
Display requirements:
Distribution requirements:
External Identifier:
Field/Scope of use: Research only
Format: ascii
License term:
ManifestationType: Replica
Personal data was manifestly made public by the data subject: No
PersonalData: No
ProcessingDegree: Primary
Requirement of non-disclosure (confidentiality mark):
Restrictions on use:
Semantic Coverage:
Size: 733,019,372 English web pages
Sublicense rights: No
Territory of use: World Wide
ThematicCluster: Web Analytics
TimeCoverage: 2012-02-10 - 2012-05-10
system:type: Dataset

Management Info
Field: Value
Author: Muntean Cristina
Maintainer: Muntean Cristina
Version: 1
Last Updated: 26 September 2019, 12:28 (CEST)
Created: 26 September 2019, 12:28 (CEST)