ClueWeb09

The ClueWeb09 dataset consists of about 1 billion web pages in ten languages that were collected in January and February 2009. It was created to support research on information retrieval and related human language technologies. It is currently used by several tracks of the TREC conference.

Data and Resources

This item has no data

Personal Data Attributes

Description: Personal Data related Information

Field: Value
ChildrenData: No
Personal Data: No
Personal data was manifestly made public by the data subject: No
Additional Info
Field: Value
Accessibility: Trans National Access
Accessibility Mode: OnLine Access
Availability: On-Site
Basic rights: Temporary download of a single copy only
Consent obtained also covers the envisaged transfer of the personal data outside the EU: No
Consent of the data subject: No
Creation Date: 2011-06-14
Creator: AA.VV., Carnegie Mellon University
DataProtectionDirective: Not applicable
Field/Scope of use: Research only
Group: Societal Debates and Misinformation
Language: eng, English
Manifestation Type: Replica
Processing Degree: Primary
Size: 1 billion web pages
SoBigData Node: SoBigData EU
Sublicense rights: No
Territory of use: World Wide
Thematic Cluster: Web Analytics [WA]
TimeCoverage: 2009-01-01 - 2009-02-28
system:type: Dataset
Management Info
Field: Value
Author: Muntean Cristina
Maintainer: Muntean Cristina
Version: 1
Last Updated: 27 October 2023, 01:20 (CEST)
Created: 29 April 2021, 11:15 (CEST)