GERDAQ Dataset - Items - D4Science Catalogue

GERDAQ Dataset

This is a benchmark dataset of annotated search-engine queries. Each mention of an entity in a query is tagged with the entity it refers to, with Wikipedia as the knowledge base.

For example, the query "armstrong moon landing" is tagged with two annotations:

armstrong -> http://en.wikipedia.org/wiki/Neil_Armstrong
moon landing -> http://en.wikipedia.org/wiki/Moon_landing

By contrast, the query "armstrong doping" is tagged with:

armstrong -> http://en.wikipedia.org/wiki/Lance_Armstrong
doping -> http://en.wikipedia.org/wiki/Doping_in_sport
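The two examples above can be sketched as a small in-memory structure. This is only an illustration of the mention-to-entity idea, not the dataset's actual XML schema (which is not described on this page); the names Annotation, ANNOTATED_QUERIES, mention_span and entity_for are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One tagged mention (hypothetical structure, not the GERDAQ XML schema)."""
    mention: str  # substring of the query
    entity: str   # Wikipedia URL the mention refers to


# The two example queries from the description above.
ANNOTATED_QUERIES = {
    "armstrong moon landing": [
        Annotation("armstrong", "http://en.wikipedia.org/wiki/Neil_Armstrong"),
        Annotation("moon landing", "http://en.wikipedia.org/wiki/Moon_landing"),
    ],
    "armstrong doping": [
        Annotation("armstrong", "http://en.wikipedia.org/wiki/Lance_Armstrong"),
        Annotation("doping", "http://en.wikipedia.org/wiki/Doping_in_sport"),
    ],
}


def mention_span(query: str, mention: str) -> tuple[int, int]:
    """Return the (start, end) character offsets of a mention in its query."""
    start = query.find(mention)
    if start < 0:
        raise ValueError(f"mention {mention!r} not found in query {query!r}")
    return start, start + len(mention)


def entity_for(query: str, mention: str) -> str:
    """Look up the entity tagged for a mention within a given query."""
    for annotation in ANNOTATED_QUERIES[query]:
        if annotation.mention == mention:
            return annotation.entity
    raise KeyError(mention)
```

The same mention, "armstrong", resolves to a different entity depending on the rest of the query, which is the disambiguation task the benchmark is meant to evaluate.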

The dataset was constructed through the CrowdFlower crowdsourcing platform. Queries were drawn at random from the KDD Cup 2005 dataset.

Data and Resources

GERDAQ dataset (XML)

Item URL

http://data.d4science.org/ctlg/ResourceCatalogue/gerdaq_dataset

Additional Info

Field	Value
Accessibility	Virtual Access
AccessibilityMode	Download
Area	Natural Language Understanding
Availability	On-Line
Basic rights	Modification
ChildrenData	No
Consent obtained also covers the envisaged transfer of the personal data outside the EU	No
Consent of the data subject	No
CreationDate	2014-05-19
Creator	Cornolti, Marco, cornolti@di.unipi.it
DataProtectionDirective	Data needs no protection.
DiskSize	0.244
Field/Scope of use	Any use
Format	application/xml
Language	eng, English
ManifestationType	Original
Personal data was manifestly made public by the data subject	No
PersonalData	No
ProcessingDegree	Primary
RelatedPaper	http://doi.acm.org/10.1145/2633211.2634348
Semantic Coverage	entities
Size	244KB
Sublicense rights	No
Territory of use	World Wide
ThematicCluster	Text and Social Media Mining
system:type	Dataset

Management Info

Field	Value
Author	Cornolti Marco
Maintainer	Cornolti Marco
Version	1
Last Updated	22 June 2023, 12:22 (CEST)
Created	29 April 2021, 11:19 (CEST)