-
DNA 12-mers
A 179 MB dataset containing all the ~14M unique 12-mers in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html). This dataset...-
ZIP
-
Synthetic Datasets for Fine-Grained Fairness Analysis of Abusive Language Det...
Three synthetic datasets covering different types of bias grouped by target, namely sexism, racism and ableism. The reason for distinguishing the records by abuse targets is...-
CSV
-
Private Italian Thesaurus for Tourism domain
An Italian thesaurus for the tourism domain, comprising 2,684 concepts organized according to semantic relationships (equivalence, hierarchical and associative). The... -
Santorini Tweets July-August 2021
This dataset contains 225,501 tweets written by 141,277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...-
ZIP
-
FANCY Dataset
(NLI) FANCY (FActivity, Negation, Common-sense, hYpernimy) is a new dataset with 4000 sentence pairs concerning complex linguistic phenomena such as factivity, negation,... -
SWH Filenames
A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...-
ZIP
-
DNA 31-mers
A 12 GB dataset containing all the ~367M unique 31-mers in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html). This dataset...-
ZIP
-
Articles and comments of major Estonian newspapers
The dataset contains articles and comments from four major Estonian news portals, from the early 2000s to 2016. -
Emergency Tweets 2012 Emilia earthquake
This dataset contains 3,170 Italian tweets about the earthquakes that struck the Emilia Romagna regional district in Italy on 20 May 2012 starting from 4 a.m. local time...-
ZIP
-
Emergency Tweets 2016 Amatrice earthquake
This dataset contains Italian tweets related to the 2016 earthquake in central Italy (https://it.wikipedia.org/wiki/Terremoto_del_Centro_Italia_del_2016_e_d...). is...-
ZIP
-
Brexit Twitter User Vote Intent
A list of users for whom vote intent in the UK EU membership referendum has been established. -
Emergency Tweets 2014 Genoa flood
This dataset contains Italian tweets collected during and in the aftermath of the floods that occurred near the city of Genoa between 9 and 11 October 2014...-
ZIP
-
Sheffield NERD Tweet Corpus
The dataset contains 794 tweets annotated with named entities disambiguated against DBpedia, and split into equally sized training and test portions. 400 tweets from 2013 come...-
FINF
-
UK General Election Vote Intent
A list of Twitter users for whom party political allegiance/vote intent has been established. -
Emergency Tweets 2013 Sardinia flood
This dataset is related to the floods that occurred in the Sardinia regional district between 17 and 19 November 2013 (https://en.wikipedia.org/wiki/2013_Sardinia_floods), as...-
ZIP
-
-
ZIP
Resource: 'geo-annotated tweets.zip'
-
Emergency Tweets 2011 Christchurch earthquake
This dataset contains tweets related to the devastating earthquake that occurred on 22 February 2011, at around 12 p.m. local time in Christchurch, New Zealand...-
CSV
-
BioTAGME: A comprehensive platform for biological knowledge network analysis
This network was built through BioTAGME, a system that combines TAGME, an entity-annotation framework based on the Wikipedia corpus, with a network-based inference methodology (i.e.,... -
Emergency Tweets 2013 Milan blackout
This dataset is related to a power outage (i.e., a blackout) that occurred in the city of Milan, in northern Italy, in the night between 14 and 15 May 2013. Despite not...-
CSV
-
Emergency Tweets 2009 L'Aquila earthquake
This dataset comprises 1,100 Italian tweets shared in the aftermath of the 2009 L’Aquila earthquake (https://en.wikipedia.org/wiki/2009_L%27Aquila_earthquake). The earthquake...-
ZIP