-
DNA 12-mers
A 179 MB dataset containing all the ~14M unique 12-mers in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html). This dataset...-
ZIP
The resource: 'DNA 12-mers' is not accessible as guest user. You must login to access it!
-
ZIP
-
Synthetic Datasets for Fine-Grained Fairness Analysis of Abusive Language Det...
Three synthetic datasets covering different types of bias grouped by target, namely sexism, racism and ableism. The reason for distinguishing the records by abuse targets is...-
CSV
The resource: 'Synthetic Datasets for ...' is not accessible as guest user. You must login to access it!
-
CSV
-
Private Italian Thesaurus for Tourism domain
An Italian thesaurus in the domain of the Tourism, counting 2,684 concepts, organized according to semantic relationships (equivalence, hierarchical and associative). The... -
Santorini Tweets July-August 2021
This dataset contains 225.501 tweets written by 141.277 users. These tweets are geolocated in Santorini, or they contain the word or the hashtag "santorini" in the text. They...-
ZIP
The resource: 'tweet_santorini.csv' is not accessible as guest user. You must login to access it!
-
ZIP
-
FANCY Dataset
(NLI) FANCY (FActivity, Negation, Common-sense, hYpernimy) is a new dataset with 4000 sentence pairs concerning complex linguistic phenomena such as factivity, negation,... -
SWH Filenames
A 69 GB dataset with ~2.3 billion strings representing deduplicated names of source code files collected by Software Heritage, the great library of source code...-
ZIP
The resource: 'SWH Filenames' is not accessible as guest user. You must login to access it!
-
ZIP
-
DNA 31-mers
A 12 GB dataset containing all the ~367M unique 31-mers in the DNA sequences available in the Pizza&Chili Corpus (https://pizzachili.dcc.uchile.cl/texts.html). This dataset...-
ZIP
The resource: 'DNA 31-mers' is not accessible as guest user. You must login to access it!
-
ZIP