20 items found

Types: SoBigData.eu: Dataset
Tags: Text mining

  • SoBigData.eu: Dataset

    WIRE dataset

    This dataset consists of 503 pairs of Wikipedia entities drawn from the New York Times dataset, each with a human-assigned relatedness score. The domain experts based their...
    • CSV: 'WIRE dataset' (login required)
  • SoBigData.eu: Dataset

    Articles and comments of major Estonian newspapers

    The dataset contains articles and comments from four major Estonian news portals, from the early 2000s to 2016.
  • SoBigData.eu: Dataset

    Emergency Tweets 2012 Emilia earthquake

    This dataset contains 3,170 Italian tweets about the earthquakes that struck the Emilia Romagna regional district in Italy on 20 May 2012 starting from 4 a.m. local time...
    • ZIP: 'EAQ-EML.zip' (login required)
  • SoBigData.eu: Dataset

    Amazon reviews

    This dataset contains product reviews and metadata from Amazon, including 142.8 million reviews spanning May 1996 - July 2014. This dataset includes reviews (ratings, text,...
    • HTML: 'Julian McAuley's repository' (login required)
  • SoBigData.eu: Dataset

    Emergency Tweets 2016 Amatrice earthquake

    This dataset contains Italian tweets related to the 2016 earthquake in central Italy (https://it.wikipedia.org/wiki/Terremoto_del_Centro_Italia_del_2016_e_d...). is...
    • ZIP: 'EAQ-AMA.zip' (login required)
  • SoBigData.eu: Dataset

    Brexit Twitter User Vote Intent

    A list of users for whom vote intent in the UK EU membership referendum has been established.
  • SoBigData.eu: Dataset

    Emergency Tweets 2014 Genoa flood

    This dataset contains Italian tweets collected during and in the aftermath of the floods that occurred near the city of Genoa between 9 and 11 October 2014...
    • ZIP: 'FLO-GEN.zip' (login required)
  • SoBigData.eu: Dataset

    Sheffield NERD Tweet Corpus

    The dataset contains 794 tweets annotated with named entities disambiguated against DBpedia, and split into equally sized training and test portions. 400 tweets from 2013 comes...
    • FINF: 'Sheffield NERD Tweet Corpus' (login required)
  • SoBigData.eu: Dataset

    UK General Election Vote Intent

    A list of Twitter users for whom party political allegiance/vote intent has been established.
  • SoBigData.eu: Dataset

    Emergency Tweets 2013 Sardinia flood

    This dataset is related to the floods that occurred in the Sardinia regional district between 17 and 19 November 2013 (https://en.wikipedia.org/wiki/2013_Sardinia_floods), as...
    • ZIP: 'FLO-SAR.zip' (login required)
  • SoBigData.eu: Dataset

    The Italian Music Dataset

    The dataset is built by exploiting the Spotify and SoundCloud APIs. It is composed of over 14,500 different songs of both famous and less famous Italian musicians. Each song...
    • JSON: 'Dataset' (login required)
  • SoBigData.eu: Dataset

    Geo-annotated tweets ENG-ITA

    • ZIP: 'geo-annotated tweets.zip' (login required)
  • SoBigData.eu: Dataset

    Emergency Tweets 2011 Christchurch earthquake

    This dataset contains tweets related to the devastating earthquake that occurred on 22 February 2011, at around 12 p.m. local time, in Christchurch, New Zealand...
    • CSV: 'EAQ-CHR_tweets.csv' (login required)
  • SoBigData.eu: Dataset

    Emergency Tweets 2013 Milan blackout

    This dataset is related to a power outage (i.e., a blackout) that occurred in the city of Milan, in northern Italy, on the night between 14 and 15 May 2013. Despite not...
    • CSV: 'PWO-MIL_tweets.csv' (login required)
  • SoBigData.eu: Dataset

    Emergency Tweets 2009 L'Aquila earthquake

    This dataset comprises 1,100 Italian tweets shared in the aftermath of the 2009 L’Aquila earthquake (https://en.wikipedia.org/wiki/2009_L%27Aquila_earthquake). The earthquake...
    • ZIP: 'EAQ-LAQ.zip' (login required)
  • SoBigData.eu: Dataset

    Wikipedia Word Embeddings

    Embeddings were created by applying word2vec skip-gram to a corpus of non-stub Wikipedia articles from a December 2015 English dump, with the following parameters: -cbow 0...
    • 'Embeddings' (login required)
  • SoBigData.eu: Dataset

    Twitter social bots

    Spambots are automated accounts (i.e., accounts driven by a bot) that repeatedly advertise unsolicited and often harmful content (e.g., malware, URLs to phishing Web sites,...
  • SoBigData.eu: Dataset

    Broad Twitter Corpus

    The Broad Twitter Corpus is a named entity-annotated dataset of tweets, collected in order to capture temporal, spatial and social diversity. The goal of the corpus is to...
    • JSON: 'Broad Twitter Corpus' (login required)
  • SoBigData.eu: Dataset

    Twitter fake followers

    Fake followers are fake accounts created in bulk to follow a target account, and they can be bought from online markets. In other words, their goal is that of increasing the...
  • SoBigData.eu: Dataset

    Wikinews dataset

    This dataset consists of a sample of 365 news articles published by Wikinews from November 2004 to June 2014 and annotated with about 5,000 entities, each associated with a saliency...
    • JSON: 'entity-saliency' (login required)