The Italian Music Dataset

The dataset was built using the Spotify and SoundCloud APIs. It comprises over 14,500 songs by both famous and lesser-known Italian musicians. Each song is identified by its Spotify id and its title. Track metadata also include lemmatized and POS-tagged lyrics and, in most cases, ten musical features gathered directly from Spotify: acousticness (float), danceability (float), duration_ms (int), energy (float), instrumentalness (float), liveness (float), loudness (float), speechiness (float), tempo (float) and valence (float). All features range from 0.0 to 1.0 except loudness, which typically ranges between -60 and 0 dB; tempo, which is expressed in beats per minute (BPM); and duration_ms, which gives the track length in milliseconds. For further information, refer to Spotify's documentation. Moreover, every song is enriched with information about its artist: Spotify id, name, music genre, region of origin, and popularity score. Each artist's genre is one of 11 major music genres, identified using lists of popular music genres gathered from AllMusic and Wikipedia: latin, classical, jazz, pop, blues, electronic, folk, light music, R&B, hip-hop, and rock.
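The feature types and ranges described above can be turned into a small sanity check. The following is only a sketch: the feature names and types come from the description, but the idea that a track's features arrive as a flat dictionary, and the helper name check_features, are assumptions made for illustration and are not taken from the dataset's actual JSON layout. Loudness is left unchecked because its range is only described as typical.

```python
from typing import Any, Dict, List

# Feature names and declared types, as listed in the dataset description.
FEATURE_TYPES = {
    "acousticness": float,
    "danceability": float,
    "duration_ms": int,
    "energy": float,
    "instrumentalness": float,
    "liveness": float,
    "loudness": float,
    "speechiness": float,
    "tempo": float,
    "valence": float,
}

# Every feature except these three is described as ranging from 0.0 to 1.0.
UNIT_INTERVAL = set(FEATURE_TYPES) - {"loudness", "tempo", "duration_ms"}


def check_features(features: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one track's features.

    Hypothetical helper: assumes the features are given as a flat dict.
    Missing features are not reported, since the description says they
    are available only in most cases.
    """
    problems = []
    for name, expected in FEATURE_TYPES.items():
        if name not in features:
            continue
        value = features[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name}: expected a number, got {type(value).__name__}")
        elif expected is int and not isinstance(value, int):
            problems.append(f"{name}: expected an integer, got {value}")
        elif name in UNIT_INTERVAL and not 0.0 <= value <= 1.0:
            problems.append(f"{name}: {value} is outside [0.0, 1.0]")
        elif name in ("tempo", "duration_ms") and value < 0:
            problems.append(f"{name}: {value} is negative")
    return problems
```

For example, check_features({"valence": 1.4}) reports a single out-of-range problem, while a dictionary whose values respect the documented ranges yields an empty list.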

Data and Resources
To access the resources you must log in
  • Dataset (JSON)

Additional Info
Field: Value
Accessibility: Both
AccessibilityMode: Download
Attribution requirements:
Availability: On-Line
Basic rights: Download
ChildrenData: No
Consent obtained also covers the envisaged transfer of the personal data outside the EU: No
Consent of the data subject: No
CreationDate: 2018-06-26 14:00
Creator: Laura Pollacci
DataProtectionDirective: None
Display requirements:
Distribution requirements:
External Identifier:
Field/Scope of use: Any use
Format: JSON
Language:
License term: Not specified
ManifestationType: Virtual
Personal data was manifestly made public by the data subject: No
PersonalData: No
PersonalSensitiveData: No
ProcessingDegree: Primary
RelatedPaper: "The Italian Music Superdiversity. Geography, Emotion and Language: one resource to find them, one resource to rule them all.", L. Pollacci, R. Guidotti, G. Rossetti, F. Giannotti, D. Pedreschi, in Multimedia Tools and Applications.
Requirement of non-disclosure (confidentiality mark):
Restrictions on use:
Semantic Coverage:
Size: 22MB
Sublicense rights: No
Territory of use: World Wide
ThematicCluster: Text and Social Media Mining
TimeCoverage: 2018-06-26
doi: 10.5281/zenodo.1298556
system:type: Dataset
Management Info
Field: Value
Author: Rossetti Giulio
Maintainer: Laura Pollacci
Version: 1
Last Updated: 26 September 2019, 12:27 (CEST)
Created: 26 September 2019, 12:27 (CEST)