Přejít k navigační liště

Zdroják » Databáze » Obrovský únik dat ze Spotify, co vše víme?

Obrovský únik dat ze Spotify, co vše víme?

Články Databáze

Spotify se v posledních dnech ocitlo v centru pozornosti poté, co blog Anna’s Archive zveřejnil tvrzení o rozsáhlém získání dat z této streamovací platformy. Podle autorů nejde o klasický bezpečnostní incident, ale o systematické stahování a archivaci veřejně dostupných i technických dat, která má mapovat téměř celý hudební katalog služby. Situace však otevírá otázky kolem bezpečnosti, centralizace hudby a toho, jak snadno lze data ze streamovacích platforem ve velkém měřítku získat.

Spotify je dnes de facto centrální archiv světové hudby – většina poslechu, objevování i hudebních trendů je navázaná na jednu komerční platformu. Právě tato centralizace je ale zároveň slabinou. Pokud by služba změnila podmínky, odstranila část katalogu nebo v budoucnu zanikla, obrovská část digitální hudební historie by mohla zmizet.

Na tento problém upozorňuje blog Anna’s Archive, který zveřejnil text Backing up Spotify. Autoři v něm tvrdí, že se jim podařilo systematicky zazálohovat téměř celý Spotify katalog, a to jak z pohledu metadat, tak samotných audio souborů. Cílem podle nich není konkurovat streamovacím službám, ale dlouhodobá archivace a ochrana kulturního dědictví, podobně jako u knih nebo akademických publikací.

Podle blogu dnes Spotify obsahuje desítky milionů skladeb, přičemž drtivá většina poslechů se soustředí na relativně malou část katalogu. Právě tuto strukturu se archiv snaží kopírovat – data jsou rozdělená podle popularity, kompletnosti a významu. Výsledkem má být archiv v řádu stovek terabajtů, který mapuje prakticky celý hudební ekosystém Spotify.

Jaká jsou ty nejzajímavější leaknutá data?

Přehled kolekce spotify

Písně podle skóre oblíbenosti

Top 10 000 skladeb má popularitu v rozmezí 70-100.

Analýza počtu streamů podle popularity

Můžeme také odhadnout, že top 3 skladby mají vyšší celkový počet streamů než spodních 20-100 milionů skladeb dohromady:

UmělecJménoPopularitaPočet streamů
Lady Gaga, Bruno MarsDie With A Smile1003.075 miliard
Billie EilishBIRDS OF A FEATHER983.137 miliard
Bad BunnyDtMF981.124 miliard

Skladby podle délky trvání

Písně podle hodnocení obsahu

Top 30 Hudebních žánrů podle počtu interpretů

Protože každý žánr je velmi specifický, můžeme žánry také seskupit a spočítat výsledky:

Můžeme také seskupit interprety podle popularity. Výsledný graf vypadá velmi podobně jako graf popularity skladeb:

Vydání alb podle roku

Alba podle počtu skladeb

Z tohoto grafu je poznat, že nejvíce skladeb na Spotify jsou singly.

Metadata Files

Stručný přehled tří hlavních APIs Spotify: interpreti, alba, skladby. Každá skladba existuje přesně v jednom albu, ale každá skladba a každé album může mít více interpretů.

CREATE TABLE `artists` (
          `rowid` integer PRIMARY KEY NOT NULL,
          /* The original Spotify base62 ID. */
          `id` text NOT NULL,
          /* When the item was fetched (unixepoch ms). */
          `fetched_at` integer NOT NULL,
          /* "The name of the artist."*/
          `name` text NOT NULL,
          /* followers.total - "The total number of followers." */
          `followers_total` integer NOT NULL,
          /* "The popularity of the artist. The value will be between 0 and 100, with 100 being the most popular. The artist's popularity is calculated from the popularity of all the artist's tracks." */
          `popularity` integer NOT NULL
  );
  /* "A list of the genres the artist is associated with. If not yet classified, the array is empty." */
  CREATE TABLE `artist_genres` (
          `artist_rowid` integer NOT NULL,
          `genre` text NOT NULL,
          FOREIGN KEY (`artist_rowid`) REFERENCES `artists`(`rowid`)
  );
  /* Images of the artist in various sizes, widest first. */
  CREATE TABLE `artist_images` (
          `artist_rowid` integer NOT NULL,
          `width` integer NOT NULL,
          `height` integer NOT NULL,
          `url` text NOT NULL,
          FOREIGN KEY (`artist_rowid`) REFERENCES `artists`(`rowid`)
  );
  /*
   * Information about artist-albums relationships from /artists/{id}/albums, album.artists[] and album.tracks[].artists[].
   * The relationships "album", "single", "compilation" are left out because they can be reconstructed from `album.type`.
   */
  CREATE TABLE "artist_albums" (
          `artist_rowid` integer NOT NULL,
          `album_rowid` integer NOT NULL,
          /* True if this link was retrieved from /artists/{id}/albums with an "album_group" response of "appears_on". */
          `is_appears_on` integer NOT NULL,
          /* True if this link is based on the actual artists of each track in the album. Only exists if the link is not explicit (above). */
          `is_implicit_appears_on` integer NOT NULL,
          /* If neither is_appears_on or is_implicit_appears_on, the index of album.data.artists[] this was retrieved from. */
          `index_in_album` integer,
          FOREIGN KEY (`artist_rowid`) REFERENCES `artists`(`rowid`),
          FOREIGN KEY (`album_rowid`) REFERENCES `albums`(`rowid`)
  );
  /* combinations of available markets in a separate table to save space */
  CREATE TABLE `available_markets` (
          `rowid` integer PRIMARY KEY NOT NULL,
          /* comma separated ISO 3166-1 alpha-2 country codes. */
          `available_markets` text NOT NULL
  );
  /* /albums/{id} - "Get Spotify catalog information for a single album." */
  CREATE TABLE `albums` (
          `rowid` integer PRIMARY KEY NOT NULL,
          /* The original Spotify base62 ID. */
          `id` text NOT NULL,
          /* When the item was fetched (unixepoch ms). */
          `fetched_at` integer NOT NULL,
          /* "The name of the album. In case of an album takedown, the value may be an empty string." */
          `name` text NOT NULL,
          /* 'The type of the album. Allowed values: "album", "single", "compilation"' */
          `album_type` text NOT NULL,
          /* available markets as an index into the available_markets table to save space. - "The markets in which the album is available: ISO 3166-1 alpha-2 country codes. NOTE: an album is considered available in a market when at least 1 of its tracks is available in that market." */
          `available_markets_rowid` integer NOT NULL,
          /* external_id.upc - Universal Product Code */
          `external_id_upc` text,
          /* external_id.amgid - undocumented - AMG MUSIC GROUP Internal ID */
          "external_id_amgid" text,
          /* "The copyright" */
          `copyright_c` text,
          /* "The sound recording (performance) copyright." */
          `copyright_p` text,
          /* "The label associated with the album." */
          `label` text NOT NULL,
          /* "The popularity of the album. The value will be between 0 and 100, with 100 being the most popular." */
          `popularity` integer NOT NULL,
          /* "The date the album was first released." */
          `release_date` text NOT NULL,
          /* 'The precision with which release_date value is known. Allowed values: "year", "month", "day"' */
          `release_date_precision` text NOT NULL,
          /* tracks.total */
          `total_tracks` integer NOT NULL,
          FOREIGN KEY (`available_markets_rowid`) REFERENCES `available_markets`(`rowid`)
  );

  /* album.images[] - "The cover art for the album in various sizes, widest first." */
  CREATE TABLE "album_images" (
          `album_rowid` integer NOT NULL,
          `width` integer NOT NULL,
          `height` integer NOT NULL,
          `url` text NOT NULL,
          FOREIGN KEY (`album_rowid`) REFERENCES `albums`(`rowid`)
  );
  /* /tracks/{id} */
  CREATE TABLE `tracks` (
          `rowid` integer PRIMARY KEY NOT NULL,
          /* The original Spotify base62 ID. */
          `id` text NOT NULL,
          /* When the item was fetched (unixepoch ms). */
          `fetched_at` integer NOT NULL,
          /* "The name of the track." */
          `name` text NOT NULL,
          /* "A link to a 30 second preview (MP3 format) of the track. Can be null" */
          `preview_url` text,
          `album_rowid` integer NOT NULL,
          /* "The number of the track. If an album has several discs, the track number is the number on the specified disc." */
          `track_number` integer NOT NULL,
          /* http://en.wikipedia.org/wiki/International_Standard_Recording_Code */
          `external_id_isrc` text,
          `external_id_ean` text,
          `external_id_upc` text,
          /* "The popularity of the track. The value will be between 0 and 100, with 100 being the most popular.
  The popularity of a track is a value between 0 and 100, with 100 being the most popular. The popularity is calculated by algorithm and is based, in the most part, on the total number of plays the track has had and how recent those plays are.
  Generally speaking, songs that are being played a lot now will have a higher popularity than songs that were played a lot in the past. Duplicate tracks (e.g. the same track from a single and an album) are rated independently. Artist and album popularity is derived mathematically from track popularity. Note: the popularity value may lag actual popularity by a few days: the value is not updated in real time." */
          `popularity` integer NOT NULL,
          /* A reference into the available_markets table. - "A list of the countries in which the track can be played, identified by their ISO 3166-1 alpha-2 code." */
          `available_markets_rowid` integer NOT NULL,
          /* "The disc number (usually 1 unless the album consists of more than one disc)." */
          `disc_number` integer NOT NULL,
          /* "The track length in milliseconds." */
          `duration_ms` integer NOT NULL,
          /* "Whether or not the track has explicit lyrics ( true = yes it does; false = no it does not OR unknown)." */
          `explicit` integer NOT NULL,
          FOREIGN KEY (`available_markets_rowid`) REFERENCES `available_markets`(`rowid`)
  );
  /* "The artists who performed the track." */
  CREATE TABLE `track_artists` (
          `track_rowid` integer NOT NULL,
          `artist_rowid` integer NOT NULL,
          FOREIGN KEY (`track_rowid`) REFERENCES `tracks`(`rowid`),
          FOREIGN KEY (`artist_rowid`) REFERENCES `artists`(`rowid`)
  );

  CREATE INDEX `artist_genres_artist_id` ON `artist_genres` (`artist_rowid`);
  CREATE INDEX `artist_genres_genre` ON `artist_genres` (`genre`);
  CREATE INDEX `artist_images_artist_id` ON `artist_images` (`artist_rowid`);
  CREATE UNIQUE INDEX `artists_id_unique` ON `artists` (`id`);
  CREATE INDEX `artists_name` ON `artists` (`name`);
  CREATE INDEX `artists_popularity` ON `artists` (`popularity`);
  CREATE INDEX `artists_followers` ON `artists` (`followers_total`);
  CREATE INDEX `artist_album_artist_id` ON "artist_albums" (`artist_rowid`);
  CREATE INDEX `artist_album_album_id` ON "artist_albums" (`album_rowid`);
  CREATE UNIQUE INDEX `albums_id_unique` ON `albums` (`id`);
  CREATE INDEX `album_name` ON `albums` (`name`);
  CREATE INDEX `album_popularity` ON `albums` (`popularity`);
  CREATE UNIQUE INDEX `available_markets_available_markets_unique` ON `available_markets` (`available_markets`);
  CREATE INDEX `track_artists_artist_id` ON `track_artists` (`artist_rowid`);
  CREATE INDEX `track_artists_track_id` ON `track_artists` (`track_rowid`);
  CREATE UNIQUE INDEX `tracks_id_unique` ON `tracks` (`id`);
  CREATE INDEX `tracks_popularity` ON `tracks` (`popularity`);
  CREATE INDEX `tracks_album` ON `tracks` (`album_rowid`);
  CREATE INDEX `album_images_album_id` ON `album_images` (`album_rowid`);
  CREATE INDEX tracks_isrc on tracks(external_id_isrc);Code language: SQL (Structured Query Language) (sql)

Scrape objektů Playlist ze Spotify API. Vyžaduje databázi spotify_clean.sqlite3 pro mapování track_rowid na track_id.

/* /get-playlist/{id} - "Get a playlist owned by a Spotify user." */
  CREATE TABLE "playlists" (
          `rowid` integer PRIMARY KEY NOT NULL,
          /* the original spotify base62 ID */
          `id` text NOT NULL,
          /* the spotify snapshot ID that was fetched (we only store one copy of each playlist) - "The version identifier for the current playlist. Can be supplied in other requests to target a specific playlist version" */
          `snapshot_id` text NOT NULL,
          /* When the playlist was fetched (unixepoch ms). */
          `fetched_at` integer NOT NULL,
          /* "The name of the playlist." */
          `name` text NOT NULL,
          /* "The playlist description. Only returned for modified, verified playlists, otherwise null. */
          `description` text,
          /* "true if the owner allows other users to modify the playlist." */
          `collaborative` integer NOT NULL,
          -- "The playlist's public/private status (if it is added to the user's profile): true the playlist is public, false the playlist is private, null the playlist status is not relevant. For more about public/private status, see Working with Playlists"
          `public` integer NOT NULL,
          `primary_color` text,
          /* owner.id - "The unique string identifying the Spotify user that you can find at the end of the Spotify URI for the user. The ID of the current user can be obtained via the Get Current User's Profile endpoint." */
          `owner_id` text,
          /* owner.display_name - "The name displayed on the user's profile. null if not available." */
          `owner_display_name` text,
          /* followers.total */
          `followers_total` integer,
          `tracks_total` integer NOT NULL
  );
  CREATE UNIQUE INDEX `playlists_id_unique` ON `playlists` (`id`);
  CREATE INDEX `playlists_name` ON `playlists` (`name`);
  CREATE INDEX `playlists_owner_id` ON `playlists` (`owner_id`);
  CREATE INDEX `playlists_snapshot_id` ON `playlists` (`snapshot_id`);
  CREATE INDEX `playlists_followers` ON `playlists` (`followers_total`);


  /* "Images for the playlist. The array may be empty or contain up to three images. The images are returned by size in descending order. See Working with Playlists. Note: If returned, the source URL for the image (url) is temporary and will expire in less than a day." */
  CREATE TABLE `playlist_images` (
          `playlist_rowid` integer NOT NULL,
          `width` integer,
          `height` integer,
          `url` text NOT NULL,
          FOREIGN KEY (`playlist_rowid`) REFERENCES `playlists`(`rowid`)
  );
  CREATE INDEX `playlist_images_playlist_id` ON `playlist_images` (`playlist_rowid`);

  /* The tracks of the playlist. Merged from all pages of getPlaylistItems. */
  CREATE TABLE "playlist_tracks" (
          `playlist_rowid` integer NOT NULL,
          /* 0-based integer position within the playlists response (playlist.tracks.items[i]) */
          `position` integer NOT NULL,
          /* true if track.type == "episode". false if track.type == "track" */
          `is_episode` integer NOT NULL,
          /* The rowid of this track in the `tracks` table. */
          `track_rowid` integer,
          /* If the rowid is null, this is the spotify base62 ID instead. */
          `id_if_not_in_tracks_table` text,
          /* (unixepoch seconds) - "The date and time the track or episode was added. Note: some very old playlists may return null in this field." */
          `added_at` integer NOT NULL,
          /* added_by.id - "The Spotify user who added the track or episode. Note: some very old playlists may return null in this field." */
          `added_by_id` text,
          /* undocumented */
          `primary_color` text,
          /* video_thumbnail.url - undocumented */
          `video_thumbnail_url` text,
          /* "Whether this track or episode is a local file or not." https://developer.spotify.com/documentation/web-api/concepts/playlists#local-files */
          `is_local` integer NOT NULL,
          /* track.name if is_local is true. For non-local tracks, name can be retrieved from the `tracks` table. */
          `name_if_is_local` text,
          /* track.uri if is_local is true */
          `uri_if_is_local` text,
          /* track.album if is_local is true. Spotify constructs a fake album object for local tracks. For non-local tracks, album can be retrieved from the `albums` table. */
          `album_name_if_is_local` text,
          /* track.artists[0].name if is_local is true. Spotify constructs a single fake artists object for local tracks. For non-local tracks, artist name can be retrieved from the `artists` table. */
          `artists_name_if_is_local` text,
          /* track.duration_ms if is_local is true. For non-local tracks, duration_ms can be retrieved from the `tracks` table. */
          `duration_ms_if_is_local` integer,
          PRIMARY KEY(`playlist_rowid`, `position`),
          FOREIGN KEY (`playlist_rowid`) REFERENCES `playlists`(`rowid`)
  ) WITHOUT ROWID;Code language: SQL (Structured Query Language) (sql)

Komentáře

Odebírat
Upozornit na
guest
0 Komentářů
Nejstarší
Nejnovější Most Voted
Inline Feedbacks
Zobrazit všechny komentáře

Urban VPN Browser Extension tajně sbírá a prodává AI konverzace milionů uživatelů

AI, Různé
Komentáře: 0
Bezplatné rozšíření do prohlížeče Urban VPN Proxy, které slibovalo anonymní přístup k internetu, se podle bezpečnostních expertů ukázalo být hrozbou pro soukromí uživatelů. Rozšíření totiž potají sbíralo konverzace s AI asistenty a odesílalo je na servery třetích stran, aniž by uživatelé o této aktivitě věděli. Případ upozorňuje na rizika bezplatných VPN nástrojů a potřebu průběžné kontroly bezpečnosti nainstalovaných doplňků.