Design Spotify Music Streaming

Spotify serves 675+ million monthly active users across 180+ markets, streaming from a catalog of 100+ million tracks. Unlike video platforms where files are gigabytes, audio files are megabytes—but the scale of concurrent streams, personalization depth, and the expectation of instant playback create unique challenges. This design covers the audio delivery pipeline, the recommendation engine that drives 30%+ of listening, offline sync, and the microservices architecture that enables 300+ autonomous teams to ship independently.

Mermaid diagram — High-level architecture: clients connect through API gateway to microservices; audio delivered via multi-CDN; events flow through Pub/Sub to analytics.

Abstract

Spotify’s architecture is shaped by three fundamental constraints:

Audio is lightweight but latency-critical: A 3-minute track at 320 kbps is ~7 MB—trivial compared to video. But users expect instant playback on tap. The architecture optimizes for time-to-first-byte, not throughput.
Personalization is the product: Discovery features (Discover Weekly, Daily Mix, Release Radar) drive 30%+ of streams. The recommendation system processes billions of listening events daily to generate personalized content for each user.
Offline mode is a first-class feature: Premium subscribers can download thousands of tracks. This requires a license management system, intelligent sync, and storage management across devices.

The core mechanisms:

Multi-CDN delivery via Akamai, Fastly, and AWS CloudFront, with intelligent routing based on user location and CDN health
Ogg Vorbis encoding at multiple bitrates (24-320 kbps) with automatic quality adaptation based on network conditions
Cassandra for user data (playlists, listening history) with write-optimized schema design
Hybrid recommendation system combining collaborative filtering, content-based analysis (via Echo Nest audio features), and natural language processing
Google Cloud Platform for compute, storage, and data processing, following the migration from on-premises data centers that began in 2016

Requirements

Functional Requirements

Requirement	Priority	Notes
Audio playback	Core	Adaptive streaming, gapless playback, crossfade
Search	Core	Tracks, artists, albums, playlists, podcasts
Playlists	Core	Create, edit, collaborative playlists
Library management	Core	Save tracks, albums, follow artists
Offline downloads	Core	Premium feature, license-protected
Personalized recommendations	Core	Discover Weekly, Daily Mix, Release Radar
Social features	Extended	Friend activity, shared playlists
Podcasts	Extended	Episodes, shows, in-progress tracking
Lyrics	Extended	Synced lyrics display
Live events	Out of scope	Concerts, virtual events
Audiobooks	Out of scope	Separate purchase model

Non-Functional Requirements

Requirement	Target	Rationale
Playback availability	99.99%	Revenue-critical, user retention
Time to first audio	p99 < 500ms	User expectation for instant playback
Search latency	p99 < 200ms	Responsive search experience
Recommendation freshness	< 24 hours	Daily personalization updates
Offline sync reliability	99.9%	Downloaded content must play
Concurrent streams	Support 50M+	Peak evening traffic globally
Catalog update latency	< 4 hours	New releases available quickly

Scale Estimation

Spotify-scale baseline (2024):

Monthly active users: 675 million
Premium subscribers: 265 million (39%)
Free (ad-supported) users: 410 million (61%)

Catalog:
- Tracks: 100+ million
- Podcasts: 6+ million shows
- New tracks added daily: ~100,000

Streaming traffic:
- Average streams per DAU: ~25 tracks/day
- DAU estimate: 300M (~45% of MAU)
- Daily streams: 300M × 25 = 7.5 billion
- Peak concurrent: ~50M streams

Audio file sizes (3-minute track):
- 96 kbps (Normal): ~2.2 MB
- 160 kbps (High): ~3.6 MB
- 320 kbps (Very High): ~7.2 MB
- Average effective: ~4 MB per track

Daily bandwidth:
- 7.5B streams × 4 MB = 30 PB/day
- With 90% CDN hit rate: 3 PB/day from origin
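
The estimates above are plain arithmetic, so they are easy to keep reproducible. The following is a minimal sketch, assuming decimal units (1 MB = 10^6 bytes, 1 PB = 10^9 MB) and constant-bitrate audio; the function names are illustrative and not part of any Spotify tooling.

```python
def track_size_mb(bitrate_kbps: int, duration_s: int = 180) -> float:
    """Size of a constant-bitrate track in decimal megabytes."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def daily_bandwidth_pb(dau: int, streams_per_user: int, avg_track_mb: float) -> float:
    """Total daily egress in decimal petabytes (1 PB = 1e9 MB)."""
    return dau * streams_per_user * avg_track_mb / 1e9


def origin_traffic_pb(total_pb: float, cdn_hit_rate: float) -> float:
    """Traffic that misses the CDN cache and reaches the origin."""
    return total_pb * (1 - cdn_hit_rate)


# Figures taken from the baseline above
total = daily_bandwidth_pb(dau=300_000_000, streams_per_user=25, avg_track_mb=4)
origin = origin_traffic_pb(total, cdn_hit_rate=0.90)
print(f"total: {total:.1f} PB/day, origin: {origin:.1f} PB/day")
```

Changing the hit rate or the average track size shows how sensitive origin load is to cache efficiency: every percentage point of hit rate is worth 0.3 PB/day at this volume.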

Storage estimation:

Audio storage:
- 100M tracks × 4 quality levels × 4 MB avg = 1.6 PB
- With metadata, artwork: ~2 PB total

User data:
- Playlists: 675M users × 500 playlists avg × 100 tracks ≈ 34 trillion playlist-track rows (upper-bound estimate)
- Listening history: 300M DAU × 100 events/user/day × 30 days = 900B events/month

Design Paths

Path A: Single-CDN with Origin Shield

Best when:

Smaller scale (< 100M users)
Geographic concentration
Simpler operations preferred

Architecture:

Single CDN provider (e.g., CloudFront)
Origin shield layer to reduce origin load
Simple routing via DNS

Trade-offs:

✅ Simpler vendor management
✅ Consistent caching behavior
✅ Easier debugging
❌ Single point of failure
❌ Vendor lock-in on pricing
❌ May have regional coverage gaps

Real-world example: SoundCloud relies primarily on AWS CloudFront for audio delivery.

Path B: Multi-CDN with Intelligent Routing (Spotify Model)

Best when:

Massive global scale (100M+ users)
Need for high availability
Leverage competitive CDN pricing

Architecture:

Multiple CDN providers (Akamai, Fastly, AWS)
Real-time CDN health monitoring
Client-side CDN selection based on performance
Specialized CDNs for different content types

Trade-offs:

✅ No single point of failure
✅ Cost optimization through CDN arbitrage
✅ Best performance per region
✅ Leverage each CDN’s strengths
❌ Complex routing logic
❌ Inconsistent caching behavior
❌ Multiple vendor relationships

Real-world example: Spotify uses Akamai and AWS for audio streaming, Fastly for images and UI assets.

Historical Path: P2P-Assisted Delivery

Used by Spotify 2008-2014:

Spotify originally used peer-to-peer technology, with 80% of traffic served by peers in 2011.

Why they moved away:

Improved CDN economics at scale
Mobile devices (poor P2P participants)
Complexity of P2P on modern networks (NAT, firewalls)
Sufficient server capacity globally

Path Comparison

Factor	Single CDN	Multi-CDN	P2P-Assisted
Availability	99.9%	99.99%	Variable
Setup complexity	Low	High	Very High
Operating cost	Medium	Lower at scale	Lowest
Mobile support	Full	Full	Limited
Latency consistency	High	Medium	Variable
Best for	< 100M users	> 100M users	Cost-sensitive

This Article’s Focus

This article focuses on Path B (Multi-CDN) because:

Spotify scale requires geographic diversity
The multi-CDN pattern demonstrates advanced content delivery
It represents the current industry standard for major streaming services

Service-to-service communication uses gRPC:

Binary serialization (smaller payloads than JSON)
Strong typing via protobuf
Bidirectional streaming support
Code generation for multiple languages

Service mesh:

Envoy proxy for load balancing, observability
Circuit breakers for fault isolation
Automatic retries with exponential backoff

Playback Flow

User taps play → Client sends play request to Playback Service
Playback Service validates → Checks subscription, licensing, availability
License acquired → DRM key retrieved for encrypted content
CDN URL returned → Client receives signed URL with CDN selection
Audio streamed → Client fetches segments from edge CDN
Prefetch triggered → Next track segments pre-fetched for gapless playback
Event logged → Stream event sent to Pub/Sub for analytics

Event-Driven Architecture

Every user action generates events:

Play event → Pub/Sub → Dataflow → BigQuery (analytics)
                    → Bigtable (real-time features)
                    → Recommendation update

Search event → Pub/Sub → Search ranking signals
Follow event → Pub/Sub → Social graph update

Event volume:

1 trillion+ Pub/Sub messages per day
Sub-second end-to-end latency for real-time features

Quality	Bitrate	Format	Availability	File Size (3 min)
Low	24 kbps	Ogg Vorbis	Free/Premium	~0.5 MB
Normal	96 kbps	Ogg Vorbis	Free/Premium	~2.2 MB
High	160 kbps	Ogg Vorbis	Free/Premium	~3.6 MB
Very High	320 kbps	Ogg Vorbis	Premium only	~7.2 MB
Web	256 kbps	AAC	Web player	~5.8 MB

Why Ogg Vorbis:

Open-source, royalty-free codec
Comparable quality to MP3 at lower bitrates
Better than MP3 at same bitrate (especially < 128 kbps)
Efficient software decoding on mobile devices (dedicated hardware Vorbis decoders are uncommon)

AAC for web:

Native browser support without plugins
Required for Safari/iOS web player

Adaptive Bitrate Selection

The client dynamically selects quality based on:

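TIERS_KBPS = [24, 96, 160, 320]  # Low, Normal, High, Very High

def select_quality(network_type, data_saver_enabled, buffering_recently,
                   buffer_healthy, bandwidth_sufficient,
                   current_kbps, user_preference_kbps):
    """Return the target bitrate in kbps for the next segment."""
    if network_type == "cellular" and data_saver_enabled:
        return 24   # Low
    if network_type == "cellular":
        return 96   # Normal
    if buffering_recently:
        # Step down one tier, never below the lowest
        return TIERS_KBPS[max(TIERS_KBPS.index(current_kbps) - 1, 0)]
    if buffer_healthy and bandwidth_sufficient:
        return user_preference_kbps  # up to 320 kbps
    return current_kbps  # otherwise hold the current tier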

Buffer management:

Target buffer: 10-30 seconds of audio
Low watermark: 5 seconds (trigger quality drop)
High watermark: 30 seconds (allow quality increase)
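
The two watermarks act as hysteresis: quality drops only when the buffer is nearly empty and rises only when it is full, so the client does not flap between tiers. A minimal sketch, assuming the four bitrate tiers from the quality table; `next_tier` and its parameters are hypothetical names, not Spotify client code.

```python
TIERS_KBPS = [24, 96, 160, 320]  # tiers from the quality table
LOW_WATERMARK_S = 5    # below this, step down one tier
HIGH_WATERMARK_S = 30  # at or above this, a step up is allowed


def next_tier(current_kbps: int, buffered_s: float, max_kbps: int = 320) -> int:
    """Pick the bitrate for the next segment from the buffer level."""
    i = TIERS_KBPS.index(current_kbps)
    if buffered_s < LOW_WATERMARK_S and i > 0:
        return TIERS_KBPS[i - 1]
    if buffered_s >= HIGH_WATERMARK_S and i + 1 < len(TIERS_KBPS):
        candidate = TIERS_KBPS[i + 1]
        # Never exceed the user's preferred (or plan-limited) ceiling
        return candidate if candidate <= max_kbps else current_kbps
    # Between the watermarks: hold the current tier
    return current_kbps
```

Between 5 and 30 seconds of buffer the function deliberately changes nothing, which is what keeps short bandwidth dips from causing audible quality switches.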

Gapless Playback

For seamless album listening:

Prefetch: Start fetching next track when current is 90% complete
Decode ahead: Decode first 5 seconds of next track
Crossfade boundary: Handle precise sample-accurate transitions
Memory management: Release previous track’s buffer

Implementation challenges:

Different sample rates between tracks
Metadata gaps in some files
Client memory constraints on mobile

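CDN selection scoring:

def select_cdn(user_location, content_type, cdns_health):
    """Select the best healthy CDN for a request, or None if none is healthy.

    cdns_health maps each CDN name to its health record. The helpers
    get_latency_estimate, get_cost_per_gb, and normalize (raw value to a
    0-1 score) are assumed to exist elsewhere. content_type is not used
    by this scoring sketch.
    """
    candidates = []

    for cdn, health in cdns_health.items():
        if not health.is_healthy:
            continue

        latency = get_latency_estimate(cdn, user_location)
        availability = health.availability_99p
        cost = get_cost_per_gb(cdn, user_location)

        # Weighted score: latency matters most, then availability, then cost
        score = (
            0.5 * normalize(latency, lower_is_better=True) +
            0.3 * normalize(availability, lower_is_better=False) +
            0.2 * normalize(cost, lower_is_better=True)
        )
        candidates.append((cdn, score))

    # max() raises on an empty list, so handle the all-unhealthy case explicitly
    if not candidates:
        return None
    return max(candidates, key=lambda x: x[1])[0]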

Cache Key Design

Audio: /{track_id}/{quality}/{segment}.ogg
Images: /{image_id}/{size}.jpg

Cache TTL strategy:

Content Type	TTL	Rationale
Audio files	1 year	Immutable content
Album artwork	30 days	Rarely changes
Artist images	7 days	Occasional updates
Playlist covers	1 day	User-generated
API responses	5 minutes	Balance freshness/load
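
The TTL table maps directly onto `Cache-Control` response headers. A minimal sketch, assuming the content-type keys below (hypothetical names) and assuming that API responses are per-user and therefore marked `private`; the source does not say how visibility is configured.

```python
# TTLs from the table above, in seconds
TTL_SECONDS = {
    "audio": 365 * 24 * 3600,
    "album_artwork": 30 * 24 * 3600,
    "artist_image": 7 * 24 * 3600,
    "playlist_cover": 24 * 3600,
    "api_response": 5 * 60,
}


def cache_control(content_type: str) -> str:
    """Build a Cache-Control header value for a content type."""
    # Assumption: API responses may be personalized, so shared caches must not store them
    visibility = "private" if content_type == "api_response" else "public"
    directives = [visibility, f"max-age={TTL_SECONDS[content_type]}"]
    if content_type == "audio":
        # Encoded audio never changes once published, so revalidation is unnecessary
        directives.append("immutable")
    return ", ".join(directives)
```

For example, `cache_control("audio")` yields `public, max-age=31536000, immutable`.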

Signed URLs

Audio URLs include authentication:

https://audio-cdn.spotify.com/tracks/{track_id}/320.ogg
    ?sig={hmac_signature}
    &exp={expiration_timestamp}
    &uid={user_id}

Signature validation:

HMAC-SHA256 with rotating keys
1-hour expiration for streaming URLs
Rate limiting per user/IP
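
Signing and validation can be sketched with the Python standard library. This is an illustration under stated assumptions, not Spotify's actual scheme: the signed message layout (`path|exp|uid`), the function names, and the example host are all hypothetical, and key rotation is left out (in practice a key identifier would travel with the URL so the edge can pick the right key).

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse


def sign_url(base_url: str, user_id: str, key: bytes, ttl_s: int = 3600, now=None) -> str:
    """Append sig, exp, and uid query parameters to base_url."""
    exp = int(now if now is not None else time.time()) + ttl_s
    message = f"{urlparse(base_url).path}|{exp}|{user_id}".encode()
    sig = hmac.new(key, message, hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'sig': sig, 'exp': exp, 'uid': user_id})}"


def verify_url(url: str, key: bytes, now=None) -> bool:
    """Return True only if the signature matches and the URL has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        sig, exp, uid = params["sig"][0], int(params["exp"][0]), params["uid"][0]
    except (KeyError, ValueError):
        return False
    if (now if now is not None else time.time()) > exp:
        return False
    message = f"{parsed.path}|{exp}|{uid}".encode()
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched
    return hmac.compare_digest(sig.encode(), expected.encode())


url = sign_url("https://audio-cdn.example.com/tracks/abc/320.ogg", "user123", b"demo-key", now=1000)
```

Because the path, expiry, and user ID are all covered by the signature, changing any of them invalidates the URL.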

API Design

Play Track

Endpoint: POST /v1/me/player/play

Request:

{
  "context_uri": "spotify:playlist:37i9dQZF1DXcBWIGoYBM5M",
  "offset": {
    "position": 0
  },
  "position_ms": 0
}

Response (204 No Content on success)

Error Responses:

401 Unauthorized: Invalid or expired token
403 Forbidden: Premium required for this feature
404 Not Found: Track/playlist not available
429 Too Many Requests: Rate limit exceeded

Get Track

Endpoint: GET /v1/tracks/{id}

Response (200 OK):

{
  "id": "3n3Ppam7vgaVa1iaRUc9Lp",
  "name": "Mr. Brightside",
  "duration_ms": 222973,
  "explicit": false,
  "popularity": 87,
  "preview_url": "https://p.scdn.co/mp3-preview/...",
  "album": {
    "id": "4OHNH3sDzIxnmUADXzv2kT",
    "name": "Hot Fuss",
    "images": [
      {
        "url": "https://i.scdn.co/image/...",
        "height": 640,
        "width": 640
      }
    ],
    "release_date": "2004-06-07"
  },
  "artists": [
    {
      "id": "0C0XlULifJtAgn6ZNCW2eu",
      "name": "The Killers"
    }
  ],
  "available_markets": ["US", "GB", "DE", ...]
}

Search

Endpoint: GET /v1/search

Parameters:

Parameter	Type	Required	Description
q	string	Yes	Search query
type	string	Yes	Comma-separated: track,artist,album,playlist
limit	integer	No	Max results per type (default: 20, max: 50)
offset	integer	No	Pagination offset
market	string	No	ISO country code for availability filtering

Response:

{
  "tracks": {
    "items": [...],
    "total": 1000,
    "limit": 20,
    "offset": 0,
    "next": "https://api.spotify.com/v1/search?offset=20&..."
  },
  "artists": {...},
  "albums": {...}
}

Create Playlist

Endpoint: POST /v1/users/{user_id}/playlists

Request:

{
  "name": "Road Trip",
  "description": "Songs for the drive",
  "public": false,
  "collaborative": false
}

Response (201 Created):

{
  "id": "7d2D2S5F4d0r33mDf0d33D",
  "name": "Road Trip",
  "owner": {
    "id": "user123",
    "display_name": "John"
  },
  "tracks": {
    "total": 0
  },
  "snapshot_id": "MTY4MzI0..."
}

Rate Limits

Endpoint Category	Limit	Window
Standard endpoints	100 requests	30 seconds
Search	30 requests	30 seconds
Player control	50 requests	30 seconds
Playlist modifications	25 requests	30 seconds
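
The limits above can be enforced with a fixed-window counter keyed by user and endpoint category. This is a minimal in-memory sketch; the category keys and class name are hypothetical, and the source does not say which algorithm Spotify uses. A production limiter would keep counters in a shared store and expire old windows.

```python
from collections import defaultdict

# Requests allowed per 30-second window, from the table above
LIMITS = {"standard": 100, "search": 30, "player": 50, "playlist_write": 25}
WINDOW_S = 30


class FixedWindowLimiter:
    """Counts requests per (user, category) within aligned 30-second windows."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def allow(self, user_id: str, category: str, now: float) -> bool:
        window = int(now // WINDOW_S)  # index of the window containing `now`
        key = (user_id, category, window)
        if self._counts[key] >= LIMITS[category]:
            return False  # caller should respond with 429 Too Many Requests
        self._counts[key] += 1
        return True
```

A fixed window is the simplest option but allows a burst of up to twice the limit across a window boundary; a sliding window or token bucket smooths that out at the cost of more state.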

Data Modeling

Track Schema (PostgreSQL)

CREATE TABLE tracks (
    id VARCHAR(22) PRIMARY KEY,  -- Spotify base62 ID
    name VARCHAR(500) NOT NULL,
    duration_ms INTEGER NOT NULL,
    explicit BOOLEAN DEFAULT false,
    popularity SMALLINT DEFAULT 0,
    isrc VARCHAR(12),  -- International Standard Recording Code
    preview_url TEXT,

    -- Foreign key to the album (albums table not shown)
    album_id VARCHAR(22) REFERENCES albums(id),

    -- Audio features (from Echo Nest analysis), stored inline for read performance
    tempo DECIMAL(6,3),  -- BPM
    key SMALLINT,  -- 0-11 pitch class
    mode SMALLINT,  -- 0=minor, 1=major
    time_signature SMALLINT,
    danceability DECIMAL(4,3),
    energy DECIMAL(4,3),
    valence DECIMAL(4,3),

    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Track-Artist relationship (many-to-many; artists table not shown)
CREATE TABLE track_artists (
    track_id VARCHAR(22) REFERENCES tracks(id),
    artist_id VARCHAR(22) REFERENCES artists(id),
    position SMALLINT NOT NULL,  -- Artist order
    PRIMARY KEY (track_id, artist_id)
);

-- Indexes for common queries
CREATE INDEX idx_tracks_album ON tracks(album_id);
CREATE INDEX idx_tracks_popularity ON tracks(popularity DESC);
CREATE INDEX idx_tracks_isrc ON tracks(isrc);

Playlist Schema (Cassandra)

Cassandra excels at playlist storage due to write-heavy patterns:

CREATE TABLE playlists (
    user_id TEXT,
    playlist_id TEXT,
    name TEXT,
    description TEXT,
    is_public BOOLEAN,
    is_collaborative BOOLEAN,
    snapshot_id TEXT,
    created_at TIMESTAMP,
    updated_at TIMESTAMP,
    PRIMARY KEY (user_id, playlist_id)
) WITH CLUSTERING ORDER BY (playlist_id ASC);

-- Counter columns cannot share a table with regular columns,
-- so follower counts live in a dedicated counter table
CREATE TABLE playlist_follower_counts (
    playlist_id TEXT PRIMARY KEY,
    follower_count COUNTER
);

CREATE TABLE playlist_tracks (
    playlist_id TEXT,
    position INT,
    track_id TEXT,
    added_by TEXT,
    added_at TIMESTAMP,
    PRIMARY KEY (playlist_id, position)
) WITH CLUSTERING ORDER BY (position ASC);

-- Denormalized for efficient ordering
CREATE TABLE playlist_tracks_by_added (
    playlist_id TEXT,
    added_at TIMESTAMP,
    position INT,
    track_id TEXT,
    PRIMARY KEY (playlist_id, added_at, position)
) WITH CLUSTERING ORDER BY (added_at DESC, position ASC);

Why Cassandra for playlists:

Write-optimized (append-only storage)
Horizontal scaling for 675M users
Tunable consistency (eventual for non-critical reads)
Counter support for follower counts

User Listening History (Cassandra)

CREATE TABLE listening_history (
    user_id TEXT,
    listened_at TIMESTAMP,
    track_id TEXT,
    context_uri TEXT,  -- playlist, album, or artist
    duration_ms INT,
    PRIMARY KEY (user_id, listened_at)
) WITH CLUSTERING ORDER BY (listened_at DESC)
  AND default_time_to_live = 7776000;  -- 90 days TTL

Search Index (Elasticsearch)

{
  "mappings": {
    "properties": {
      "track_id": { "type": "keyword" },
      "name": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "exact": { "type": "keyword" },
          "autocomplete": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "standard"
          }
        }
      },
      "artist_names": {
        "type": "text",
        "fields": {
          "exact": { "type": "keyword" },
          "autocomplete": {
            "type": "text",
            "analyzer": "autocomplete",
            "search_analyzer": "standard"
          }
        }
      },
      "album_name": { "type": "text" },
      "popularity": { "type": "integer" },
      "duration_ms": { "type": "integer" },
      "explicit": { "type": "boolean" },
      "available_markets": { "type": "keyword" },
      "release_date": { "type": "date" }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "autocomplete": {
          "tokenizer": "autocomplete",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "autocomplete": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  }
}
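
To see what the `edge_ngram` tokenizer in the mapping above puts into the index, the tokenization can be approximated in a few lines. This is a rough sketch of the behavior with `min_gram` 1, `max_gram` 20, and letter/digit token characters, not Elasticsearch's implementation; `edge_ngrams` is a hypothetical helper name.

```python
import re


def edge_ngrams(text: str, min_gram: int = 1, max_gram: int = 20) -> list[str]:
    """Lowercase the text, split on non-letter/digit characters, emit word prefixes."""
    tokens = []
    for word in re.findall(r"[^\W_]+", text.lower()):
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.append(word[:n])
    return tokens


print(edge_ngrams("Mr. Brightside"))
```

Every prefix of every word is indexed, so a partial query such as "mr bright" can match whole tokens in the index without any wildcard search. The cost is a larger index, which is why `max_gram` is capped.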

Database Selection Matrix

Data Type	Store	Rationale
Catalog (tracks, albums, artists)	PostgreSQL	Relational queries, complex joins
User data (playlists, saves)	Cassandra	Write-heavy, horizontal scaling
Listening history	Cassandra	Time-series, high volume
Search index	Elasticsearch	Full-text search, faceting
ML features	Cloud Bigtable	Wide columns, sparse data
Hot metadata	Redis/Memcached	Sub-ms latency
Analytics	BigQuery	Ad-hoc queries, massive scale

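Collaborative filtering factorizes the sparse user-track interaction matrix $R$ (675M × 100M) into two low-rank factors:

$R \approx U V^{\top}$

Where:

$U$ = user matrix (675M × 128)
$V$ = track matrix (100M × 128)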

Implementation:

Alternating Least Squares (ALS) on Spark
Weekly retraining on full dataset
Incremental updates for new users/tracks
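
The alternating structure of ALS is easiest to see at rank 1, where each least-squares step reduces to a scalar division. The sketch below is a toy under loud assumptions: a small, dense, fully observed matrix and a single latent dimension, in pure Python. The production setup described above uses 128 dimensions on Spark, and implicit-feedback data normally adds per-entry confidence weights that are omitted here.

```python
def als_rank1(R, iterations=10, reg=1e-9):
    """Factor R (list of rows) as the outer product of vectors u and v."""
    n_users, n_items = len(R), len(R[0])
    u = [1.0] * n_users
    v = [1.0] * n_items
    for _ in range(iterations):
        # Fix v and solve for each user factor: u_i = (R_i . v) / (v . v + reg)
        vv = sum(x * x for x in v) + reg
        u = [sum(R[i][j] * v[j] for j in range(n_items)) / vv for i in range(n_users)]
        # Fix u and solve for each item factor: v_j = (R_:j . u) / (u . u + reg)
        uu = sum(x * x for x in u) + reg
        v = [sum(R[i][j] * u[i] for i in range(n_users)) / uu for j in range(n_items)]
    return u, v


# A rank-1 toy matrix: rows are users, columns are tracks
R = [[4, 5], [8, 10], [12, 15]]
u, v = als_rank1(R)
```

Each half-step is an independent least-squares problem per user (or per track), which is what makes ALS straightforward to parallelize across a cluster.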

Content-Based Features (Echo Nest)

Each track has computed audio features:

Feature	Range	Description
Tempo	0-250 BPM	Beats per minute
Key	0-11	Pitch class (C=0, C#=1, …)
Mode	0-1	Minor=0, Major=1
Danceability	0.0-1.0	Rhythmic suitability for dancing
Energy	0.0-1.0	Perceptual intensity
Valence	0.0-1.0	Musical positivity
Speechiness	0.0-1.0	Presence of spoken words
Acousticness	0.0-1.0	Acoustic vs. electronic
Instrumentalness	0.0-1.0	Absence of vocals
Liveness	0.0-1.0	Presence of audience

Discover Weekly Pipeline

Generation schedule:

Runs Sunday night for Monday delivery
Pre-computed on Cloud Bigtable
675M personalized 30-track playlists

Algorithm:

User taste profile: Aggregate recent listening into genre/artist weights
Candidate selection: Find tracks listened to by similar users (collaborative)
Audio filtering: Match audio features to user preferences (content-based)
Freshness boost: Prioritize tracks user hasn’t heard
Diversity injection: Ensure variety across genres, artists
Final ranking: ML model predicts skip probability
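
The last three steps of the algorithm can be sketched as a single selection pass. This is a simplified illustration, not Spotify's pipeline: it treats freshness as a hard filter rather than a boost, enforces diversity with a per-artist cap, and assumes the skip probability has already been predicted by a model. The function name, tuple layout, and the cap of two tracks per artist are all assumptions.

```python
def build_playlist(candidates, heard_track_ids, size=30, max_per_artist=2):
    """Select tracks from (track_id, artist_id, predicted_skip_probability) tuples."""
    playlist = []
    per_artist = {}
    # Lowest predicted skip probability first
    for track_id, artist_id, _skip_p in sorted(candidates, key=lambda c: c[2]):
        if track_id in heard_track_ids:
            continue  # freshness: skip tracks the user already knows
        if per_artist.get(artist_id, 0) >= max_per_artist:
            continue  # diversity: cap tracks per artist
        playlist.append(track_id)
        per_artist[artist_id] = per_artist.get(artist_id, 0) + 1
        if len(playlist) == size:
            break
    return playlist
```

With candidates already scored, the pass is linear in the number of candidates after sorting, so it is cheap enough to run once per user in a weekly batch.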

Approximate Nearest Neighbor Index

For real-time recommendations, use Annoy (Approximate Nearest Neighbors Oh Yeah):

Index structure:

128-dimensional embeddings for 100M tracks
Forest of random projection trees
Trade-off: accuracy vs. query time

Query performance:

10ms for top-100 nearest neighbors
95% recall vs. exact search
100 trees provide a good balance between recall and query time
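
The idea behind a forest of random projection trees can be shown in pure Python. This sketch is not the Annoy library or its API: it splits on the perpendicular bisector of two randomly sampled points, and the query only inspects one leaf per tree, whereas a real index explores more nodes via a priority queue. All function names are hypothetical.

```python
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, ids, rng, leaf_size=4):
    """Recursively split ids by a hyperplane between two random points."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = _dot(normal, [(x + y) / 2 for x, y in zip(a, b)])
    left = [i for i in ids if _dot(normal, points[i]) <= offset]
    right = [i for i in ids if _dot(normal, points[i]) > offset]
    if not left or not right:  # e.g. duplicate points: stop splitting
        return ("leaf", ids)
    return ("node", normal, offset,
            build_tree(points, left, rng, leaf_size),
            build_tree(points, right, rng, leaf_size))


def build_forest(points, n_trees=10, seed=0):
    rng = random.Random(seed)
    ids = list(range(len(points)))
    return [build_tree(points, ids, rng) for _ in range(n_trees)]


def _leaf_for(tree, q):
    """Walk from the root to the leaf whose region contains q."""
    while tree[0] == "node":
        _, normal, offset, left, right = tree
        tree = left if _dot(normal, q) <= offset else right
    return tree[1]


def query(forest, points, q, k=3):
    """Union the matching leaf from every tree, then rank candidates exactly."""
    candidates = set()
    for tree in forest:
        candidates.update(_leaf_for(tree, q))
    return sorted(candidates, key=lambda i: _dist2(points[i], q))[:k]
```

More trees mean more candidate leaves per query, which raises recall and query time together; that is the accuracy versus latency trade-off listed above.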

Download protection (DRM):

Encrypted audio files using AES-256
Per-device keys tied to account
Keys stored in secure enclave (iOS) or hardware-backed keystore (Android)

License constraints:

Constraint	Value	Rationale
Offline validity	30 days	Requires periodic online check
Device limit	5 devices	Prevent account sharing
Track limit	10,000 per device	Storage management
Concurrent offline	1 device	Licensing terms
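
The constraints in the table translate into simple client-side and server-side checks. A minimal sketch using the table's values; the dataclass, its fields, and the function names are hypothetical, and the concurrent-offline rule is omitted because it needs server-side session state.

```python
from dataclasses import dataclass

OFFLINE_VALIDITY_S = 30 * 24 * 3600  # 30 days between online checks
MAX_DEVICES = 5
MAX_TRACKS_PER_DEVICE = 10_000


@dataclass
class DeviceState:
    last_online_check: float  # Unix seconds of the last successful license refresh
    downloaded_tracks: int


def can_play_offline(device: DeviceState, now: float) -> bool:
    """Downloads stay playable only within 30 days of the last online check."""
    return now - device.last_online_check <= OFFLINE_VALIDITY_S


def can_download(device: DeviceState, account_device_count: int,
                 device_is_registered: bool) -> bool:
    """Enforce the per-account device limit and the per-device track limit."""
    if not device_is_registered and account_device_count >= MAX_DEVICES:
        return False  # registering this device would exceed the device limit
    return device.downloaded_tracks < MAX_TRACKS_PER_DEVICE
```

The offline validity check has to run on the device, so it relies on a trustworthy clock; a real client would also guard against the system time being set backwards.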

Sync Strategy

Smart downloads:

def prioritize_downloads(playlist, device_storage, user_requested, recent_plays):
    """Download a playlist's tracks in priority order until storage runs out.

    user_requested and recent_plays are collections of tracks supplied by the
    caller; download() is assumed to be defined elsewhere.
    """
    scored_tracks = []

    for track in playlist.tracks:
        score = 0

        # User explicitly requested
        if track in user_requested:
            score += 100

        # Recently played (likely to play again)
        if track in recent_plays:
            score += 50

        # Position-based score within the playlist
        score += track.playlist_position_score

        # Already partially downloaded
        if track.partial_download:
            score += 30

        scored_tracks.append((track, score))

    # Sort by score only; sorting the raw tuples would compare tracks first
    scored_tracks.sort(key=lambda pair: pair[1], reverse=True)

    # Download in priority order, tracking the space each download consumes
    remaining = device_storage.available
    for track, _ in scored_tracks:
        if track.size <= remaining:
            download(track)
            remaining -= track.size

Storage Management

Eviction policy:

Remove tracks not played in 90+ days
Remove tracks from unfollowed playlists
LRU eviction when approaching storage limit

Storage estimation UI:

Playlist: Road Trip (50 tracks)
Download size: 180 MB (High quality)
              350 MB (Very High quality)
Device storage: 2.1 GB available

Typeahead query:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name.autocomplete": {
              "query": "mr bright",
              "operator": "and"
            }
          }
        },
        {
          "match": {
            "artist_names.autocomplete": {
              "query": "mr bright",
              "operator": "and"
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  },
  "sort": ["_score", { "popularity": "desc" }],
  "size": 10
}

Performance targets:

Typeahead latency: p99 < 50ms
Full search latency: p99 < 200ms
Index update lag: < 4 hours for new releases

Ranking Signals

Signal	Weight	Description
Text relevance	0.3	BM25 score from Elasticsearch
Popularity	0.25	Global stream count (log-scaled)
User affinity	0.2	Based on listening history
Freshness	0.15	Boost for new releases
Market availability	0.1	Available in user’s region
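
The weights in the table combine into a single score as a weighted sum. A minimal sketch, assuming every signal has already been normalized to the 0-1 range (raw BM25 scores are unbounded, so text relevance needs rescaling first); the signal keys and function names are hypothetical.

```python
import math

# Weights from the ranking signals table; they sum to 1.0
WEIGHTS = {
    "text_relevance": 0.30,
    "popularity": 0.25,
    "user_affinity": 0.20,
    "freshness": 0.15,
    "market_availability": 0.10,
}


def log_scaled_popularity(stream_count: int, max_stream_count: int) -> float:
    """Map a raw stream count to 0-1 on a log scale so mega-hits do not dominate."""
    return math.log1p(stream_count) / math.log1p(max_stream_count)


def rank_score(signals: dict) -> float:
    """Weighted sum of normalized signals; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in WEIGHTS.items())
```

Because the weights sum to 1.0, the final score stays in the 0-1 range, which makes scores comparable across queries.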

Frontend Considerations

Player State Management

Global player state:

interface PlayerState {
  // Current playback
  currentTrack: Track | null
  position_ms: number
  duration_ms: number
  isPlaying: boolean

  // Queue
  queue: Track[]
  queuePosition: number

  // Context (what initiated playback)
  context: {
    type: "playlist" | "album" | "artist" | "search"
    uri: string
  }

  // Shuffle and repeat
  shuffle: boolean
  repeatMode: "off" | "context" | "track"

  // Device
  activeDevice: Device
  volume: number
}

State synchronization:

Local state for immediate UI feedback
WebSocket for cross-device sync (Spotify Connect)
Optimistic updates with reconciliation

Audio Buffering Strategy

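const SEGMENT_SIZE = 10 // seconds of audio per segment (assumed value)

class AudioBuffer {
  private segments: Map<number, ArrayBuffer> = new Map()
  private prefetchAhead = 30 // seconds

  // fetchSegment is supplied by the caller, e.g. a request to the CDN
  constructor(private fetchSegment: (index: number) => Promise<ArrayBuffer>) {}

  async ensureBuffered(currentPosition: number): Promise<void> {
    const currentSegment = Math.floor(currentPosition / SEGMENT_SIZE)
    const targetSegment = Math.ceil((currentPosition + this.prefetchAhead) / SEGMENT_SIZE)

    for (let i = currentSegment; i <= targetSegment; i++) {
      if (!this.segments.has(i)) {
        const segment = await this.fetchSegment(i)
        this.segments.set(i, segment)
      }
    }

    // Evict old segments to manage memory
    this.evictOldSegments(currentSegment - 2)
  }

  // Drop every segment with an index below the given one
  private evictOldSegments(beforeIndex: number): void {
    for (const index of this.segments.keys()) {
      if (index < beforeIndex) this.segments.delete(index)
    }
  }
}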

Mobile Optimizations

Constraint	Mitigation
Battery	Batch network requests, use efficient codecs
Data usage	Quality auto-adjust, download on WiFi
Memory	Limit buffer size, lazy-load images
Background	iOS: Background Audio mode; Android: Foreground Service
Offline	SQLite for metadata, encrypted file storage

Web Player Architecture

Web Audio API usage:

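const audioContext = new AudioContext()

// Each track gets its own source and gain node so the two can be faded independently
interface TrackNodes {
  source: AudioBufferSourceNode
  gain: GainNode
}

function createTrackNodes(buffer: AudioBuffer): TrackNodes {
  const source = audioContext.createBufferSource()
  const gain = audioContext.createGain()
  source.buffer = buffer
  source.connect(gain).connect(audioContext.destination)
  return { source, gain }
}

// Crossfade between tracks
function crossfade(current: TrackNodes, next: TrackNodes, duration: number) {
  const now = audioContext.currentTime

  // Fade out current
  current.gain.gain.setValueAtTime(1, now)
  current.gain.gain.linearRampToValueAtTime(0, now + duration)

  // Fade in next
  next.gain.gain.setValueAtTime(0, now)
  next.gain.gain.linearRampToValueAtTime(1, now + duration)

  next.source.start(now)
  // Stop the outgoing track once it is silent
  current.source.stop(now + duration)
}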

Service	Use Case	Scale
GKE	Microservices orchestration	300+ services
Cloud Pub/Sub	Event streaming	1T+ messages/day
Cloud Dataflow	Stream/batch processing	Petabytes/day
BigQuery	Analytics, ML training	10M+ queries/month
Cloud Bigtable	ML feature store	Petabytes
Cloud Storage	Audio files, backups	Exabytes
Cloud Spanner	Transactional data	Global consistency

Migration Story

Timeline:

2016: Announced migration from on-premise to GCP
2017: Fully migrated to Google Cloud
Result: 60% cost reduction, faster product development

Key decisions:

Kafka → Pub/Sub for event delivery (4x lower latency)
Hadoop → Dataflow for batch/stream processing
Custom dashboards → BigQuery for analytics

Multi-Region Strategy

Regions:
- us-central1 (Primary Americas)
- europe-west1 (Primary EMEA)
- asia-east1 (Primary APAC)

Data replication:
- User data: Multi-region Spanner
- Audio: Cloud Storage multi-region
- Analytics: BigQuery cross-region

Developer Platform (Backstage)

Spotify open-sourced Backstage in 2020—their internal developer portal:

Features:

Service catalog (track all microservices)
TechDocs (documentation as code)
Software templates (scaffold new services)
Plugin ecosystem (integrate with tools)

Impact:

2,200+ contributors
3,000+ adopting companies
CNCF incubating project

Conclusion

Designing Spotify-scale music streaming requires different optimizations than video platforms:

Key architectural decisions:

Multi-CDN delivery (Akamai, AWS, Fastly) provides global reach with failover and cost optimization
Ogg Vorbis encoding at multiple bitrates (24-320 kbps) balances quality and bandwidth with adaptive switching
Cassandra for user data handles write-heavy workloads (playlists, history) with horizontal scaling
Hybrid recommendation combining collaborative filtering, audio features, and NLP drives 30%+ of listening
GCP migration (2016-2017) reduced costs 60% while enabling faster product iteration
Event-driven architecture via Pub/Sub processes 1T+ events/day for real-time personalization

What this design optimizes for:

Instant playback (< 500ms time-to-first-audio)
Seamless cross-device experience (Spotify Connect)
Deep personalization (Discover Weekly, Daily Mix)
Offline reliability (encrypted downloads with license management)

What this design sacrifices:

Lossless audio quality (limited to 320 kbps lossy until recent Premium updates)
Real-time social features (friend activity delayed)
Podcast transcription/search (limited compared to dedicated platforms)

When to choose this design:

Audio streaming at scale (100M+ users)
Personalization as core differentiator
Need for offline mode with DRM

Appendix

Prerequisites

CDN architecture: edge caching, origin shield concepts
Audio encoding: codecs, bitrates, compression
Distributed databases: Cassandra data modeling, consistency trade-offs
Recommendation systems: collaborative filtering, content-based filtering basics
Stream processing: event-driven architecture, Pub/Sub patterns

Terminology

Term	Definition
ABR	Adaptive Bitrate—dynamically selecting audio quality based on network conditions
Ogg Vorbis	Open-source, royalty-free audio codec used by Spotify
Gapless playback	Seamless transition between tracks without silence gaps
Crossfade	Gradual blend between end of one track and start of next
Collaborative filtering	Recommendation based on similar users’ behavior
Content-based filtering	Recommendation based on item attributes (audio features)
Echo Nest	Music intelligence company acquired by Spotify in 2014
Spotify Connect	Protocol for cross-device playback control
Pub/Sub	Publish-Subscribe messaging pattern for event streaming
Edge n-gram	Tokenization for autocomplete (prefixes: “s”, “sp”, “spo”…)

Summary

Spotify serves 675M+ MAU with multi-CDN delivery (Akamai, AWS, Fastly) for global reach
Ogg Vorbis encoding at 24-320 kbps with client-side adaptive quality selection
Cassandra handles write-heavy user data (playlists, history) with horizontal scaling
Hybrid recommendation (collaborative + content-based + NLP) drives 30%+ of streams
Event pipeline via Pub/Sub processes 1T+ events/day for real-time personalization
Offline mode uses encrypted storage with per-device DRM licensing
Backstage developer portal (open-sourced 2020) manages 300+ internal microservices
