Design Netflix Video Streaming
Netflix ended 2024 with 301.63 million paid subscribers and 94 billion hours of viewing in H2 2024 alone. Unlike user-generated platforms (YouTube, TikTok), Netflix is a consumption-first architecture: the catalog is finite and known in advance, so the design problem is not ingest at scale — it is delivering pre-encoded bytes to the player with sub-2-second start, near-zero rebuffering, and the highest perceptual quality the path can sustain. This article is the system design view of how that works: the Open Connect CDN, the Cosmos encoding pipeline, per-title and shot-based optimization, adaptive bitrate playback, multi-DRM, and the personalization stack that decides what to play and what artwork to show.
Mental model
Three constraints shape almost every decision in Netflix’s streaming stack:
- Predictable catalog enables proactive caching. A finite, slowly-changing library lets Netflix populate edge caches before anyone presses play, instead of relying on demand-driven cache fills. That is what the Open Connect program is built around.
- Quality-per-bit varies by content. A talking-heads interview compresses dramatically more efficiently than an action sequence at the same perceptual quality. Per-title and per-shot encoding turn that variance into bandwidth savings without losing perceived quality, measured with VMAF.
- Device fragmentation forces a multi-codec strategy. A 2015 smart TV decodes only H.264; a 2024 smart TV decodes AV1 in hardware. Netflix encodes each title in several codecs and serves the best one each device can decode, which is why AV1 now powers about 30% of Netflix viewing.
Two architectural splits hold the whole stack together:
- Control plane on AWS, data plane on Open Connect. API requests, manifests, DRM licenses, and personalization run on EC2 / Titus inside a few AWS regions. Video bytes reach viewers only through Open Connect; AWS never serves them to clients.
- Pipeline first, playback second. Anything that can be precomputed (encoding, popularity prediction, fill placement) runs offline. Online playback is mostly just edge fetches plus a thin manifest/license round-trip.
Requirements
Functional
| Requirement | Priority | Notes |
|---|---|---|
| Adaptive video playback | Core | Multiple bitrates and codecs per title |
| Multi-device support | Core | TVs, mobile, tablets, browsers, game consoles |
| Personalized recommendations | Core | Drives roughly 80% of viewing |
| Continue watching | Core | Cross-device position sync |
| Multiple profiles | Core | Per-user personalization |
| Offline downloads | Core | Mobile viewing without connectivity |
| Search and browse | Core | Full catalog discovery |
| Subtitles and audio tracks | Core | Multiple languages per title |
| DRM protection | Core | Content security across all platforms |
| Parental controls | Extended | Content filtering by maturity rating |
| Live streaming | Extended | Sport and event streaming, recently added |
Non-functional targets
| Property | Target | Source / rationale |
|---|---|---|
| Playback availability | ~99.99% | Subscription churn cost; chaos engineering bar |
| Playback start latency | p99 < 2 s | Industry rule of thumb; Netflix QoE target |
| Rebuffering ratio | < 0.1% of playback time | AV1 rollout cut buffering by 45% on TVs |
| Perceptual quality | VMAF > 93 | Netflix’s deployment threshold |
| Edge cache hit rate | > 95% | Origin egress cost dominates without it |
Scale baseline (publicly disclosed and inferred)
- Subscribers (paid, Q4 2024): 301.63M, in 190+ countries
- Viewing (H2 2024): 94B hours, +5% YoY
- Live peak (Nov 2024): 65M concurrent streams (Tyson v Paul fight)
- Edge footprint:
  - ISP partners: 1,000+ (Open Connect program)
  - Locations: ~1,000+ (Netflix; an academic snapshot in 2016 mapped 4,669 servers)
  - ASNs in BGP: AS2906 (Netflix POPs), AS40027 (embedded OCAs), AS55095
- AWS estate (control plane only, no video bytes):
  - EVCache (Memcached): ~200 clusters, ~22,000 instances, ~400M ops/sec, ~2T cached items, ~14.3 PB of memory
Note
Concurrent-stream and DAU figures are not published at full granularity. The 65M concurrent peak above is from a single live event; steady-state evening peaks are not officially disclosed. Daily active subscribers, total encoded storage, and the absolute video-traffic figure are educated estimates, not Netflix releases.
Two viable design paths
For interview / decision contexts, contrast a commercial-CDN design with the custom-CDN design Netflix actually runs.
Path A — third-party CDN
Best when the catalog is small, the geographic footprint is narrow, traffic is variable, or the team cannot justify dedicated hardware.
- Use a commercial CDN (Akamai, CloudFront, Fastly, Cloudflare).
- Origin lives in a cloud provider; CDN handles caching, routing, and TLS termination.
Trade-offs:
- Pros: zero infrastructure investment, pay-per-GB pricing aligned with usage, instant global reach, vendor handles TLS/cert hygiene.
- Cons: per-GB costs become punitive at scale; no ISP-level embedding, so an extra peering hop on every byte; less control over cache key / eviction; harder to ship content-aware optimisation.
Most OTT services (Hulu, Disney+, Peacock, Paramount+) live here.
Path B — custom CDN with ISP embedding (Netflix’s model)
Best when the catalog is large and predictable, scale justifies hardware, and quality differentiation is a competitive lever.
- Custom appliances (OCAs) deployed inside ISP facilities or at IXPs.
- Proactive overnight content distribution from S3 origin to OCAs.
- No per-GB transit costs after hardware investment; direct peering kills intermediate hops.
Trade-offs:
- Pros: ~98% edge hit rate with proactive fill; low single-digit-millisecond OCA-to-subscriber RTT when embedded (roughly 1–5 ms); amortised cost far below commercial CDN at Netflix scale; complete control over caching and steering.
- Cons: large upfront hardware investment; ISP relationship management is a real organisation; long lead times for new regions; you own all the hardware and on-call.
Real-world parallels: Google does the equivalent with Google Global Cache for YouTube; Meta operates Facebook Edge Network.
Path comparison
| Factor | Third-party CDN | Custom CDN (Open Connect) |
|---|---|---|
| Setup time | Hours | Months |
| Upfront cost | None | Hardware-heavy |
| Per-GB cost | Usage-based (pay per GB) | Near-zero amortised |
| ISP RTT add | +10 – 50 ms peering | +1 – 5 ms embedded |
| Edge hit rate | 85 – 95% | ~98% |
| Operational ownership | Vendor | You |
| Sweet spot | < 50M subscribers | > 100M subscribers |
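To make the per-GB row concrete, here is a rough break-even sketch. Every price and volume in it is a hypothetical placeholder chosen for illustration, not a Netflix or vendor figure, and the cost model (amortised capex plus flat opex) is an assumption that ignores peering, power, and staffing detail.

```python
def monthly_cost_third_party(egress_gb: float, price_per_gb: float) -> float:
    """Usage-based CDN bill: every delivered GB is charged."""
    return egress_gb * price_per_gb


def monthly_cost_custom(capex: float, amortisation_months: int, monthly_opex: float) -> float:
    """Custom CDN: hardware amortised over its lifetime plus fixed running costs."""
    return capex / amortisation_months + monthly_opex


def break_even_gb(capex: float, amortisation_months: int,
                  monthly_opex: float, price_per_gb: float) -> float:
    """Monthly egress above which the custom CDN becomes the cheaper option."""
    return monthly_cost_custom(capex, amortisation_months, monthly_opex) / price_per_gb


# Hypothetical inputs, chosen only to show the shape of the trade-off.
threshold_gb = break_even_gb(capex=36_000_000, amortisation_months=36,
                             monthly_opex=500_000, price_per_gb=0.01)
```

Below the threshold the usage-based bill is smaller; above it, each extra GB widens the gap in favour of owned hardware, which is the mechanism behind the "sweet spot" row.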
The rest of this article focuses on Path B: it is what makes Netflix’s architecture distinctive, and most of the engineering reasoning generalises to any service willing to invest in an edge.
High-level design
Component overview
The control plane and data plane are operated as two independent stacks. The split is deliberate — the workloads have different scaling, security, and failure properties, and treating them separately keeps each one tractable.
Traffic split: AWS vs Open Connect
| Traffic type | Infrastructure | Why it lives there |
|---|---|---|
| Video segments | Open Connect (100%) | Bandwidth-heavy, latency-sensitive |
| API requests | AWS | Compute-heavy, elastic |
| Personalization | AWS | ML training and inference, data-heavy |
| DRM license issuance | AWS | Security-critical, transactional |
| Manifest generation | AWS | Per-request, device-specific |
This is the single most important architectural decision in the system. Each side optimises for its workload: AWS scales millions of small API requests; Open Connect scales hundreds of terabits per second of video.
Playback flow at request time
- Client requests playback → API gateway (Netflix-internal Zuul) authenticates and routes to the Playback service.
- Playback service generates a manifest → device-specific list of streams (codec, resolution, bitrate, audio, subtitle).
- Steering service ranks OCAs → returns a sorted list keyed on proximity, current load, and health.
- Client acquires a DRM license → key exchange with the License service, in parallel with the first segment fetch.
- Client fetches segments from the chosen OCA → ABR algorithm picks a quality rung per segment.
- Playback proceeds → continuous adaptation based on buffer level and observed throughput.
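The steering step in the flow above can be sketched as a filter followed by a sort. This is a minimal illustration, not Netflix's steering logic: the `Oca` fields, the `max_load` cutoff, and the choice to rank by RTT before load are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Oca:
    url: str
    rtt_ms: float   # proximity proxy (hypothetical field)
    load: float     # 0.0 (idle) to 1.0 (saturated)
    healthy: bool


def rank_ocas(candidates: list[Oca], max_load: float = 0.9) -> list[str]:
    """Return OCA URLs best-first: drop unhealthy or saturated nodes,
    then order by proximity, using load as the tie-breaker."""
    usable = [o for o in candidates if o.healthy and o.load < max_load]
    usable.sort(key=lambda o: (o.rtt_ms, o.load))
    return [o.url for o in usable]
```

Returning a ranked list rather than a single URL is what lets the client fall back to the next OCA without another control-plane round-trip.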
Control plane vs data plane
| Plane | Where | Responsibilities |
|---|---|---|
| Control plane | AWS | Auth, manifest generation, DRM licensing, recommendations, A/B tests, billing |
| Data plane | Open Connect | Video segment delivery, OCA cache management, BGP route announcements, telemetry |
Open Connect CDN
Topology
OCA appliance specifications
Netflix currently publishes two OCA hardware tiers on the Open Connect site [1]:
| Appliance | Raw storage | Operational throughput | Typical use |
|---|---|---|---|
| Storage appliance | up to 120 TB | ~200 Gbps | High-traffic ISPs that need most of the catalog |
| Global appliance | up to 60 TB | ~80 Gbps | Smaller ISPs / emerging markets |
Per-server throughput on a single appliance has climbed steadily:
- 2017 — serving 100 Gbps from a single OCA by combining FreeBSD, NGINX, and a careful NUMA layout. The earlier ~90 Gbps milestone was for mostly unencrypted traffic; reaching 100 Gbps with TLS required moving encryption into the kernel TLS path.
- 2021 — 400 Gbps from a single FreeBSD server by offloading TLS to the NIC. Without that offload, throughput was capped at roughly 240 Gbps by memory bandwidth and CPU encryption cost.
Hardware behind the 400 Gbps result:
- CPU: AMD EPYC 7502P (32-core "Rome")
- Memory: 256 GB DDR4-3200 (8 channels)
- Storage: 18 x 2 TB Western Digital SN720 NVMe (PCIe Gen3 x4)
- Network: 2 x Mellanox ConnectX-6 Dx (PCIe Gen4 x16, 4 x 100 GbE)
- OS / WS: FreeBSD-CURRENT, NGINX
- TLS: Kernel TLS with NIC offload (key enabler for 400 Gbps)
Deployment models
Embedded (inside ISP network): OCA sits in the ISP’s data center. Zero transit cost for the ISP, lowest possible RTT (often 1–5 ms) to subscribers, and the ISP supplies power, space, and connectivity.
Peering (at an IXP): OCA cluster sits at an Internet Exchange Point such as Equinix or DE-CIX. Serves multiple ISPs via peering. Higher RTT than embedded (roughly 10–50 ms), but no ISP partnership required.
In practice, an ISP starts on the peering side and graduates to embedded once its Netflix traffic justifies hardware on-prem.
BGP routing
OCAs announce routes via BGP so traffic stays as close to the subscriber as possible. The Open Connect partner docs are the source of truth here [2]:
- AS2906: Netflix Streaming Services (POPs and direct peering)
- AS40027: Netflix Streaming Services (embedded OCAs)
- AS55095: Netflix Streaming Services (additional OCA prefixes)
- as-set AS-NFLX (covers all three)
Route preference at the partner router, in the usual BGP order:
- OCA availability and health (BGP withdrawn if the OCA is unhealthy)
- Most-specific prefix wins
- Shortest AS path
- MED (multi-exit discriminator)
- Geographic / iBGP tie-breakers
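The preference order above can be expressed as a sort key. This is a simplified sketch of the list as written, not a full BGP best-path implementation (it omits local preference, origin type, and the final tie-breakers); the `Route` fields and the documentation ASN 64500 in the usage note are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    next_hop: str
    prefix_len: int            # e.g. 24 for a /24
    as_path: tuple[int, ...]
    med: int
    withdrawn: bool = False    # set when the OCA is unhealthy


def best_route(routes: list[Route]) -> Optional[Route]:
    """Pick a route: live routes only, then most-specific prefix,
    then shortest AS path, then lowest MED."""
    live = [r for r in routes if not r.withdrawn]
    if not live:
        return None
    return min(live, key=lambda r: (-r.prefix_len, len(r.as_path), r.med))
```

For example, `Route("oca-embedded", 24, (40027,), 0)` beats `Route("ix-peering", 24, (64500, 2906), 0)` on AS-path length, and withdrawing the embedded route makes the peering route the winner.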
Fill pipeline
Netflix doesn’t wait for cache misses. The fill pipeline predicts demand and pre-positions files during off-peak hours.
The crucial optimisation came from switching from title-level popularity to file-level popularity. Not every file for a title is equally popular — the 4K HDR AV1 file is only useful on capable devices, while the 720p H.264 fallback might serve the long tail. By ranking files individually, Netflix reached the same effective hit rate with roughly half the storage footprint per OCA.
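A minimal sketch of file-level fill planning, assuming a popularity model has already produced a predicted request count per file. The greedy "requests per GB" ranking and the `MediaFile` fields are assumptions for illustration; the production placement algorithm is not described in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaFile:
    file_id: str
    size_gb: float
    predicted_requests: float   # output of a popularity model (assumed given)


def plan_fill(files: list[MediaFile], capacity_gb: float) -> list[str]:
    """Greedy fill: take files in order of predicted requests per GB
    and skip any file that no longer fits on the appliance."""
    ranked = sorted(files, key=lambda f: f.predicted_requests / f.size_gb, reverse=True)
    chosen, used = [], 0.0
    for f in ranked:
        if used + f.size_gb <= capacity_gb:
            chosen.append(f.file_id)
            used += f.size_gb
    return chosen
```

With title-level ranking, a popular title would drag its rarely requested 4K file onto the disk; ranking per file lets a small, heavily requested fallback encode win that space instead.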
Cache performance
| Metric | Target | Achieved (publicly stated) |
|---|---|---|
| Edge cache hit rate | > 95% | ~98% |
| Origin fetch rate | < 5% | ~2% |
At Netflix scale, a 1% improvement in edge hit rate eliminates terabytes of daily egress from S3 and transit cost everywhere upstream of the OCA.
Video encoding pipeline
Cosmos: the platform under everything
Netflix completed migration of its video pipeline from the legacy Reloaded system to Cosmos in September 2023. Cosmos is “orchestrated functions as a service”, a media-centric microservice platform built around three subsystems and an asynchronous bus [3]:
| Component | Role |
|---|---|
| Optimus | API layer — maps external requests to internal media business models |
| Plato | Workflow orchestration — DAGs / rule-based step modelling |
| Stratum | Serverless compute — stateless containers for CPU-heavy work |
| Timestone | High-priority asynchronous messaging between the layers above |
A Stranger Things episode with an average shot length of around four seconds contains about 900 shots, each of which runs through that pipeline; that is the practical scale that justifies the workflow orchestration layer [4].
From fixed ladder to per-shot allocation
The encoding strategy has evolved through three generations, each one trading more CPU for fewer bits at the same perceptual quality.
Per-title encoding (2015)
Introduced in December 2015. Netflix runs hundreds of trial encodes per title at varying resolutions and quantization parameters, plots the resulting (bitrate, VMAF) points, and selects the convex-hull points as the bitrate ladder for that specific title.
Worked example from the original blog: Orange Is the New Black’s 1080p top rung dropped from 5800 kbps on the fixed ladder to 4640 kbps on the per-title ladder — a 20% reduction at the same quality. Light-motion content (animations, talking-head shows) sees larger savings; high-motion content sees smaller ones.
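The convex-hull selection can be sketched as follows. It is a simplified illustration that treats all trial encodes as one set of (bitrate, VMAF) points, whereas the real pipeline also tracks resolution per point; the sample numbers in the usage note are made up.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull_ladder(trials):
    """trials: iterable of (bitrate_kbps, vmaf). Returns hull points, lowest bitrate first."""
    # 1. Drop dominated encodes: more bits must buy strictly more quality.
    pareto, best = [], float("-inf")
    for point in sorted(trials, key=lambda p: (p[0], -p[1])):
        if point[1] > best:
            pareto.append(point)
            best = point[1]
    # 2. Upper convex hull (monotone chain): keep only clockwise turns.
    hull = []
    for point in pareto:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], point) >= 0:
            hull.pop()
        hull.append(point)
    return hull
```

Given trials `[(500, 60), (1000, 80), (1200, 70), (1500, 82), (2000, 90), (3000, 93), (4000, 94)]`, the 1200 kbps point is dropped as dominated and the 1500 kbps point is dropped because it sits below the line joining its neighbours; the remaining five points form the ladder.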
Shot-based encoding / Dynamic Optimizer (2018+)
Per-title still applies a uniform bitrate within a single rung. The Dynamic Optimizer goes further: it segments the source on shot boundaries, scores each shot’s complexity, and allocates bits per shot. The pipeline then assembles the shots back together, with IDR frames aligned to shot boundaries so the assembly remains valid.
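A toy version of per-shot allocation: for each shot, pick the cheapest trial encode that reaches a target VMAF. The real Dynamic Optimizer solves a joint rate-quality optimisation across shots, so this greedy rule, the data layout, and the numbers in the usage note are simplifying assumptions.

```python
def allocate_per_shot(shots, target_vmaf):
    """shots: list of dicts with 'duration_s' and 'encodes' = [(kbps, vmaf), ...].
    Returns one (kbps, vmaf) choice per shot."""
    plan = []
    for shot in shots:
        good_enough = [e for e in shot["encodes"] if e[1] >= target_vmaf]
        if good_enough:
            choice = min(good_enough)                          # cheapest that hits the target
        else:
            choice = max(shot["encodes"], key=lambda e: e[1])  # best available quality
        plan.append(choice)
    return plan


def average_kbps(shots, plan):
    """Duration-weighted average bitrate of the assembled stream."""
    total_kbit = sum(s["duration_s"] * kbps for s, (kbps, _) in zip(shots, plan))
    return total_kbit / sum(s["duration_s"] for s in shots)
```

With a 4 s dialogue shot that reaches VMAF 94 at 1000 kbps and a 2 s action shot that needs 4000 kbps to reach 93, the plan averages 2000 kbps, where a uniform rung would have to run at 4000 kbps throughout to hold the same quality floor.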
Reported impact:
| Improvement | Headline number |
|---|---|
| 4K SDR top-rung bitrate | ~8 Mbps average vs the prior 16 Mbps ceiling [5] |
| HDR catalog storage | DO ladder uses ~58% of the fixed-ladder storage footprint (full HDR catalog by June 2023) |
| Per-codec bitrate savings | ~28% on x264, ~34% on HEVC, ~38% on VP9 vs fixed-QP [6] |
Codec strategy
Netflix encodes each title in multiple codecs and serves the best one each device can decode in hardware. The current line-up:
| Codec | Bandwidth vs H.264 | Device support | Encoding cost | Netflix usage |
|---|---|---|---|---|
| H.264 (AVC) | Baseline | Universal | 1x | Legacy fallback |
| H.265 (HEVC) | ~50% better | iOS, Safari, modern smart TVs | 2 – 4x | Apple ecosystem and recent HDR TVs |
| VP9 | ~50% better | Chrome, Firefox, Android, many TVs | 2 – 3x | Web and Android baseline |
| AV1 | ~30 – 50% vs VP9 | Modern browsers, recent TVs and Apple silicon | 5 – 10x | ~30% of all Netflix viewing as of Dec 2025 |
Netflix’s December 2025 AV1 update reports that AV1 streams averaged +4.3 VMAF over AVC and +0.9 VMAF over HEVC at roughly one-third less bandwidth, and contributed to a 45% reduction in rebuffering on TVs. Since 2023, almost every device submitted for Netflix certification has supported AV1, and Netflix expects AV1 to become the primary delivery codec.
The decision logic the player uses at startup, in pseudocode:
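
    def pick_codec(device):
        # Prefer the most bandwidth-efficient codec the device decodes in hardware.
        if device.supports_hardware("av1"):
            return "av1"
        if device.supports_hardware("vp9"):
            return "vp9"
        # HEVC is served mainly to the Apple ecosystem.
        if device.supports_hardware("hevc") and device.platform == "apple":
            return "hevc"
        return "h264"  # universal fallback

Important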
“Supports AV1” almost always means hardware decode: software AV1 decode saturates the CPU on a TV SoC, and on mobile it also drains the battery. The Android rollout used the dav1d software decoder as a stopgap, but TV adoption only became viable once hardware decoders were widespread.
VMAF: the quality metric that replaced PSNR
Video Multimethod Assessment Fusion (VMAF) was open-sourced by Netflix in June 2016, built jointly with USC, IPI/LS2N Nantes, and the UT Austin LIVE lab.
Components fused via SVM regression:
- Visual Information Fidelity (VIF) at four spatial scales
- Detail Loss Metric (DLM)
- Motion — average absolute pixel difference between adjacent frames
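The motion feature is the easiest of the three to illustrate. The sketch below computes only the core idea stated above, the mean absolute luma difference between adjacent frames; the actual libvmaf feature extractor differs in detail, and representing frames as nested lists is an assumption made to keep the example dependency-free.

```python
def motion_score(frames):
    """Mean absolute luma difference between consecutive frames, averaged over the clip.

    frames: list of equally sized 2-D lists of luma values.
    """
    if len(frames) < 2:
        return 0.0
    per_pair = []
    for prev, cur in zip(frames, frames[1:]):
        total = sum(abs(a - b)
                    for row_prev, row_cur in zip(prev, cur)
                    for a, b in zip(row_prev, row_cur))
        pixels = sum(len(row) for row in cur)
        per_pair.append(total / pixels)
    return sum(per_pair) / len(per_pair)
```

A static clip scores 0, and the score rises with the amount of frame-to-frame change, which gives the regression model a signal for how much temporal masking the content offers.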
Score interpretation Netflix uses internally:
| VMAF score | Quality level |
|---|---|
| 93+ | Excellent (deployment target) |
| 85 – 93 | Good |
| 70 – 85 | Fair |
| < 70 | Poor — re-encode |
PSNR is a decent signal-to-noise metric, but it correlates poorly with what humans actually see: film grain looks fine to viewers but tanks PSNR; smoothed/blurred output looks bad to viewers but scores well. VMAF was trained against subjective Mean Opinion Score (MOS) data, which is what made it adoptable as an industry-wide quality target.
Adaptive bitrate streaming
Streaming protocols
| Protocol | Where Netflix uses it | Notes |
|---|---|---|
| MPEG-DASH | Most non-Apple devices | ISO/IEC 23009-1 (5th edition, 2022) |
| HLS | Safari, iOS, Apple TV | RFC 8216 (Informational, 2017); draft-pantos-hls-rfc8216bis is the active 2nd-edition draft (rev 21, March 2026) |
| CMAF | Underlying packaging | ISO/IEC 23000-19; single fMP4 serves both HLS and DASH |
Netflix uses CMAF (Common Media Application Format) so a single set of fragmented MP4 segments serves both HLS and DASH manifests; the per-protocol manifest just changes how the same byte ranges are described. CMAF also defines the chunk unit inside a segment — typically ~200 ms inside a ~4 s segment — which is what enables low-latency variants. HLS Low-Latency (#EXT-X-PART partial segments) and DASH-LL both publish those chunks as soon as the encoder closes them, instead of waiting for the full segment to land.
Manifest generation
When a client requests playback, the Playback service generates a per-device manifest containing:
- The bitrate ladder filtered to what the device can actually decode and render
- The codec set the device hardware supports
- Audio tracks (languages, formats including spatial audio where supported)
- Subtitle tracks
- DRM information — license server URL, key IDs, robustness level
- A ranked list of OCA URLs from the steering service
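The first two items in the list above amount to filtering a ladder by device capability and then serialising what is left. A minimal sketch, assuming a hypothetical `Rung` record and an HLS multivariant playlist as the output format; audio, subtitle, DRM, and OCA fields are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    codec: str          # "av1", "hevc", "vp9", or "h264"
    width: int
    height: int
    bandwidth_bps: int
    codecs_attr: str    # value for the HLS CODECS attribute
    uri: str


def filter_ladder(ladder, supported_codecs, max_height):
    """Keep only rungs the device can decode and display, highest bandwidth first."""
    usable = [r for r in ladder if r.codec in supported_codecs and r.height <= max_height]
    return sorted(usable, key=lambda r: r.bandwidth_bps, reverse=True)


def render_hls_master(rungs):
    """Serialise the filtered rungs as a multivariant playlist."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:7"]
    for r in rungs:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f'RESOLUTION={r.width}x{r.height},CODECS="{r.codecs_attr}"'
        )
        lines.append(r.uri)
    return "\n".join(lines) + "\n"
```

Filtering on the server keeps the client from ever attempting a rung it cannot render, so the ABR controller only has to reason about bandwidth.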
A simplified HLS variant playlist for an AV1 ladder:

    #EXTM3U
    #EXT-X-VERSION:7
    #EXT-X-STREAM-INF:BANDWIDTH=12000000,RESOLUTION=3840x2160,CODECS="av01.0.13M.08"
    4k-av1/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=1920x1080,CODECS="av01.0.08M.08"
    1080p-av1/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720,CODECS="av01.0.04M.08"
    720p-av1/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480,CODECS="av01.0.01M.08"
    480p-av1/playlist.m3u8

ABR algorithm
Netflix’s ABR work is published in pieces — the highest-signal references are the Buffer-Based Approach paper (Huang et al., SIGCOMM 2014) and follow-up work on RL-based ABR. The deployed algorithm is hybrid: throughput estimation gates what is safe, buffer level decides whether to climb or hold. The two academic baselines worth knowing in this space are BOLA (Spiteri et al., INFOCOM 2016) — a near-optimal Lyapunov-based buffer-only controller, used in dash.js — and MPC (Yin et al., SIGCOMM 2015), which formulates ABR as a model-predictive control problem over a short horizon. Netflix’s production controller is closer in spirit to a hybrid buffer-based approach with throughput as the safety floor.
Inputs the ABR controller actually consumes:
- Buffer level (seconds of playable content already downloaded)
- Throughput history (exponential weighted moving average of recent segment download rates)
- Device constraints (memory ceiling, CPU headroom, battery state on mobile)
- Network type (Wi-Fi vs cellular — cellular has tighter caps and harsher tails)
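The throughput-history input can be sketched as an exponentially weighted moving average over per-segment download rates. The class name, the `alpha` default, and the kbps unit are assumptions for illustration.

```python
class ThroughputEstimator:
    """EWMA over observed segment download rates, in kbps."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha            # weight given to the newest sample
        self.estimate_kbps = None     # no estimate until the first segment lands

    def observe(self, segment_bytes: int, download_seconds: float) -> float:
        sample_kbps = segment_bytes * 8 / 1000 / download_seconds
        if self.estimate_kbps is None:
            self.estimate_kbps = sample_kbps
        else:
            self.estimate_kbps = (self.alpha * sample_kbps
                                  + (1 - self.alpha) * self.estimate_kbps)
        return self.estimate_kbps
```

A higher `alpha` reacts faster to a throughput drop but also to noise; a lower one is smoother but slower to notice that the network has degraded.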
Sketch of the selection rule:
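
    def select_rung(ladder_kbps, throughput_kbps, buffer_s, target_buffer_s,
                    just_rebuffered, safety_margin=0.7):
        """ladder_kbps is sorted ascending; returns the chosen bitrate in kbps."""
        safe_bitrate = throughput_kbps * safety_margin  # safety_margin ~ 0.7 in practice
        buffer_factor = buffer_s / target_buffer_s
        if just_rebuffered:
            return ladder_kbps[0]  # lowest rung
        if buffer_factor < 0.5:
            # Low buffer: highest rung strictly below safe_bitrate
            candidates = [r for r in ladder_kbps if r < safe_bitrate]
        else:
            # Healthy buffer: highest rung <= safe_bitrate
            candidates = [r for r in ladder_kbps if r <= safe_bitrate]
        return candidates[-1] if candidates else ladder_kbps[0]

Startup behaviour: open at a conservative quality (often 720p or below), fetch the first few segments to fill the buffer past 10 s, then ramp up. Reaching the target rung typically takes 15 – 30 s of healthy network.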
Tip
Practical guard-rails most ABR implementations enforce: a minimum dwell time (~10 s) between rung switches to prevent oscillation, a maximum drop of two rungs per switch under normal conditions, and an emergency drop to the lowest rung if the buffer falls under ~5 s. Exact thresholds vary by client.
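Those guard-rails compose naturally as a filter applied to whatever rung the selection rule proposes. A minimal sketch using the thresholds quoted above as defaults; the function name and the index-based ladder representation are assumptions.

```python
def apply_guard_rails(current_idx, proposed_idx, buffer_s, seconds_since_switch,
                      min_dwell_s=10.0, max_drop=2, emergency_buffer_s=5.0):
    """Return the rung index to use; index 0 is the lowest rung."""
    if buffer_s < emergency_buffer_s:
        return 0                       # emergency: go straight to the lowest rung
    if proposed_idx == current_idx:
        return current_idx
    if seconds_since_switch < min_dwell_s:
        return current_idx             # still inside the dwell window
    if proposed_idx < current_idx:
        return max(proposed_idx, current_idx - max_drop)   # cap the size of a drop
    return proposed_idx
```

The emergency check runs first so that the dwell timer can never hold a client on a rung it cannot sustain.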
Rebuffering prevention
Netflix targets a rebuffering ratio under 0.1% of playback time. The mechanisms that get there:
| Mechanism | How it helps |
|---|---|
| Proactive quality drop | Drop a rung before the buffer empties, not after |
| Aggressive prefetch | Always fetch enough segments to absorb a brief stall |
| OCA fallback | Switch to the next OCA in the steered list if current degrades |
| Conservative startup | Avoid overcommitting on the first few segments |
The recent AV1 rollout deserves a callout here: the Dec 2025 update credits AV1 with a ~45% reduction in TV rebuffering versus prior codecs, mostly because the smaller segment sizes are more resilient to bursty network conditions.
DRM and license management
Multi-DRM strategy
Netflix encrypts each segment once using MPEG Common Encryption (CENC) and decrypts on-device using whichever DRM the platform exposes:
| DRM | Vendor | Devices |
|---|---|---|
| Widevine | Google | Android, Chrome, smart TVs |
| PlayReady | Microsoft | Windows, Xbox, Edge |
| FairPlay | Apple | iOS, macOS, Safari, tvOS |
CENC isn’t fully uniform: it defines several protection schemes. FairPlay requires the cbcs scheme (AES-CBC with pattern encryption) inside HLS packaging, while Widevine and PlayReady traditionally used the cenc scheme (AES-CTR) inside DASH, though the AOMedia work on AV1 is converging on cbcs for both. The orchestration logic on the License service sits on top of those differences.
License acquisition flow
Security layers actually enforced:
- Device attestation — the CDM proves it is a legitimate, untampered implementation; broken keys get revoked
- Entitlement check — the License service verifies the subscription is active for that title
- Key rotation — periodic re-keying for long sessions and live content
- Output protection — HDCP enforcement on the output port for HD/4K rungs (no HDCP, no high-rung license)
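The layered checks above can be read as a policy function that either denies the license or caps the resolution it unlocks. This is an illustrative sketch only: the function name, the specific resolution caps, and the HDCP version thresholds are assumptions, not Netflix's actual license policy.

```python
def max_licensed_height(subscription_active, device_attested, hdcp_version):
    """Return the tallest resolution the license may unlock, or None to deny.

    hdcp_version is None when the output path offers no HDCP at all.
    """
    if not subscription_active or not device_attested:
        return None      # entitlement or attestation failed: no license
    if hdcp_version is None:
        return 480       # no output protection: SD only (hypothetical cap)
    if hdcp_version >= 2.2:
        return 2160      # assumed requirement for the 4K rungs
    return 1080          # older HDCP: HD but not 4K (assumed)
```

The manifest filter and the license policy have to agree: a rung the license will not unlock should never reach the player's ladder.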
Offline downloads
- Client requests a download license — longer expiration, often 7 – 30 days.
- Segments are downloaded and stored re-encrypted on device.
- The download license expires; playback after expiration requires online renewal.
- The mobile-only AVCHi-Mobile and VP9-Mobile encodes trade some quality for substantially smaller download sizes.
The hard parts of offline aren’t cryptographic — they are storage estimation, codec / bitrate changes that invalidate stored content, and license expiration UX.
Personalization
Recommendation architecture
Netflix’s own published estimate — from the Gomez-Uribe and Hunt paper The Netflix Recommender System (ACM TMIS, 2016) — is that recommendations drive about 80% of viewing hours and that the combined personalization stack is worth on the order of $1B per year in retention. That economic case is what funds the entire personalization organisation.
Algorithm families
| Family | Where it shows up |
|---|---|
| Collaborative filtering | Long-running baseline — “users like you watched” |
| Content-based | Cold-start for new titles, taste similarity |
| Matrix factorization | Latent-factor models for user × title preference |
| Deep learning (PVR) | Personalised Video Ranking — the production ranking model |
| Contextual bandits | Artwork selection, row ordering — explore vs exploit |
| Reinforcement learning | Long-horizon engagement objectives |
Personalised artwork
The thumbnail you see for a given title is not the thumbnail your friend sees. Netflix runs contextual bandits over a pool of candidate images per title, optimising for click-through subject to long-term engagement signals (so the system doesn’t degenerate into clickbait).
Components that make this practical:
- A pool of artwork variants per title, often featuring different actors / scenes / moods
- Pre-computed image features (faces, scenes, dominant colours, mood tags)
- A bandit model that picks per-impression based on the viewer’s signal vector
- A heavy A/B framework underneath, so wins are confirmed before going to 100%
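The components above can be tied together with a deliberately simple bandit. This sketch uses epsilon-greedy selection with click-through statistics kept per coarse context bucket, which is a stand-in for the richer contextual models the text describes; the class, the context labels, and the reward definition are all assumptions.

```python
import random
from collections import defaultdict


class ArtworkBandit:
    """Epsilon-greedy artwork picker with separate statistics per context bucket."""

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = defaultdict(int)    # (context, variant) -> impressions
        self.clicks = defaultdict(int)   # (context, variant) -> clicks

    def _rate(self, context, variant):
        shown = self.shows[(context, variant)]
        return self.clicks[(context, variant)] / shown if shown else 0.0

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variants)                        # explore
        return max(self.variants, key=lambda v: self._rate(context, v))  # exploit

    def record(self, context, variant, clicked):
        self.shows[(context, variant)] += 1
        self.clicks[(context, variant)] += int(clicked)
```

Rewarding raw clicks is exactly what invites clickbait; a production system would feed a longer-horizon signal, such as whether the play led to meaningful viewing, into `record` instead.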
A/B testing scale
Every meaningful product change passes through A/B testing. Worth knowing about Netflix’s setup specifically:
- A given user is in many concurrent tests; the platform handles orthogonality
- Tests run long enough to measure retention, not just immediate engagement
- Causal-inference techniques (CUPED, surrogate metrics) let small effects be detected sooner
- Multi-objective optimisation balances engagement against satisfaction, account behaviour, and downstream churn
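CUPED, mentioned in the list above, is compact enough to show directly: it subtracts the part of each user's metric that a pre-experiment covariate already predicts, which shrinks variance without shifting the mean. A minimal sketch with made-up data in the usage note; pooling treatment and control when estimating theta is left to the caller.

```python
def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted metric values.

    metric:    per-user values measured during the experiment
    covariate: the same users' values from before the experiment
    """
    n = len(metric)
    mean_y = sum(metric) / n
    mean_x = sum(covariate) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    theta = cov_xy / var_x if var_x else 0.0
    # Remove the component of the metric explained by pre-experiment behaviour.
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]
```

For pre-period hours `[1, 2, 3, 4]` and in-experiment hours `[2.1, 3.9, 6.2, 7.8]`, the adjusted values keep the same mean of 5.0 but cluster tightly around it, so the same effect size reaches significance with fewer users or fewer days.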
Frontend / client
Client architecture
Netflix’s TV, mobile, and web clients share the same skeleton:
| Component | Responsibility |
|---|---|
| Manifest parser | HLS / DASH parsing |
| ABR controller | Quality selection (the state machine above) |
| Buffer manager | Segment scheduling and prefetch |
| DRM handler | License acquisition, key management |
| Player core | Decode, render, audio sync |
| Telemetry | QoE metrics — start time, rebuffer events, quality |
Playback start optimisation
The published target is first frame in under 2 seconds. The startup budget breaks down roughly as:
| Phase | Typical budget | Levers used to hit it |
|---|---|---|
| DNS | < 20 ms | Pre-resolved CDN domains during browsing |
| TLS handshake | < 50 ms | TLS 1.3, 0-RTT resumption |
| Manifest fetch | < 100 ms | Edge-cached, often pre-fetched on hover |
| License fetch | < 200 ms | Issued in parallel with first segment |
| First segment | < 500 ms | Small initial segments at lower rung |
| Decode + render | < 200 ms | Hardware decode in mainstream codecs |
The non-obvious unlock is doing license acquisition in parallel with the first segment fetch, not serially after it.
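The effect of parallelising the license fetch is easy to check against the budget table. The sketch below treats each phase's upper bound as its duration and assumes DNS, TLS, manifest, and decode are strictly serial, which is a simplification.

```python
# Upper bounds from the startup budget table, in milliseconds.
BUDGET_MS = {"dns": 20, "tls": 50, "manifest": 100,
             "license": 200, "first_segment": 500, "decode_render": 200}


def startup_ms(budget, parallel_license=True):
    """Time to first frame along the critical path."""
    serial = budget["dns"] + budget["tls"] + budget["manifest"] + budget["decode_render"]
    if parallel_license:
        # License and first segment overlap; only the slower one counts.
        return serial + max(budget["license"], budget["first_segment"])
    return serial + budget["license"] + budget["first_segment"]
```

With these budgets the parallel path totals 870 ms against 1070 ms for the serial one, so the license fetch drops off the critical path entirely as long as it stays faster than the first segment.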
Offline playback (mobile)
Mobile downloads add a few responsibilities:
- Storage estimation — show the file size before download starts
- Partial downloads — resume cleanly after interruption
- Smart downloads — auto-fetch the next episode in a binge
- License handling — gracefully prompt for renewal when offline tokens expire
- Quality choice — let the user pick a download tier, with mobile-specific encodes underneath
Infrastructure (control plane on AWS)
Netflix runs every non-video service on AWS. The table below is a snapshot of the main building blocks.
| Service | AWS / Netflix component | Purpose |
|---|---|---|
| API gateway | Zuul + ELB | Request routing, auth, fan-out |
| Service discovery | Eureka | Dynamic service registry |
| Microservices | EC2 + Titus (containers) | Hundreds of services |
| Caching | EVCache (Memcached) | Hot data, in front of Cassandra |
| Databases | Cassandra, RDS / MySQL | Metadata, profiles, billing |
| Streaming events | Kafka, Spark, Flink | Real-time analytics, ML features |
Note
The open-source Spring Cloud wrappers around Netflix OSS (Spring Cloud Netflix Eureka, Zuul 1, Ribbon) are in maintenance mode for the broader community. Netflix continues to maintain Zuul and Eureka internally; community projects such as Spring Cloud Gateway and Spring Cloud LoadBalancer have superseded them in many non-Netflix stacks.
EVCache itself is operated at extreme scale — current reporting puts it at roughly 200 clusters, 22,000 instances, 400 million ops/sec, 2 trillion items, and 14.3 PB of memory.
Resilience and chaos engineering
Netflix popularised chaos engineering as a discipline. The tooling is part of the architecture, not just a process.
| Tool | Function |
|---|---|
| Chaos Monkey | Randomly terminates production instances |
| Chaos Gorilla | Drops an entire AWS Availability Zone |
| Chaos Kong | Drops an entire AWS Region |
| FIT | Failure Injection Testing — targeted dependency failure |
The philosophy is that resilience is a property you can only verify under failure, so the system is failure-tested constantly in production.
Global architecture
Failure modes worth designing for
A few failure modes are useful to think through, both as an interview lens and as design heuristics for a system like this:
- OCA hardware failure → BGP withdraws the route, steering returns the next OCA in the ranked list, the client switches mid-stream. The client should be able to switch OCAs without rebuffering.
- Encoded variant missing → manifest validation must catch this before the client sees a 404 mid-segment. A missing 4K variant should silently fall back to the next rung.
- Cosmos DAG failure → Plato re-runs the failed Stratum step; segments that miss the VMAF target are re-encoded at a higher bitrate without re-encoding the whole title.
- License service degradation → clients enter a soft-fail state and continue playing previously-licensed content; new playback fails closed.
- Hot live event (e.g., the Tyson v Paul fight) → the OCA fleet must absorb the ramp; the Netflix engineering retro of that fight acknowledged player-side issues at peak, which is why live still gets called out as a separate engineering problem from on-demand.
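The first failure mode in the list, an OCA dropping out mid-stream, comes down to walking the ranked list on the client. A minimal sketch in which `fetch` is an injected callable (an assumption made so the example needs no network access) that raises `OSError` or a subclass on failure.

```python
def fetch_with_failover(segment_path, oca_urls, fetch):
    """Try each OCA in ranked order and return the first successful response."""
    last_error = None
    for base_url in oca_urls:
        try:
            return fetch(f"{base_url}/{segment_path}")
        except OSError as err:     # includes ConnectionError and TimeoutError
            last_error = err       # remember the failure and try the next OCA
    raise RuntimeError("all OCAs failed") from last_error
```

Because the buffer holds several seconds of video, one failed request followed by a retry against the next OCA fits comfortably inside the time the player already has banked.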
Practical takeaways
- Custom CDN economics only flip past a certain scale. Don’t build Open Connect for a 10M-subscriber service; pay Akamai. Build one when the per-GB transit savings dwarf the hardware capex.
- Proactive caching beats reactive caching when the catalog is predictable. Netflix’s 98% hit rate is not a smarter LRU; it is a popularity model that pre-positions before anyone presses play.
- Encode for perceptual quality, not for a fixed bitrate. The progression from fixed ladder → per-title → shot-based is what unlocked 4K at sane bitrates.
- Pick your codec based on the device, not the encoder. AV1 is a clear win on capable hardware; H.264 is still required for the long tail.
- Separate control plane from data plane. They have different scaling, failure, and security shapes; treating them as one stack will eventually pin the wrong constraint.
- Personalization is a load-bearing system, not a UX nicety. It’s worth on the order of $1B/yr in retention; design the infra accordingly.
Appendix
Prerequisites
- CDN architecture: edge caching, origin shield, peering vs embedded
- Video encoding: codecs, containers, bitrates, transcoding
- Streaming protocols: HLS, DASH, CMAF
- DRM: encryption, license servers, CDM integration
- Distributed systems fundamentals: microservices, caching, eventual consistency
Terminology
| Term | Definition |
|---|---|
| ABR | Adaptive Bitrate — dynamically selecting video quality at runtime |
| AOM / AV1 | Open, royalty-free video codec from the Alliance for Open Media |
| CDM | Content Decryption Module — platform component handling DRM |
| CMAF | Common Media Application Format — fMP4 packaging shared across HLS/DASH |
| DASH | Dynamic Adaptive Streaming over HTTP (ISO/IEC 23009-1) |
| HEVC | High Efficiency Video Coding (H.265), successor to H.264 |
| HLS | HTTP Live Streaming (Apple) |
| OCA | Open Connect Appliance — Netflix’s edge cache server |
| Per-title | Bitrate ladder customised per title from a convex-hull search |
| Shot-based | Variable bitrate allocation per scene / shot |
| VMAF | Video Multimethod Assessment Fusion — perceptual quality metric |
| VP9 | Google’s open codec, predecessor to AV1 |
Summary
- Netflix splits control plane (AWS) from data plane (Open Connect) to scale each independently.
- Open Connect places OCAs inside ISP networks (embedded) or at IXPs (peering), achieving ~98% edge hit rate via proactive overnight fill.
- A single OCA scaled from 100 Gbps in 2017 to 400 Gbps in 2021, with NIC-offloaded TLS as the key unlock.
- Per-title (2015) and shot-based / Dynamic Optimizer (2018+) encoding cut bitrate at fixed VMAF; HDR is now wholly dynamically optimised.
- AV1 powers ~30% of viewing as of Dec 2025 with measurable VMAF and rebuffer wins; H.264 remains the long-tail fallback.
- Multi-DRM (Widevine / PlayReady / FairPlay) over CENC; license acquisition is parallelised with the first segment fetch.
- Personalization (recommendations, artwork) drives most viewing hours and is worth ~$1B/yr in retention.
References
- Netflix Open Connect — official CDN site
- Open Connect Appliance specs — current published OCA tiers
- Open Connect Network Configuration — ASN / BGP details
- Per-Title Encode Optimization — Netflix Tech Blog (Dec 2015)
- Dynamic Optimizer — perceptual video encoding — Netflix Tech Blog
- Optimized shot-based encodes for 4K: Now Streaming — Netflix Tech Blog (Aug 2020)
- All of Netflix’s HDR Streaming is Dynamically Optimized — Netflix Tech Blog (Jun 2023)
- AV1 — Now Powering 30% of Netflix Streaming — Netflix Tech Blog (Dec 2025)
- Bringing AV1 Streaming to Netflix Members’ TVs — Netflix Tech Blog
- Toward A Practical Perceptual Video Quality Metric (VMAF) — Netflix Tech Blog (Jun 2016)
- Serving 100 Gbps from an Open Connect Appliance — Netflix Tech Blog
- Serving Netflix Video at 400 Gb/s on FreeBSD — EuroBSDcon 2021
- Rebuilding Netflix Video Processing Pipeline with Microservices — Netflix Tech Blog (Feb 2024)
- The Netflix Cosmos Platform — Netflix Tech Blog (Mar 2021)
- Packaging award-winning shows with award-winning technology — CMAF at Netflix
- Netflix VMAF GitHub
- Netflix EVCache GitHub
- Building a Global Caching System at Netflix (EVCache) — InfoQ
- Artwork Personalization at Netflix — Netflix Tech Blog
- The Netflix Recommender System (Gomez-Uribe & Hunt, ACM TMIS 2016)
- HLS Specification (RFC 8216, Informational, 2017)
- HLS 2nd Edition draft (draft-pantos-hls-rfc8216bis, active IETF Internet-Draft, rev 21, March 2026)
- DASH Specification (ISO/IEC 23009-1, 5th edition, 2022)
- CMAF Specification (ISO/IEC 23000-19)
- A Buffer-Based Approach to Rate Adaptation (Huang et al., SIGCOMM 2014)
- BOLA: Near-Optimal Bitrate Adaptation for Online Videos (Spiteri et al., INFOCOM 2016)
- A Control-Theoretic Approach for Dynamic Adaptive Video Streaming (MPC; Yin et al., SIGCOMM 2015)
- What We Watched — H2 2024 — Netflix Engagement Report
- 60 Million Households Tuned in for Tyson v Paul — Netflix newsroom
Footnotes
[1] Netflix Open Connect Appliances (current as of 2025–2026). Earlier appliance lines (Standard Flash, Large Storage, Large Flash) were retired; the current spec sheet supersedes any older description.
[2] Netflix Open Connect, Network configuration. When an ISP has both peering with AS2906 and an embedded OCA in AS40027, the OCA wins because the embedded route has a shorter AS path.
[3] The Netflix Cosmos Platform and The Making of VES, the Cosmos microservice for Netflix video encoding.
[4] Optimized shot-based encodes: Now Streaming! The example used by Netflix to illustrate per-episode shot counts.
[6] Per-codec figures reported in Streaming Media’s coverage of Netflix’s 2019 dynamic-optimization work.