Skip to main content

Playing Videos on the Web

System Design Fundamentals

Deep dive into the different ways to play videos on the web - Playing vidoes via file, using HTTP Range Requests and Adaptive Streaming like HLS and DASH. We also dive into supporting multiple languages (audio tracks).

Table of Contents

Introduction

We will start with the simplest way to play a video on the web - Directly serving the video file. And then trying to understand the problems with the approach and how can we improve it via HTTP Range Requests.

And finally we will look into Adaptive Streaming like HLS and DASH, which are the most popular techniques to play videos on the web today.

Directly serving the video file

  • The video is served as a single file
  • Modern browsers can start playing as soon as the first chunk is received
    • You technically don’t need to wait for the entire file to download before playing
  • Modern browsers would not download the entire file or keep downloading if the user is not playing.

Problems

  • Seekability: Users CAN NOT seek
  • **No Resumability: **Uses has to always download the file from the beginning
  • Network Interruption will lead to the user having to start over
  • Not possible to change the quality of the video
  • Caching is not optimized, since the entire file is cached.
    • Most of the videos are not watched from start to end
    • But the entire file needs to be cached
    • Not possible to cache only sub-parts of the file
  • Higher Network Bandwidth cost, since everyone downloads from start to end

Checkout the Poc for working samples

import fs from 'node:fs';
import http from 'node:http';
import path from 'node:path';
const PORT = 3000;
const VIDEO_PATH = path.join(__dirname, 'video.mp4');
const server = http.createServer((req, res) => {
// Serve video without range support
const videoStream = fs.createReadStream(VIDEO_PATH);
res.writeHead(200, { 'Content-Type': 'video/mp4' });
videoStream.on('error', () => {
res.writeHead(500);
res.end('Error loading video');
});
videoStream.pipe(res);
});
server.listen(PORT, () => {
console.log(`Server running at http://localhost:${PORT}`);
});
<video controls>
<source src="/path/to/video.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
connection: keep-alive
content-type: video/mp4
date: Thu, 20 Feb 2025 10:28:02 GMT
keep-alive: timeout=5
transfer-encoding: chunked

Using HTTP Range Requests to fix Seekability & Resumability

  • For the first request, the server sends the entire file contents.
    • 'Content-Length': fileSize
  • For subsequent requests, the server sends a partial file contents.
    • 'Content-Range': bytes ${start}-${end}/${fileSize},
    • 'Content-Length': chunksize,
  • Allows the user to seek to a specific position in the video.
  • Resumability: If the connection is interrupted, downloading can resume from the last received byte.

This approach can be used for below use cases

  • Small to medium size videos
  • Internal applications with predictable network conditions
  • Video Previews of Freshly Uploaded Videos (For content authors)
  • Security camera footage or other real time video, where the user base is limited

Problems

  • Fixed Bitrate: Video resolution / bitrate is fixed & Users CAN NOT change the quality of the video
  • Serving a wide audience with different network speeds and devices is challenging
  • Caching is not optimized, since the entire file is cached.
    • Most of the videos are not watched from start to end
    • But the entire file needs to be cached
    • Not possible to cache only sub-parts of the file
import fs from 'node:fs';
import http from 'node:http';
import path from 'node:path';
const PORT = 3000;
const VIDEO_PATH = path.join(__dirname, 'video.mp4');
const server = http.createServer((req, res) => {
// Get video stats (about 61MB)
const stat = fs.statSync(VIDEO_PATH);
const fileSize = stat.size;
const range = req.headers.range;
if (range) {
// Parse range
// Example: "bytes=32324-"
const parts = range.replace(/bytes=/, '').split('-');
const start = parseInt(parts[0], 10);
const end = parts[1] ? parseInt(parts[1], 10) : fileSize - 1;
const chunksize = (end - start) + 1;
// Create read stream for this chunk
const stream = fs.createReadStream(VIDEO_PATH, { start, end });
// Send partial content response
res.writeHead(206, {
'Content-Range': `bytes ${start}-${end}/${fileSize}`,
'Accept-Ranges': 'bytes',
'Content-Length': chunksize,
'Content-Type': 'video/mp4'
});
// Pipe the video chunk to response
stream.pipe(res);
} else {
// No range requested, send entire file
res.writeHead(200, {
'Content-Length': fileSize,
'Content-Type': 'video/mp4',
'Accept-Ranges': 'bytes'
});
fs.createReadStream(VIDEO_PATH).pipe(res);
}
});
server.listen(PORT, () => {
console.log(`Server running at http://localhost:${PORT}`);
});
<video controls>
<source src="/path/to/video.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
# Initial Response
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Type: video/mp4
Date: Thu, 20 Feb 2025 10:29:16 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Content-Range: bytes 0-158008373/158008374
Content-Length: 158008374
# Request 2
range: bytes=107380736-
# Response 2
HTTP/1.1 206 Partial Content
Accept-Ranges: bytes
Content-Type: video/mp4
Date: Thu, 20 Feb 2025 10:32:19 GMT
Connection: keep-alive
Keep-Alive: timeout=5
Content-Range: bytes 107380736-158008373/158008374
Content-Length: 50627638

Why not use HTTP Range Requests with multiple video files?

  • In order to serve different qualities of the same video, we need to have multiple versions of the same video.
  • Each version of the video has a different resolution and bitrate.
  • Since we have multiple version of the same video, we need some sort of playlist file, that lists all the versions of the video.
    • This playlist file will have the information about the video file name, video file size, video file resolution and bitrate.
  • The video player will select the best quality video based on the user’s network conditions and device capabilities.
  • When the network condition change, the video player will switch to a different quality video.

The problem with this approach is that, how to we identify which byte range to request when the network speed changes.

Adaptive Streaming

Adaptive streaming is a technique that allows a video player to adjust the video quality based on the user’s network conditions and device capabilities.

  • From a single video, we create multiple versions of the same video, with different resolutions and bitrates.
  • Each version of the video is split into multiple segments (generally 2s-10s)
  • We create a playlist file, that lists all the segments of the video.
  • The video player will select the best quality video based on the user’s network conditions and device capabilities.
  • When the network condition change, the video player will switch to a different quality video.

Here the key is the playlist file, that lists all the segments of the video. For each segment, we have the following information:

  • Start time of the segment
  • End time of the segment
  • Resolution of the segment
  • Bitrate of the segment
  • URL of the segment

m3u8 file samples

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-STREAM-INF:BANDWIDTH=2200291,AVERAGE-BANDWIDTH=995604,RESOLUTION=640x360,CODECS="avc1.640016,mp4a.40.2"
v0/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3366885,AVERAGE-BANDWIDTH=1598631,RESOLUTION=854x480,CODECS="avc1.64001e,mp4a.40.2"
v1/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5901087,AVERAGE-BANDWIDTH=3070517,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"
v2/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=10987497,AVERAGE-BANDWIDTH=5306897,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
v3/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=16867619,AVERAGE-BANDWIDTH=8399048,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"
v4/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=33220766,AVERAGE-BANDWIDTH=16615212,RESOLUTION=3840x2160,CODECS="avc1.640033,mp4a.40.2"
v5/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=41078907,AVERAGE-BANDWIDTH=20719432,RESOLUTION=3840x2160,CODECS="avc1.640033,mp4a.40.2"
v6/playlist.m3u8
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:15
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:12.000000,
segment0.ts
#EXTINF:3.850000,
segment1.ts
#EXTINF:7.300000,
segment2.ts
#EXTINF:12.500000,
segment3.ts
#EXTINF:12.200000,
segment4.ts
#EXTINF:8.350000,
segment5.ts
#EXTINF:4.600000,
segment6.ts
#EXTINF:3.100000,
segment7.ts
#EXTINF:5.650000,
segment8.ts
#EXTINF:1.850000,
segment9.ts
.... More segments
#EXT-X-ENDLIST

Video Processing

Video transcoding is the process of converting a video file from one format, codec, or resolution to another. This is often done to ensure compatibility with different devices, reduce file size, or optimize video playback quality.

Transcoding vs. Encoding vs. Transmuxing

  • Encoding: Compresses raw video into a digital format.
  • Transcoding: Converts an already encoded video into a different format or settings.
  • Transmuxing: Changes the container format (e.g., MP4 to HLS) without altering the video codec.

Using ffmpeg to generate the playlist file

Terminal window
# Create output directory for HLS files
mkdir -p video/hls
# HLS Transcoding for multiple resolutions and fps
ffmpeg -i ./video/big-buck-bunny.mp4 \
-filter_complex \
"[0:v]split=7[v1][v2][v3][v4][v5][v6][v7]; \
[v1]scale=640:360[v1out]; \
[v2]scale=854:480[v2out]; \
[v3]scale=1280:720[v3out]; \
[v4]scale=1920:1080[v4out]; \
[v5]scale=1920:1080[v5out]; \
[v6]scale=3840:2160[v6out]; \
[v7]scale=3840:2160[v7out]" \
-map "[v1out]" -c:v:0 h264 -r 30 -b:v:0 800k \
-map "[v2out]" -c:v:1 h264 -r 30 -b:v:1 1400k \
-map "[v3out]" -c:v:2 h264 -r 30 -b:v:2 2800k \
-map "[v4out]" -c:v:3 h264 -r 30 -b:v:3 5000k \
-map "[v5out]" -c:v:4 h264 -r 30 -b:v:4 8000k \
-map "[v6out]" -c:v:5 h264 -r 15 -b:v:5 16000k \
-map "[v7out]" -c:v:6 h264 -r 30 -b:v:6 20000k \
-map a:0 -map a:0 -map a:0 -map a:0 -map a:0 -map a:0 -map a:0 \
-c:a aac -b:a 128k \
-var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2 v:3,a:3 v:4,a:4 v:5,a:5 v:6,a:6" \
-master_pl_name master.m3u8 \
-f hls \
-hls_time 6 \
-hls_list_size 0 \
-hls_segment_filename "video/hls/v%v/segment%d.ts" \
video/hls/v%v/playlist.m3u8

Initial Setup

Terminal window
# Create output directory for HLS files
mkdir -p video/hls

This creates a directory structure for the HLS output files.

Input and Filter Complex Setup (Transcoding):

Terminal window
ffmpeg -i ./video/big-buck-bunny.mp4 \
  • Takes the input video and

Resolution Scaling (Transcoding):

Terminal window
-filter_complex \
"[0:v]split=7[v1][v2][v3][v4][v5][v6][v7]; \
[v1]scale=640:360[v1out]; \ # 360p
[v2]scale=854:480[v2out]; \ # 480p
[v3]scale=1280:720[v3out]; \ # 720p
[v4]scale=1920:1080[v4out]; \ # 1080p
[v5]scale=1920:1080[v5out]; \ # 1080p (different bitrate)
[v6]scale=3840:2160[v6out]; \ # 4K
[v7]scale=3840:2160[v7out]" \ # 4K (different bitrate)
  • splits it into 7 identical streams for different quality versions
  • [0:v] refers to the first video stream from input
  • Scales each stream to different resolutions
  • Creates variants for different quality levels

Video Stream Configuration (Encoding):

Configures each video stream with:

  • H.264 codec
  • Frame rate (15-30 fps)
  • Bitrate (800Kbps to 20Mbps)
Terminal window
-map "[v1out]" -c:v:0 h264 -r 30 -b:v:0 800k \ # 360p @ 800Kbps
-map "[v2out]" -c:v:1 h264 -r 30 -b:v:1 1400k \ # 480p @ 1.4Mbps
-map "[v3out]" -c:v:2 h264 -r 30 -b:v:2 2800k \ # 720p @ 2.8Mbps
-map "[v4out]" -c:v:3 h264 -r 30 -b:v:3 5000k \ # 1080p @ 5Mbps
-map "[v5out]" -c:v:4 h264 -r 30 -b:v:4 8000k \ # 1080p @ 8Mbps
-map "[v6out]" -c:v:5 h264 -r 15 -b:v:5 16000k \ # 4K @ 16Mbps, 15fps
-map "[v7out]" -c:v:6 h264 -r 30 -b:v:6 20000k \ # 4K @ 20Mbps, 30fps
  • -map assigns the scaled output to a video stream
  • -c:v sets H.264 codec (c:v c for codec, v for video)
  • -r sets frame rate
  • -b:v sets bitrate (b:v b for bitrate, v for video)

Audio Configuration (Encoding):

Terminal window
-map a:0 -map a:0 -map a:0 -map a:0 -map a:0 -map a:0 -map a:0 \
-c:a aac -b:a 128k \
-var_stream_map "v:0,a:0 v:1,a:1 v:2,a:2 v:3,a:3 v:4,a:4 v:5,a:5 v:6,a:6" \
  • -var_stream_map maps video and audio streams together

  • Uses AAC codec at 128kbps (Code by -c, bitrate by -b), :a refers to audio stream

  • Maps video and audio streams together using -var_stream_map

HLS Specific Configuration (Transmuxing):

Terminal window
-master_pl_name master.m3u8 \
-f hls \
-hls_time 6 \
-hls_list_size 0 \
-hls_segment_filename "video/hls/v%v/segment%d.ts" \
video/hls/v%v/playlist.m3u8
  • -master_pl_name sets the name of the master playlist

  • -f hls specifies HLS format

  • -hls_time sets the segment duration

  • -hls_list_size sets the number of playlist files to keep

  • -hls_segment_filename sets the filename pattern for segments

  • -master_pl_name sets the name of the master playlist

  • Configures output paths for segments and playlists

    • video/hls/v%v/segment%d.ts - Segment filename pattern
    • video/hls/v%v/playlist.m3u8 - Playlist filename pattern

Adding Multiple Audio Tracks (Languages)

In order to support multiple languages, we need to add multiple audio tracks to the video.

Terminal window
-map a:1 -map a:1 -map a:1 -map a:1 -map a:1 -map a:1 -map a:1 \ # Additonal audio streams
-var_stream_map "v:0,a:0,a:7 v:1,a:1,a:8 v:2,a:2,a:9 v:3,a:3,a:10 v:4,a:4,a:11 v:5,a:5,a:12 v:6,a:6,a:13" \
Terminal window
# Create output directory for HLS files
mkdir -p video/hls
# HLS Transcoding for multiple resolutions and fps
ffmpeg -i ./video/big-buck-bunny.mp4 \
-filter_complex \
"[0:v]split=7[v1][v2][v3][v4][v5][v6][v7]; \
[v1]scale=640:360[v1out]; \
[v2]scale=854:480[v2out]; \
[v3]scale=1280:720[v3out]; \
[v4]scale=1920:1080[v4out]; \
[v5]scale=1920:1080[v5out]; \
[v6]scale=3840:2160[v6out]; \
[v7]scale=3840:2160[v7out]" \
-map "[v1out]" -c:v:0 h264 -r 30 -b:v:0 800k \
-map "[v2out]" -c:v:1 h264 -r 30 -b:v:1 1400k \
-map "[v3out]" -c:v:2 h264 -r 30 -b:v:2 2800k \
-map "[v4out]" -c:v:3 h264 -r 30 -b:v:3 5000k \
-map "[v5out]" -c:v:4 h264 -r 30 -b:v:4 8000k \
-map "[v6out]" -c:v:5 h264 -r 15 -b:v:5 16000k \
-map "[v7out]" -c:v:6 h264 -r 30 -b:v:6 20000k \
-map a:0 -map a:0 -map a:0 -map a:0 -map a:0 -map a:0 -map a:0 \
-map a:1 -map a:1 -map a:1 -map a:1 -map a:1 -map a:1 -map a:1 \ # Additonal audio streams
-c:a aac -b:a 128k \
-var_stream_map "v:0,a:0,a:7 v:1,a:1,a:8 v:2,a:2,a:9 v:3,a:3,a:10 v:4,a:4,a:11 v:5,a:5,a:12 v:6,a:6,a:13" \
-master_pl_name master.m3u8 \
-f hls \
-hls_time 6 \
-hls_list_size 0 \
-hls_segment_filename "video/hls/v%v/segment%d.ts" \
video/hls/v%v/playlist.m3u8

Tags