Design Instagram: Photo Sharing at Scale — Sujeet Jaiswal

Instagram crossed 3 billion monthly active users in September 2025, serving photo and short-video traffic at a scale where every architectural choice — fan-out shape, image variant count, TTL on ephemeral media, ranking model topology — is dictated by power-law follower distributions, mobile network constraints, and ML inference budgets measured in milliseconds. This design walks through the image upload pipeline, the hybrid fan-out that bounds write amplification, the Stories TTL architecture, the MQTT-based real-time messaging path, and the multi-stage Explore recommender, citing the Instagram and Meta engineering posts that document each subsystem.

Note

Meta stopped disclosing per-app daily active user (DAU) numbers in April 2024, so any DAU figure in this article is an industry estimate, not an official metric. Treat the order-of-magnitude as load-shaping context, not a published number.

High-level architecture: upload → process → store → deliver. Feed generation uses hybrid fan-out; Stories have separate TTL-aware caching. Discovery systems run 1000+ ML models for personalization.

Abstract

Instagram’s architecture is structured around three load-shaping problems that any photo-sharing platform faces once it crosses a few hundred million users:

Write amplification vs. read latency. A post from a celebrity with tens of millions of followers would, under naive write-time fan-out, force tens of millions of timeline cache updates. A hybrid fan-out — push for “normal” accounts, pull-merge for high-fan-out accounts — bounds the per-post write cost while keeping cached reads in the low-millisecond range.
Ephemeral vs. persistent content. Stories (24-hour TTL) and feed posts (permanent) have different read patterns and different lifetime budgets. Stories ride aggressive client prefetch and TTL-synced caching; posts ride tiered object storage with CDN caching.
Cold start vs. engagement optimization. New sessions need value immediately; returning sessions need personalization. Meta’s Instagram Explore recommender uses Two Towers retrieval to narrow billions of candidates to thousands, then a multi-task neural ranker to produce the final ordering, and now spans over 1,000 production ML models.

Core mechanisms covered below:

Hybrid fan-out — push to follower timeline caches for the long tail of accounts; pull-merge for high-fan-out accounts at read time.
Multi-resolution image pipeline — variants generated for the 320–1080 px supported width range, with filtering offloaded to GPU on-device.
Stories architecture — 24-hour TTL with aggressive prefetch targeting sub-200 ms perceived load.
Feed ranking — multi-task neural networks fine-tuned continually on engagement events.
MQTT for real-time — adopted for Facebook Messenger in 2011 (the protocol itself dates to 1999) and now powering Instagram DMs, notifications, and presence over a binary protocol with a 2-byte minimum header.
Privacy + safety pipeline at upload time — EXIF metadata sanitization (drop GPS, serial numbers, raw timestamps before any public CDN URL exists) and perceptual-hash-based CSAM/violence/terror screening against industry banks (Microsoft PhotoDNA, Meta’s open-sourced PDQ + TMK+PDQF hashes, the GIFCT hash database).

Requirements

Functional Requirements

Requirement	Priority	Notes
Photo/video upload	Core	Multiple resolutions, filters, up to 10 items per post
Feed (home timeline)	Core	Ranked, personalized, infinite scroll
Stories	Core	24-hour ephemeral, ring UI, reply capability
Follow/unfollow	Core	Social graph management
Likes and comments	Core	Real-time counts, threaded comments
Direct messages	Core	Real-time chat, media sharing
Explore page	Core	Content discovery, personalized recommendations
Search	Core	Users, hashtags, locations
Notifications	Core	Likes, comments, follows, DM alerts
Reels	Extended	Short-form video (separate video pipeline)
Shopping	Out of scope	E-commerce integration
Ads	Out of scope	Separate ad-tech stack

Non-Functional Requirements

Requirement	Target	Rationale
Upload availability	99.9%	Brief maintenance acceptable
Feed availability	99.99%	Core engagement driver
Feed load latency	p99 < 500ms	User experience threshold
Stories load latency	p99 < 200ms	Tap-and-swipe UX requires instant response
Image processing time	< 10s	User waits for upload confirmation
DM delivery latency	p99 < 500ms	Real-time conversation expectation
Notification delivery	< 2s	Engagement driver
Feed freshness	< 30s for non-celebrities	Balance freshness vs. ranking quality

Scale Estimation

These numbers blend Meta’s published headline figures (MAU, ML model count) with order-of-magnitude estimates that are common in system-design write-ups but not officially disclosed (DAU, uploads/day, average image size). Treat them as load-shaping context for capacity decisions, not as a published specification.

```text
Monthly active users:   3,000M  (Meta, Sept 2025 announcement)
Daily active users:     ~500M   (industry estimate; Meta no longer discloses)
Photos uploaded daily:  ~100M   (estimate; older "95M+" figures from circa 2018)

Upload traffic (estimate)
  100M uploads/day        ≈ 1,150 uploads/second average
  Peak (3x)               ≈ 3,500 uploads/second
  Avg compressed payload  ≈ 2 MB per image
  Daily ingestion         ≈ 200 TB/day at the wire

Per-image storage (estimate)
  Original                  2 MB
  Resolution variants       4 widths (1080 / 640 / 320 / 150 px)
  Aspect variants           ×2 (square + original aspect)
  Total variants            ≈ 8 derived files
  Total stored per image    ≈ 5 MB across variants
  Daily storage growth      ≈ 500 TB/day

Feed reads (estimate)
  500M DAU × 20 sessions    ≈ 10B feed reads/day
  Average rate              ≈ 115K reads/second
  Peak                      ≈ 350K+ reads/second

Social graph (estimate)
  Average followers/user       ~150 (skewed by power law)
  Accounts with > 1M followers ~50,000
  Graph edges                  ~10^11
```
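The arithmetic behind these estimates is easy to re-run when an assumption changes. A minimal Python sketch, assuming the same inputs as the figures above plus a 3x peak factor for feed reads (the table states 3x only for uploads, but its read peak implies the same factor):

```python
# Back-of-envelope capacity model. Every default is an order-of-magnitude
# assumption taken from the estimates above, not a figure published by Meta.
SECONDS_PER_DAY = 86_400


def capacity_estimate(uploads_per_day=100e6, peak_factor=3.0,
                      payload_mb=2.0, stored_mb_per_image=5.0,
                      dau=500e6, sessions_per_day=20):
    avg_uploads_s = uploads_per_day / SECONDS_PER_DAY
    feed_reads_day = dau * sessions_per_day
    avg_reads_s = feed_reads_day / SECONDS_PER_DAY
    return {
        "avg_uploads_s": avg_uploads_s,
        "peak_uploads_s": avg_uploads_s * peak_factor,
        "ingest_tb_day": uploads_per_day * payload_mb / 1e6,  # MB -> TB (decimal units)
        "storage_growth_tb_day": uploads_per_day * stored_mb_per_image / 1e6,
        "feed_reads_day": feed_reads_day,
        "avg_reads_s": avg_reads_s,
        "peak_reads_s": avg_reads_s * peak_factor,            # same 3x factor assumed
    }


if __name__ == "__main__":
    for name, value in capacity_estimate().items():
        print(f"{name:>22}: {value:,.0f}")
```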

CDN efficiency. Photo and video traffic follows a strong power-law: a small fraction of media accounts for most reads. Assuming a 95% edge cache hit rate (which is what an Instagram-shape CDN should be tuned for), the origin only sees the cold tail:

```text
Reads to origin    ≈ 5% × 350K rps peak ≈ 17K rps
Hot content        served almost entirely from CDN edge
Cold tail          dominates origin egress and storage I/O
```
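Because origin load scales with the miss rate rather than the hit rate, small changes at the top of the hit-rate range matter a lot. A small sketch, assuming (as the estimate above does) that each peak read counts as one edge request:

```python
def origin_rps(edge_rps: float, hit_rate: float) -> float:
    """Requests per second that miss the edge cache and fall through to origin."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return edge_rps * (1.0 - hit_rate)


for hit in (0.90, 0.95, 0.99):
    print(f"hit rate {hit:.0%}: ~{origin_rps(350_000, hit):,.0f} rps to origin")
```

Moving from 95% to 99% cuts origin traffic five-fold (about 17.5K to 3.5K rps), while moving from 90% to 95% only halves it.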

Design Paths

Path A: Push-First Fan-out

Best when:

Smaller scale (<100M users)
Read latency is critical
Most users have similar follower counts (no extreme outliers)

Architecture:

On post creation, push the post ID to every follower’s timeline cache. Reads are O(1) cache lookups.
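The push path can be sketched with in-memory structures standing in for the per-follower timeline caches. This is a minimal illustration, not Instagram's implementation; the class name is hypothetical and the 800-entry cap is borrowed from the Redis model later in this article.

```python
from collections import defaultdict, deque
from itertools import islice

TIMELINE_CAP = 800  # assumption: same per-timeline cap as the Redis model later on


class PushTimelines:
    """Hypothetical in-memory stand-in for per-follower timeline caches."""

    def __init__(self, followers_of):
        self.followers_of = followers_of  # author_id -> list of follower ids
        self.timelines = defaultdict(lambda: deque(maxlen=TIMELINE_CAP))

    def publish(self, author_id, post_id):
        """Write the post id into every follower's timeline; return the write count."""
        writes = 0
        for follower_id in self.followers_of.get(author_id, ()):
            self.timelines[follower_id].appendleft(post_id)  # newest first; oldest drops off
            writes += 1
        return writes  # write amplification equals the author's follower count

    def read(self, user_id, limit=20):
        """A read is a single lookup of a precomputed list; no fan-in at read time."""
        return list(islice(self.timelines.get(user_id, ()), limit))
```

The return value of `publish` is the whole trade-off in one number: for an account with 650M followers it would be 650M.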

Trade-offs:

✅ Extremely fast reads (pre-computed timelines)
✅ Simple read path (single cache lookup)
❌ Massive write amplification for popular accounts
❌ Wasted writes for inactive followers
❌ Storage explosion (N copies per post where N = followers)

Real-world example: Early Twitter (~2010–2012) is the canonical illustration of push-only running into a celebrity wall: a single high-fan-out tweet had to be written into millions of follower timelines, which is why Twitter moved to a hybrid model later in that period.

Path B: Pull-Only Fan-out

Best when:

Read latency tolerance is higher
Storage cost is primary concern
Feed freshness can be slightly stale

Architecture:

On feed request, query the social graph for followed users, fetch their recent posts, merge and rank.
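A pull-only read is a k-way merge over each followed author's recent posts. A minimal sketch using the standard library's `heapq.merge`, assuming each author's posts are already held as a list of `(timestamp, post_id)` pairs sorted newest first (the function and argument names are hypothetical):

```python
import heapq
from itertools import islice


def pull_feed(following, recent_posts, limit=20):
    """Pull-only feed: fan in every followed author's posts at read time.

    following:    author ids the viewer follows
    recent_posts: author_id -> list of (timestamp, post_id), newest first
    """
    per_author = (recent_posts.get(author, []) for author in following)
    # k-way merge of already-sorted lists; work grows with len(following)
    merged = heapq.merge(*per_author, key=lambda post: post[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```

The merge itself is cheap; the expense in production is fetching `recent_posts` for every followed account on every request, which is the per-request cost the trade-offs below refer to.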

Trade-offs:

✅ No write amplification
✅ Minimal storage (posts stored once)
✅ Always fresh (computed at read time)
❌ High read latency (multiple DB queries)
❌ Expensive computation per request
❌ Difficult to rank effectively (limited time for ML)

Real-world example: Early News Feed implementations on social platforms (pre-2010) used pull-only computation and abandoned it as following counts grew, since per-request graph traversal blew the read latency budget.

Path C: Hybrid Fan-out (Instagram Model)

Best when:

Massive scale with power-law follower distribution
Sub-second read latency required
High-fan-out accounts exist (>1M followers)

Architecture:

Long-tail accounts (low follower count, commonly modeled at < 5–10K followers in public write-ups): push the new post id into each follower’s timeline cache on write.
High-fan-out accounts (above the threshold): keep posts in a per-author store and merge them in at read time.
Inactive followers: skip fan-out for them and compute on demand if they return.

Note

Instagram and Twitter both use hybrid fan-out, but neither has officially published its production threshold. Raffi Krikorian’s Timelines at Scale (QCon 2013) is the canonical engineering talk on Twitter’s home-timeline architecture — push-on-write into Redis-backed materialised timelines for the long tail, pull-and-merge for high-fan-out accounts at read time — and the 5–10K follower band shows up consistently in community write-ups of the architecture. Treat the threshold as a tunable knob, not a magic number: the right value is the point where the marginal cost of a fan-out write exceeds the marginal cost of a read-time merge.
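The write-time routing decision can be sketched as a small function. Both knobs are hypothetical: the 10,000 threshold is simply the top of the community-estimated 5-10K band, and the 30-day inactivity cutoff is an illustrative choice.

```python
from dataclasses import dataclass

FANOUT_THRESHOLD = 10_000        # hypothetical: top of the 5-10K community band
INACTIVE_AFTER_S = 30 * 86_400   # hypothetical: skip followers idle for 30+ days


@dataclass
class Follower:
    user_id: str
    last_active_ts: float


def plan_fanout(follower_count, followers, now):
    """Decide how a new post is distributed.

    Returns ("pull", []) for high-fan-out authors (the post stays in the
    per-author store and is merged at read time), otherwise
    ("push", [ids of active followers whose timeline caches get the post]).
    """
    if follower_count > FANOUT_THRESHOLD:
        return "pull", []
    active = [f.user_id for f in followers
              if now - f.last_active_ts <= INACTIVE_AFTER_S]
    return "push", active  # inactive followers rebuild their timeline on return
```

Keeping the threshold a named constant makes it the tunable knob the note describes: the per-post write cost is bounded by `FANOUT_THRESHOLD` however large the account grows.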

Trade-offs:

✅ Bounded write amplification (capped at the threshold)
✅ Fast reads for most users (cache hit + small merge)
✅ Handles celebrity scale without storage explosion
❌ Two code paths to maintain
❌ Merge logic adds complexity
❌ Posts from above-threshold accounts pay a small read-time merge cost

Path Comparison

Factor	Push-First	Pull-Only	Hybrid
Read latency	O(1)	O(following × posts)	O(1) + O(celebrities)
Write amplification	O(followers)	O(1)	O(min(followers, threshold))
Storage per post	O(followers)	O(1)	O(min(followers, threshold))
Code complexity	Low	Low	Medium
Freshness	Immediate	Immediate	Immediate (regular), slight delay (celebrity merge)
Best scale	<100M users	<10M users	Billions

This Article’s Focus

The rest of this article assumes Path C (Hybrid Fan-out) as the production design, because:

At Instagram’s scale (3B MAU as of Sept 2025) the per-post write cost has to be bounded.
The top of the follower distribution (Cristiano Ronaldo’s account is in the 650M-follower range) makes pure push impractical — a single post would dominate the cluster’s write capacity.
The hybrid model is the design Instagram and Twitter both publicly describe, even if exact thresholds are not.

High-Level Design

Component Overview

Service architecture (write & media path): API Gateway routes uploads and posts; image processor and fan-out workers populate timeline caches and CDN.

Service architecture (read, discovery & realtime): feed, search, explore, DM, and notification services backed by ranking, caches, and MQTT push.

Service Responsibilities

Service	Responsibility	Data Store	Key Operations
Upload Service	Media ingestion, validation, processing	S3, PostgreSQL	Resize, filter, generate variants
Post Service	Post CRUD, metadata management	PostgreSQL	Create, update, delete, soft-delete
Feed Service	Timeline generation, ranking	Redis, PostgreSQL	Fan-out, merge, rank
Stories Service	Ephemeral content management	Redis (TTL), S3	Create, expire, ring ordering
Social Service	Follow graph management	Cassandra	Follow, unfollow, follower lists
Search Service	User/hashtag/location search	Elasticsearch	Index, query, autocomplete
Explore Service	Content discovery, recommendations	ML platform	Candidate retrieval, ranking
DM Service	Real-time messaging	Cassandra, Redis	Send, receive, sync
Notification Service	Push and in-app notifications	PostgreSQL, Redis	Queue, dedupe, deliver

Image Upload Service

Upload Flow

Two-phase upload: media upload returns immediately with media_id; post creation triggers fan-out asynchronously.

Image Processing Pipeline

Input validation mirrors what Instagram documents in its image resolution help center page:

Supported feed widths land in the 320–1080 px band; sub-320 px uploads get upscaled to 320 px and >1080 px uploads get downscaled to 1080 px.
Supported source formats: JPEG, PNG, HEIC (HEIC is normalized to JPEG/HEIF for delivery).
Maximum upload size is generous on the wire (tens of MB), but client-side compression typically lands the post in the 2–5 MB range before the server sees it.
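The width rule above reduces to a clamp that preserves aspect ratio. A minimal sketch (the function name is hypothetical; it handles width only, and any aspect-ratio cropping is out of scope here):

```python
from __future__ import annotations

MIN_WIDTH, MAX_WIDTH = 320, 1080  # supported feed width band


def normalized_size(width: int, height: int) -> tuple[int, int]:
    """Scale into the 320-1080 px width band, preserving aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("dimensions must be positive")
    target_w = min(max(width, MIN_WIDTH), MAX_WIDTH)  # upscale small, downscale large
    target_h = round(height * target_w / width)
    return target_w, target_h
```

For example, a 4032 × 3024 phone photo lands at 1080 × 810, and a 200 × 250 image is upscaled to 320 × 400.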

Metadata sanitization (EXIF stripping). Camera-generated JPEG/HEIC files carry EXIF blocks that frequently include precise GPS coordinates, device serial numbers, and capture timestamps. Any photo-sharing platform must strip privacy-sensitive EXIF tags before serving variants from the CDN — public-share variants typically retain only orientation, color profile, and a sanitized capture timestamp; GPS, serial numbers, and thumbnail-embedded EXIF are dropped. The original-resolution copy may keep the full EXIF in cold storage for the uploader’s own archive view, but it is never served to followers. Internally this is a small re-encode step that runs alongside variant generation; it is cheap relative to the resize ladder but mandatory for compliance.
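The tag policy is easiest to express as an allowlist applied to already-parsed tags, so unknown or vendor-specific tags are dropped by default. A minimal sketch, assuming the tags arrive as a dict keyed by standard EXIF tag names; parsing and re-encoding the file is left to an image library, and coarsening the capture timestamp to a date is one hypothetical reading of "sanitized".

```python
# Hypothetical allowlist for public variants: orientation, color profile,
# and a coarsened capture timestamp. Everything else (GPS, serials,
# maker notes, embedded thumbnails) is dropped.
PUBLIC_EXIF_ALLOWLIST = {"Orientation", "ColorSpace", "DateTimeOriginal"}


def sanitize_exif(tags: dict) -> dict:
    """Return the EXIF subset that may accompany a public CDN variant."""
    public = {k: v for k, v in tags.items() if k in PUBLIC_EXIF_ALLOWLIST}
    if "DateTimeOriginal" in public:
        # EXIF format is "YYYY:MM:DD HH:MM:SS"; keep only the date part
        public["DateTimeOriginal"] = str(public["DateTimeOriginal"]).split(" ")[0]
    return public
```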

Safety scanning (hash matching at the edge of the pipeline). Before a media id becomes addressable from a public CDN URL, every upload runs through perceptual-hash matching against industry CSAM hash banks (hashes computed with Microsoft's PhotoDNA or with Meta's PDQ image hash and TMK+PDQF video hash, the latter two open-sourced in 2019, matched against lists such as NCMEC's) plus internal classifiers for nudity, violence, and policy-violating content. Hash matches block the post and trigger an NCMEC report; classifier hits route into human review queues. The hashing step adds at most a few milliseconds (PDQ is a 256-bit hash with sub-millisecond compute), so the pipeline can run it inline with variant generation rather than as a deferred audit. Meta also runs perceptual-hash matching against terror content via the GIFCT shared hash database.
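PDQ matching comes down to Hamming distance between 256-bit hashes: near-duplicates of a banked image differ in only a few bits. A minimal sketch, assuming hashes are handled as 64-character hex strings; the distance cutoff of 31 is an assumption to tune per hash bank, and the linear scan stands in for whatever index a production bank uses.

```python
from __future__ import annotations

MATCH_THRESHOLD = 31  # assumption: maximum differing bits (of 256) that counts as a match


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes held as ints."""
    return bin(a ^ b).count("1")


def matches_bank(upload_hash_hex: str, bank_hex: list[str]) -> bool:
    """True if the upload is within MATCH_THRESHOLD bits of any banked hash."""
    h = int(upload_hash_hex, 16)
    return any(hamming(h, int(x, 16)) <= MATCH_THRESHOLD for x in bank_hex)
```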

Resolution variants generated. The exact ladder is implementation-specific; the shape below is representative of what a power-law CDN footprint requires:

Variant	Dimensions	Use Case
Original	Up to 1080px	Full-screen view
Large	1080 × 1080	Feed (high-DPI devices)
Medium	640 × 640	Feed (standard devices)
Small	320 × 320	Grid view, thumbnails
Thumbnail	150 × 150	Notifications, search results

Filter processing. On modern devices the heavy lifting is done on-device via GPU shaders (Metal on iOS, OpenGL ES / Vulkan on Android), so the server pipeline only has to deal with already-baked pixels for filtered uploads. Server-side filtering is a fallback path (for example, web uploads). Conceptually a filter is:

```text
output = Blend( Adjust( LUT(input) ) )

  LUT        3D color look-up table (per-filter asset)
  Adjust     brightness / contrast / saturation / warmth
  Blend      vignette, frame, grain overlay
```
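The composition above can be made concrete per pixel. A minimal sketch; as a plainly stated simplification it uses a per-channel 1D curve instead of a full 3D LUT, models Adjust with brightness and contrast only, and models Blend as a radial vignette. All names and parameters are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0-255


def clamp(x: float) -> int:
    return max(0, min(255, round(x)))


def apply_lut(p: Pixel, curve: List[int]) -> Pixel:
    # Simplification: one 256-entry curve applied to each channel, not a 3D LUT
    return tuple(curve[c] for c in p)


def adjust(p: Pixel, brightness: float = 0.0, contrast: float = 1.0) -> Pixel:
    # Contrast pivots around mid-grey (128); brightness is an additive offset
    return tuple(clamp((c - 128) * contrast + 128 + brightness) for c in p)


def blend_vignette(p: Pixel, x: float, y: float, strength: float = 0.5) -> Pixel:
    # x, y are normalized coordinates in [0, 1]; darken toward the corners
    d2 = (x - 0.5) ** 2 + (y - 0.5) ** 2   # 0 at the center, 0.5 at a corner
    factor = 1.0 - strength * (d2 / 0.5)
    return tuple(clamp(c * factor) for c in p)


def apply_filter(p: Pixel, x: float, y: float, curve: List[int],
                 brightness: float = 0.0, contrast: float = 1.0,
                 vignette: float = 0.5) -> Pixel:
    """output = Blend(Adjust(LUT(input))) for a single pixel."""
    return blend_vignette(adjust(apply_lut(p, curve), brightness, contrast),
                          x, y, vignette)
```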

Processing time budget:

Operation	Target	Notes
Upload to storage	< 2s	Depends on connection
Generate variants	< 3s	Parallel processing
Filter application	< 1s	GPU-accelerated
CDN propagation	< 5s	Edge cache warming
Total	< 10s	User-perceived upload time; stages overlap (the rows above sum to 11 s if run serially)

Storage Strategy

Object storage layout:

```text
s3://instagram-media/
  /{user_id}/
    /{media_id}/
      original.jpg          # Raw upload
      1080.jpg              # Full resolution
      640.jpg               # Medium
      320.jpg               # Small
      150.jpg               # Thumbnail
      metadata.json         # EXIF, dimensions, filter applied
```

CDN caching rules:

Content Type	Cache Duration	Cache Key
Original	1 year	`{media_id}/original`
Variants	1 year	`{media_id}/{size}`
Profile pictures	1 hour	`{user_id}/profile`
Stories media	24 hours	`{story_id}/media`

Storage tiering (power-law optimization): photo workloads at this scale follow a textbook hot/warm/cold split, and Meta has published the two papers that shape the canonical design — Haystack for hot blobs (OSDI 2010) and f4 for warm blobs (OSDI 2014). The conceptual ladder used by any photo-sharing platform of this shape is:

Tier	Inspired by	Storage shape	Replication	Access pattern
Hot	Haystack	Append-only volume files; in-memory needle index; one disk seek per read	3x replicated	Recent uploads, top of the long-tail
Warm	f4	Reed-Solomon (10,4) erasure coding inside a cell; XOR coding across regions	Effective replication ~2.1x (vs. 3.6x in Haystack)	Older content, access rate has decayed
Cold	Archive	Deep archive on cheap media	Geo-redundant	Rarely accessed, kept for durability + recall

```text
Hot tier (SSD-backed Haystack-style volumes): Last 7 days of uploads, frequently accessed
Warm tier (f4-style erasure-coded cells):     7 days - 1 year, moderate access
Cold tier (deep archive):                     > 1 year, rare access

Migration policy:
- Content accessed > 10x/day stays hot
- Content accessed < 1x/week moves to warm
- Content not accessed in 90 days moves to cold
```
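The age bands and the migration rules can be written as two small functions. How they interact is not specified above, so this minimal sketch assumes the age bands give the default placement, the access rules override them, and content never moves back to a hotter tier (promotion is not described, so it is left out).

```python
TIERS = ("hot", "warm", "cold")  # ordered from fastest to cheapest


def tier_by_age(age_days: float) -> str:
    """Default placement from the age bands."""
    if age_days <= 7:
        return "hot"
    return "warm" if age_days <= 365 else "cold"


def next_tier(current: str, age_days: float, reads_per_day: float,
              days_since_last_read: float) -> str:
    """One pass of the migration policy; in this sketch content only moves colder."""
    if current == "hot" and reads_per_day > 10:
        return "hot"                              # busy content stays hot at any age
    if days_since_last_read >= 90:
        return "cold"                             # untouched for 90 days
    target = tier_by_age(age_days)
    if reads_per_day < 1 / 7 and target == "hot":
        target = "warm"                           # under one read a week
    return max(current, target, key=TIERS.index)  # never promote
```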

Note

Haystack’s central optimization is keeping the (volume, offset, size) index for every needle in main memory, so a hot read is at most one disk seek. f4 trades a small read-amplification penalty for a large storage win — its production Reed-Solomon (10,4) configuration tolerates four simultaneous failures inside a cell while cutting effective replication from 3.6x to 2.1x. The hot-vs-warm decision is a function of access rate and age, not file type.
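The two effective-replication figures can be reproduced from the storage layouts. The Haystack number assumes the configuration the f4 paper describes, three full replicas each stored on RAID-6 with roughly 1.2x overhead; the f4 number multiplies the in-cell Reed-Solomon (10,4) overhead by the cross-region XOR overhead, where the XOR of two volumes held in two regions is stored in a third.

$$
\begin{aligned}
r_{\text{Haystack}} &= 3 \times 1.2 = 3.6 \\
r_{\text{f4, cell}} &= \frac{10 + 4}{10} = 1.4 \\
r_{\text{f4}} &= r_{\text{f4, cell}} \times \frac{2 + 1}{2} = 1.4 \times 1.5 = 2.1
\end{aligned}
$$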

Photo storage tiers: Haystack-style hot tier, f4-style warm tier with Reed-Solomon (10,4) inside each cell and XOR coding across regions, cold deep archive with geo-redundant copies.

Feed Generation Service

Hybrid Fan-out Implementation

Hybrid fan-out: regular users trigger push to follower caches; celebrity posts stored separately and merged at read time.

Timeline Cache Structure

Redis data model:

```text
# Timeline cache (sorted set by timestamp)
ZADD timeline:{user_id} {timestamp} {post_id}

# Keep last 800 posts per timeline
# (ranks 0..-801 are the oldest entries; the newest 800 survive)
ZREMRANGEBYRANK timeline:{user_id} 0 -801

# Post metadata cache (hash)
HSET post:{post_id}
  author_id "{user_id}"
  media_url "{cdn_url}"
  caption "{text}"
  like_count {count}
  created_at {timestamp}

# Celebrity posts (separate sorted set per celebrity)
ZADD celebrity:{user_id}:posts {timestamp} {post_id}
```

Timeline composition at read:

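```python
import time

CELEBRITY_LOOKBACK_S = 86_400  # merge celebrity posts from the last 24 hours only
CELEBRITY_POSTS_PER_AUTHOR = 5


def get_feed(user_id, cursor=None, limit=20):
    # `redis` is a redis-py client; `get_followed_celebrities` and `ranking_service`
    # are the Social and Ranking service clients. `cursor` is the timestamp of the
    # last item on the previous page (None for page one), not a list offset; the
    # post-ID tie-breaker from the cursor format below is omitted for brevity.
    now = time.time()
    upper = f"({cursor}" if cursor is not None else "+inf"  # "(" = exclusive bound

    # 1. Get cached timeline posts, newest first (fetch extra for ranking)
    cached_posts = redis.zrevrangebyscore(
        f"timeline:{user_id}", upper, "-inf", start=0, num=limit * 2
    )

    # 2. Get followed celebrities
    celebrities = get_followed_celebrities(user_id)

    # 3. Fetch recent celebrity posts (last 24h, capped per author)
    celebrity_posts = []
    for celeb_id in celebrities:
        posts = redis.zrevrangebyscore(
            f"celebrity:{celeb_id}:posts",
            upper,
            now - CELEBRITY_LOOKBACK_S,
            start=0,
            num=CELEBRITY_POSTS_PER_AUTHOR,
        )
        celebrity_posts.extend(posts)

    # 4. Merge (dropping duplicates, preserving order) and rank
    all_posts = list(dict.fromkeys(cached_posts + celebrity_posts))
    ranked_posts = ranking_service.rank(user_id, all_posts)

    return ranked_posts[:limit]
```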

Feed Ranking

Instagram’s ranking system uses deep neural networks fed by tens-to-hundreds of thousands of dense and sparse features (the Explore engineering post describes the same shape for the discovery surface).

Signal categories:

Category	Signals	Weight (approx)
Relationship	DM history, profile visits, comments, tags	High
Interest	Content type engagement, hashtag affinity	High
Timeliness	Post age, time since last seen	Medium
Popularity	Like velocity, comment rate, share count	Medium
Creator	Posting frequency, content quality score	Low

Ranking model architecture:

```text
Input: User embeddings + Post embeddings + Context features
  ↓
Feature extraction (10K–100K+ dense + sparse features)
  ↓
Multi-task neural network
  ↓
Outputs:
  - P(like)
  - P(comment)
  - P(save)
  - P(share)
  - P(time_spent > 10s)
  ↓
Weighted combination → Final score
```

Model training:

Trained on billions of engagement events
Fine-tuned hourly with recent interactions
A/B tested continuously (10+ experiments running at any time)

Consistency and Pagination

Consistency model:

Operation	Consistency	Rationale
Own post visibility	Strong (immediate)	User expects to see own post
Follower timeline update	Eventual (< 30s)	Acceptable delay for feed freshness
Like/comment counts	Eventual (< 5s)	Tolerable for social proof
Unfollow propagation	Strong (immediate)	Privacy expectation

Cursor-based pagination:

```text
// Request
GET /feed?cursor=eyJ0cyI6MTY0...&limit=20

// Response
{
  "posts": [...],
  "next_cursor": "eyJ0cyI6MTY0...",
  "has_more": true
}

// Cursor structure (base64-encoded)
{
  "ts": 1640000000,  // Timestamp of last item
  "pid": "abc123",   // Post ID (for tie-breaking)
  "v": 2             // Cursor version (for migrations)
}
```

Why cursor-based (not offset-based):

Timeline changes between requests (new posts arrive)
Offset pagination causes duplicates or missed posts
Cursor is stable: “posts older than X” always returns consistent results
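A cursor of this shape is just base64-encoded JSON. A minimal sketch of encoding, decoding, and paging with it, assuming URL-safe base64 without padding and items held in memory as `(ts, post_id)` pairs sorted newest first (all function names are hypothetical). With compact JSON, a cursor for `ts = 1640000000` begins with the `eyJ0cyI6MTY0` prefix seen in the example above.

```python
import base64
import json

CURSOR_VERSION = 2


def encode_cursor(ts: int, post_id: str) -> str:
    payload = {"ts": ts, "pid": post_id, "v": CURSOR_VERSION}
    raw = json.dumps(payload, separators=(",", ":")).encode()  # compact JSON
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_cursor(cursor: str) -> dict:
    padded = cursor + "=" * (-len(cursor) % 4)  # restore stripped padding
    payload = json.loads(base64.urlsafe_b64decode(padded))
    if payload.get("v") != CURSOR_VERSION:
        raise ValueError("unsupported cursor version")
    return payload


def page_after(items, cursor, limit):
    """items: (ts, post_id) tuples sorted newest first. Returns (page, next_cursor)."""
    if cursor is None:
        page = items[:limit]
    else:
        c = decode_cursor(cursor)
        boundary = (c["ts"], c["pid"])
        # strictly older than the last item seen; post_id breaks timestamp ties
        page = [item for item in items if item < boundary][:limit]
    next_cursor = encode_cursor(*page[-1]) if page else None
    return page, next_cursor
```

Because the boundary is a value rather than a position, a post inserted at the top between two requests does not shift the second page, which is exactly where offset pagination fails.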

Stories Service

Architecture

Stories have fundamentally different requirements than feed posts:

Property	Posts	Stories
Lifetime	Permanent	24 hours
Load time target	< 500ms	< 200ms
Caching strategy	CDN + Redis	Aggressive prefetch
Ranking	Complex ML	Recency + engagement

Stories architecture: aggressive client-side prefetch with TTL-synced caching. Ring ordering determines story tray sequence.

Story Ring Ordering

The “Stories ring” (horizontal tray at top) orders accounts by engagement signals:

Ordering factors:

Accounts with unseen stories (always first)
DM interaction frequency
Profile visit frequency
Comment/like history
Story view history (accounts you consistently view)
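The ordering rule reduces to a two-part sort key. A minimal sketch, assuming the engagement factors listed above have already been blended into a single hypothetical `engagement_score` per author (the blend weights are not specified here):

```python
from dataclasses import dataclass


@dataclass
class RingEntry:
    author_id: str
    has_unseen: bool
    engagement_score: float  # hypothetical blend of DM, profile-visit, like/comment, view signals


def order_ring(entries):
    """Authors with unseen stories first; within each group, higher engagement first."""
    return sorted(entries, key=lambda e: (not e.has_unseen, -e.engagement_score))
```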

Data model:

```text
# Story metadata (expires with TTL)
SETEX story:{story_id} 86400 '{
  "author_id": "123",
  "media_url": "https://...",
  "created_at": 1640000000,
  "viewers": [],
  "reply_enabled": true
}'

# User's active stories (sorted set, auto-cleanup)
ZADD user:{user_id}:stories {created_at} {story_id}
ZREMRANGEBYSCORE user:{user_id}:stories -inf {now - 86400}

# Story ring ordering per viewer
ZADD user:{viewer_id}:story_ring {engagement_score} {author_id}
```

Prefetch Strategy

Client-side behavior:

```text
On app open:
1. Fetch story ring ordering (lightweight API call)
2. Prefetch first 3 story authors' media (background)
3. As user views stories, prefetch next 2 authors ahead

On story view:
1. Preload all segments of current story
2. Preload first segment of next story
3. Mark current story as viewed (async)
```

Why aggressive prefetch:

Stories UX is tap-tap-tap: any loading spinner breaks flow
Media is small (compressed images/short videos)
Users view multiple stories in sequence: sequential access pattern
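The author-level prefetch window from the client steps can be sketched as a pure function over the ring order. The window sizes (3 on open, 2 ahead) come from those steps; the function name is hypothetical, and segment-level preloading within a story is left out.

```python
def prefetch_plan(ring, current_index=None, on_open=3, ahead=2):
    """Authors whose media the client should prefetch next.

    ring: author ids in story-ring order.
    current_index: position of the author being viewed, or None right after app open.
    """
    if current_index is None:
        return ring[:on_open]                                  # first 3 authors on open
    return ring[current_index + 1: current_index + 1 + ahead]  # next 2 authors ahead
```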

TTL and Expiration

Server-side:

Redis keys set with 24-hour TTL
Background job cleans up S3 media at TTL+1 hour (grace period for in-flight views)

Client-side:

Local cache expiration synced with server TTL
Client computes ttl_remaining = story.created_at + 86400 - now()
Evict from local cache when TTL expires
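The client-side formula and the server's one-hour media grace period can be sketched together. A minimal illustration; the function names and the dict-based local cache are hypothetical.

```python
from __future__ import annotations

import time

STORY_TTL_S = 86_400    # 24 hours, matching the server-side key TTL
MEDIA_GRACE_S = 3_600   # media cleanup runs at TTL + 1 hour


def ttl_remaining(created_at: float, now: float | None = None) -> float:
    now = time.time() if now is None else now
    return created_at + STORY_TTL_S - now


def evict_expired(cache: dict, now: float | None = None) -> list:
    """Remove locally cached stories whose TTL has run out; return their ids."""
    expired = [sid for sid, story in cache.items()
               if ttl_remaining(story["created_at"], now) <= 0]
    for sid in expired:
        del cache[sid]
    return expired


def media_deletable(created_at: float, now: float) -> bool:
    """Server side: true once the grace period for in-flight views has passed."""
    return now >= created_at + STORY_TTL_S + MEDIA_GRACE_S
```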

Direct Messages Service

Architecture

Instagram DMs handle real-time messaging with E2E encryption support.

DM flow: mutation manager ensures durability before ACK. MQTT delivers real-time push. Client shows optimistic UI immediately.

MQTT for Real-time

Meta has used MQTT for mobile messaging since Lucy Zhang’s 2011 “Building Facebook Messenger” post — the original argument was that a persistent, lightweight pub/sub session beats HTTP polling on both latency and battery on mobile networks. Instagram DMs ride the same family of infrastructure.

Property	MQTT	WebSocket
Protocol overhead	2-byte fixed header minimum	2–14 bytes per frame
Power consumption	Lower than HTTP polling on mobile (Meta's 2011 post attributes the win to one persistent session and tiny keepalives; not separately quantified)	Not compared in Meta's post; also a persistent connection, with keepalive cadence left to the application (ping/pong frames)
Reconnection	Built-in session resumption + persistent sessions	Application-defined
QoS levels	At-most-once / at-least-once / exactly-once	Application-defined
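The 2-byte minimum is easy to see by encoding the fixed header by hand. A sketch following the MQTT 3.1.1 fixed-header layout (packet type in the high nibble of the first byte, flags in the low nibble, then the variable-length Remaining Length); a PINGREQ keepalive, packet type 12 with no payload, is the entire 2-byte packet.

```python
def encode_remaining_length(n: int) -> bytes:
    """MQTT 'Remaining Length': 7 data bits per byte, high bit set = more bytes follow."""
    if not 0 <= n <= 268_435_455:  # largest value that fits in 4 bytes
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def fixed_header(packet_type: int, flags: int, remaining_length: int) -> bytes:
    first = (packet_type << 4) | (flags & 0x0F)
    return bytes([first]) + encode_remaining_length(remaining_length)


PINGREQ = fixed_header(12, 0, 0)  # the whole keepalive packet: 0xC0 0x00
```

Lengths up to 127 fit in a single length byte, which is why small control packets stay at two bytes; a remaining length of 321 takes two (`0xC1 0x02`).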

MQTT topic structure:

```text
# User's DM inbox (subscribe on connect)
/u/{user_id}/inbox

# Thread-specific updates
/t/{thread_id}/messages

# Typing indicators
/t/{thread_id}/typing
```

Direct’s Mutation Manager (DMM)

Instagram’s engineering team built a dedicated mutation manager (DMM) for Direct to handle:

Optimistic UI: Show sent message immediately, reconcile with server response
Offline support: Queue messages when offline, sync when reconnected
Ordering guarantees: Preserve message order even with network jitter
Retry logic: Automatic retry with exponential backoff

Client-side queue:

```typescript
interface QueuedMessage {
  localId: string // Client-generated UUID
  threadId: string
  content: string
  timestamp: number
  status: "pending" | "sent" | "failed"
  retryCount: number
}

// Persisted to IndexedDB/SQLite
// Survives app restarts
```
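The retry and ordering behavior can be sketched over the same fields as `QueuedMessage`. A minimal Python illustration; the base delay, cap, and retry limit are hypothetical values, and "full jitter" (a uniform draw between zero and the exponential ceiling) is one common backoff variant, not necessarily what DMM uses.

```python
from __future__ import annotations

import random

BASE_DELAY_S = 0.5   # hypothetical tuning values
MAX_DELAY_S = 30.0
MAX_RETRIES = 5


def backoff_delay(retry_count: int, rng: random.Random | None = None) -> float:
    """Exponential ceiling base * 2**n, capped; with an rng, apply full jitter."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** retry_count)
    return ceiling if rng is None else rng.uniform(0.0, ceiling)


def on_send_failed(msg: dict) -> dict:
    """Bump retryCount; give up (status 'failed') after MAX_RETRIES retries."""
    msg["retryCount"] += 1
    msg["status"] = "failed" if msg["retryCount"] > MAX_RETRIES else "pending"
    return msg


def send_order(queue: list) -> list:
    """Pending messages grouped by thread, oldest first, so retries cannot reorder a thread."""
    pending = [m for m in queue if m["status"] == "pending"]
    return sorted(pending, key=lambda m: (m["threadId"], m["timestamp"]))
```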

Cassandra Data Model

DMs use Cassandra for high write throughput and partition-local queries. Instagram has been a heavy Cassandra user since the early 2010s; in 2018 the team published Rocksandra, a RocksDB-backed pluggable storage engine for Cassandra that cut p99 read latency by ~10x by replacing the default LSM-tree implementation with RocksDB and avoiding JVM GC pauses on the read path. The Cassandra schema below describes the logical model; the physical engine in Instagram’s deployment is Rocksandra, not stock Cassandra.

```sql
-- Thread metadata
CREATE TABLE threads (
    thread_id UUID PRIMARY KEY,
    participant_ids SET<UUID>,
    created_at TIMESTAMP,
    last_message_at TIMESTAMP,
    last_message_preview TEXT
);

-- Messages partitioned by thread
CREATE TABLE messages (
    thread_id UUID,
    message_id TIMEUUID,
    sender_id UUID,
    content TEXT,
    media_url TEXT,
    created_at TIMESTAMP,
    PRIMARY KEY (thread_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- User's inbox (denormalized table maintained by the application for fast inbox loading).
-- thread_id is part of the clustering key so two threads with the same last_message_at
-- don't overwrite each other; on a new message the app deletes the old row and inserts a new one.
CREATE TABLE user_inbox (
    user_id UUID,
    last_message_at TIMESTAMP,
    thread_id UUID,
    unread_count INT,
    PRIMARY KEY (user_id, last_message_at, thread_id)
) WITH CLUSTERING ORDER BY (last_message_at DESC, thread_id ASC);
```

Explore and Recommendations

System Scale

Per Meta’s own engineering posts, Instagram’s Explore recommender:

Serves hundreds of millions of daily visitors at sub-second latency.
Selects from a candidate pool of billions of items.
Runs 1,000+ production ML models across the surface, with the ranking funnel split into retrieval, early-stage ranking, and late-stage ranking.

Three-Stage Recommendation Pipeline

Three-stage pipeline: retrieval narrows billions to thousands; early ranking to hundreds; late ranking produces final set with diversity constraints.

Two Towers Model (Retrieval)

Meta’s Explore architecture post describes the same two-tower retrieval shape: independent user and item towers produce embeddings that get compared via dot product, with item embeddings precomputed and indexed in an Approximate Nearest Neighbor service for online lookup.

Architecture:

```text
User Tower:                     Item Tower:
[User features]                 [Item features]
      ↓                              ↓
   Dense layers                  Dense layers
      ↓                              ↓
User embedding (128d)           Item embedding (128d)
      ↓                              ↓
      └────── Dot product ──────────┘
                   ↓
           Similarity score
```

User features:

Account-level embeddings (topical interests)
Recent engagement history
Social graph signals
Demographic signals (age bucket, region)

Item features:

Content embeddings (visual + text)
Creator features
Engagement statistics
Content category
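Once both towers have run, retrieval reduces to a top-k search by dot product. A minimal sketch, assuming toy embeddings held in a dict; the exact scan is a stand-in for the approximate nearest-neighbor index, which gives up a little recall in exchange for sublinear lookup over billions of items.

```python
import heapq
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve(user_emb: Sequence[float],
             item_index: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k item ids whose embeddings score highest against the user.

    Exact brute-force scan standing in for the ANN service. Item embeddings are
    precomputed by the item tower offline; only the user tower runs per request.
    """
    return heapq.nlargest(k, item_index,
                          key=lambda item_id: dot(user_emb, item_index[item_id]))
```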

Multi-Task Ranking

The late-stage ranker predicts multiple objectives simultaneously:

```text
Outputs:
- P(like)        weight: 1.0
- P(comment)     weight: 2.0  (higher engagement)
- P(save)        weight: 3.0  (strong intent signal)
- P(share)       weight: 3.0
- P(follow)      weight: 5.0  (acquisition metric)
- P(hide)        weight: -10.0 (negative signal)

Final score = Σ(weight × probability)
```
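The weighted combination is a plain dot product between the head outputs and a weight vector. A minimal sketch using the illustrative weights listed above (they are not published production values; the function names are hypothetical):

```python
# Weights copied from the illustrative list above; not production values.
OBJECTIVE_WEIGHTS = {
    "like": 1.0, "comment": 2.0, "save": 3.0,
    "share": 3.0, "follow": 5.0, "hide": -10.0,
}


def final_score(probs: dict) -> float:
    """Σ(weight × probability); a head missing from `probs` contributes 0."""
    return sum(w * probs.get(head, 0.0) for head, w in OBJECTIVE_WEIGHTS.items())


def rank(candidates: dict) -> list:
    """candidates: item_id -> {head: probability}. Returns item ids, best first."""
    return sorted(candidates, key=lambda i: final_score(candidates[i]), reverse=True)
```

With these weights a 20% chance of a hide (-2.0) outweighs a 90% chance of a like (+0.9), so the negative head acts as a strong veto.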

Model Training

Continual learning:

Models fine-tuned hourly with new engagement data
Base model retrained weekly with full dataset
Feature store updated in real-time

Scale:

1,000+ models running in production
Custom ML infrastructure (PyTorch-based)
GPU clusters for inference at <100ms p99

API Design

Photo Upload

```text
POST /api/v1/media/upload
Content-Type: multipart/form-data

Request:
- file: <binary>
- media_type: "image" | "video"
- filter_id: "clarendon" | "gingham" | ... (optional)

Response (200 OK):
{
  "media_id": "abc123",
  "urls": {
    "1080": "https://cdn.instagram.com/abc123/1080.jpg",
    "640": "https://cdn.instagram.com/abc123/640.jpg",
    "320": "https://cdn.instagram.com/abc123/320.jpg",
    "150": "https://cdn.instagram.com/abc123/150.jpg"
  },
  "expires_at": "2024-01-02T00:00:00Z"  // Media must be posted within 24h
}
```

Create Post

```text
POST /api/v1/posts

Request:
{
  "media_ids": ["abc123", "def456"],  // Up to 10 for carousel
  "caption": "Summer vibes 🌴",
  "location_id": "loc_789",           // Optional
  "tagged_users": ["user_111"],       // Optional
  "alt_text": "Beach sunset"          // Accessibility
}

Response (201 Created):
{
  "post_id": "post_xyz",
  "permalink": "https://instagram.com/p/xyz",
  "created_at": "2024-01-01T12:00:00Z"
}

Errors:
- 400: Invalid media_id (expired or not found)
- 400: Caption too long (> 2200 characters)
- 403: Tagged user has blocked you
- 429: Rate limited (> 25 posts/day)
```
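The error list maps naturally onto an ordered set of checks. A minimal sketch; the check order, the rejection of an empty `media_ids` list, and the shape of the inputs (a set of still-valid media ids, the poster's count of posts today, and the set of users who have blocked the poster) are assumptions.

```python
MAX_MEDIA_PER_POST = 10
MAX_CAPTION_CHARS = 2_200
MAX_POSTS_PER_DAY = 25


def validate_create_post(req: dict, valid_media: set, posts_today: int,
                         blocked_by: set) -> tuple:
    """Return (status, message) for the first failing rule, or (201, "ok")."""
    media_ids = req.get("media_ids", [])
    if posts_today >= MAX_POSTS_PER_DAY:
        return 429, "rate limited"
    if not 1 <= len(media_ids) <= MAX_MEDIA_PER_POST:
        return 400, "a post needs 1-10 media items"
    if any(m not in valid_media for m in media_ids):
        return 400, "invalid media_id (expired or not found)"
    if len(req.get("caption", "")) > MAX_CAPTION_CHARS:
        return 400, "caption too long"
    if any(u in blocked_by for u in req.get("tagged_users", [])):
        return 403, "tagged user has blocked you"
    return 201, "ok"
```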

Feed

```text
GET /api/v1/feed?cursor={cursor}&limit=20

Response (200 OK):
{
  "posts": [
    {
      "post_id": "post_xyz",
      "author": {
        "user_id": "user_123",
        "username": "photographer",
        "profile_pic_url": "https://...",
        "is_verified": true
      },
      "media": [
        {
          "type": "image",
          "url": "https://cdn.instagram.com/...",
          "width": 1080,
          "height": 1080,
          "alt_text": "Beach sunset"
        }
      ],
      "caption": "Summer vibes 🌴",
      "like_count": 1234,
      "comment_count": 56,
      "created_at": "2024-01-01T12:00:00Z",
      "viewer_has_liked": false,
      "viewer_has_saved": false
    }
  ],
  "next_cursor": "eyJ0cyI6MTY0...",
  "has_more": true
}
```

Stories

```text
GET /api/v1/stories/feed

Response (200 OK):
{
  "story_ring": [
    {
      "user_id": "user_123",
      "username": "friend1",
      "profile_pic_url": "https://...",
      "has_unseen": true,
      "latest_story_ts": "2024-01-01T11:00:00Z"
    }
  ],
  "stories": {
    "user_123": [
      {
        "story_id": "story_abc",
        "media_url": "https://...",
        "media_type": "image",
        "created_at": "2024-01-01T11:00:00Z",
        "expires_at": "2024-01-02T11:00:00Z",
        "seen": false,
        "reply_enabled": true
      }
    ]
  }
}
```

Direct Messages

```text
POST /api/v1/direct/threads/{thread_id}/messages

Request:
{
  "text": "Hey, nice photo!",
  "reply_to_story_id": "story_abc"  // Optional
}

Response (201 Created):
{
  "message_id": "msg_xyz",
  "thread_id": "thread_123",
  "created_at": "2024-01-01T12:00:00Z",
  "status": "sent"
}
```

Data Modeling

PostgreSQL Schema (Core Entities)

```sql
-- Users
CREATE TABLE users (
    id BIGINT PRIMARY KEY,
    username VARCHAR(30) UNIQUE NOT NULL,
    email VARCHAR(255) UNIQUE,
    phone VARCHAR(20) UNIQUE,
    full_name VARCHAR(100),
    bio TEXT,
    profile_pic_url TEXT,
    is_private BOOLEAN DEFAULT false,
    is_verified BOOLEAN DEFAULT false,
    follower_count INT DEFAULT 0,
    following_count INT DEFAULT 0,
    post_count INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- No separate username index: the UNIQUE constraint already creates one.

-- Posts
CREATE TABLE posts (
    id BIGINT PRIMARY KEY,
    author_id BIGINT NOT NULL REFERENCES users(id),
    caption TEXT,
    location_id BIGINT REFERENCES locations(id),  -- locations table not shown
    like_count INT DEFAULT 0,
    comment_count INT DEFAULT 0,
    is_archived BOOLEAN DEFAULT false,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ
);

CREATE INDEX idx_posts_author ON posts(author_id, created_at DESC);

-- Post Media (supports carousel)
CREATE TABLE post_media (
    id BIGINT PRIMARY KEY,
    post_id BIGINT NOT NULL REFERENCES posts(id),
    media_type VARCHAR(10) NOT NULL,  -- 'image', 'video'
    url TEXT NOT NULL,
    width INT,
    height INT,
    alt_text TEXT,
    position SMALLINT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_post_media_post ON post_media(post_id);

-- Comments
CREATE TABLE comments (
    id BIGINT PRIMARY KEY,
    post_id BIGINT NOT NULL REFERENCES posts(id),
    author_id BIGINT NOT NULL REFERENCES users(id),
    parent_id BIGINT REFERENCES comments(id),  -- For replies
    content TEXT NOT NULL,
    like_count INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ
);

CREATE INDEX idx_comments_post ON comments(post_id, created_at DESC);

-- Likes (polymorphic)
CREATE TABLE likes (
    id BIGINT PRIMARY KEY,
    user_id BIGINT NOT NULL REFERENCES users(id),
    target_type VARCHAR(10) NOT NULL,  -- 'post', 'comment', 'story'
    target_id BIGINT NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    UNIQUE(user_id, target_type, target_id)
);

CREATE INDEX idx_likes_target ON likes(target_type, target_id);
```

The social graph is the read-heaviest surface in the system: every feed read, every notification, and every Stories ring computation hits it. Meta's graph store is TAO (USENIX ATC 2013), a read-optimized, geo-distributed graph cache layered over MySQL. Its data model has only two kinds of entity:

- typed objects: nodes such as users, posts, and comments;
- typed associations: directed, time-stamped edges such as FOLLOWS, LIKES, and AUTHORED, which can optionally carry data.

Most reads are single-hop. They fetch one object, or fetch or count the association list hanging off one object, and are served from leader/follower cache tiers. Writes go through the leader tier so that cache invalidation stays correct.

The Cassandra schema below is an open-source analogue of the same shape. It uses two edge tables, one per direction, so that both traversals stay partition-local. A per-user activity table backs the Activity tab.

```sql
-- CQL (Cassandra). User and target IDs are BIGINT to match the 64-bit users.id.

-- Follows (partitioned by follower for "who do I follow" queries)
CREATE TABLE follows (
    follower_id BIGINT,
    following_id BIGINT,
    created_at TIMESTAMP,
    PRIMARY KEY (follower_id, following_id)
);

-- Followers (partitioned by following for "who follows me" queries)
CREATE TABLE followers (
    following_id BIGINT,
    follower_id BIGINT,
    created_at TIMESTAMP,
    PRIMARY KEY (following_id, follower_id)
);

-- Activity feed (for the "activity" tab)
CREATE TABLE activity (
    user_id BIGINT,
    activity_id TIMEUUID,
    actor_id BIGINT,
    activity_type TEXT,  -- 'like', 'comment', 'follow', 'mention'
    target_type TEXT,
    target_id BIGINT,
    created_at TIMESTAMP,
    PRIMARY KEY (user_id, activity_id)
) WITH CLUSTERING ORDER BY (activity_id DESC);
```
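The following toy, in-memory TypeScript sketch makes the objects-and-associations model concrete. The class and method names are hypothetical. They are loosely modeled on the `assoc_add`, `assoc_range`, and `assoc_count` operations described in the TAO paper, and the sketch has no caching tier, persistence, or replication. Each `(id1, type)` list plays the role of one Cassandra partition. `assocAdd` writes the inverse edge too, the same way the `follows`/`followers` pair is dual-written.

```typescript
// Toy, in-memory model of TAO-style typed associations (hypothetical names).
type AssocType = "FOLLOWS" | "FOLLOWED_BY";

interface Assoc {
  id1: bigint; // Source object
  id2: bigint; // Destination object
  time: number; // Creation time, used for newest-first ordering
}

// Every edge type is paired with its inverse, like follows/followers
const INVERSE: Record<AssocType, AssocType> = {
  FOLLOWS: "FOLLOWED_BY",
  FOLLOWED_BY: "FOLLOWS",
};

class AssocStore {
  // One newest-first list per (id1, type): the analogue of one partition
  private lists = new Map<string, Assoc[]>();

  private key(id1: bigint, type: AssocType): string {
    return `${id1}:${type}`;
  }

  private insert(type: AssocType, a: Assoc): void {
    const k = this.key(a.id1, type);
    const list = this.lists.get(k) ?? [];
    if (list.some((e) => e.id2 === a.id2)) return; // Repeated follows are no-ops
    list.push(a);
    list.sort((x, y) => y.time - x.time); // Keep newest first
    this.lists.set(k, list);
  }

  // Writes the edge and its inverse so both directions are single-list reads
  assocAdd(id1: bigint, type: AssocType, id2: bigint, time: number): void {
    this.insert(type, { id1, id2, time });
    this.insert(INVERSE[type], { id1: id2, id2: id1, time });
  }

  // Newest-first page of edges leaving one object
  assocRange(id1: bigint, type: AssocType, offset: number, limit: number): bigint[] {
    const list = this.lists.get(this.key(id1, type)) ?? [];
    return list.slice(offset, offset + limit).map((a) => a.id2);
  }

  assocCount(id1: bigint, type: AssocType): number {
    return this.lists.get(this.key(id1, type))?.length ?? 0;
  }
}
```

In Cassandra, the two inserts would typically go in a logged batch. If one write fails partway, the batch is replayed, so both tables eventually agree.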

ID Generation (Instagram’s Approach)

Instagram’s original sharding and ID generation post defines a 64-bit, time-sortable ID minted inside PL/pgSQL on each logical shard:

```sql
-- Sequence the function draws from (one per logical shard)
CREATE SEQUENCE IF NOT EXISTS instagram_id_seq;

-- PL/pgSQL function for globally unique, time-sorted IDs
CREATE OR REPLACE FUNCTION instagram_id() RETURNS BIGINT AS $$
DECLARE
    epoch BIGINT := 1314220021721;  -- Custom epoch (Aug 2011)
    seq_id BIGINT;
    now_millis BIGINT;
    shard_id INT := 1;  -- Set per logical shard
    result BIGINT;
BEGIN
    SELECT nextval('instagram_id_seq') % 1024 INTO seq_id;
    -- clock_timestamp(), not NOW(): NOW() is fixed at transaction start,
    -- so every ID minted in one transaction would share a timestamp
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;

    result := (now_millis - epoch) << 23;  -- 41 bits for timestamp
    result := result | (shard_id << 10);    -- 13 bits for shard
    result := result | (seq_id);            -- 10 bits for sequence

    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

ID structure (64 bits):

Bits	Purpose	Range
41	Milliseconds since epoch	~69 years
13	Shard ID	8,192 shards
10	Sequence	1,024 IDs/ms/shard
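The Range column follows from the bit widths, with one caveat the table leaves out. PostgreSQL's BIGINT is signed, and the top timestamp bit is the sign bit. IDs therefore stay positive, and so sort correctly, for 2^40 ms rather than 2^41 ms. That is about 34.8 years after the August 2011 epoch, or roughly mid-2046.

```latex
\begin{aligned}
2^{41}\ \text{ms} &= 2\,199\,023\,255\,552\ \text{ms} \approx \frac{2.199 \times 10^{9}\ \text{s}}{3.156 \times 10^{7}\ \text{s/yr}} \approx 69.7\ \text{years} \\
2^{40}\ \text{ms} &\approx 34.8\ \text{years} \quad \text{(positive range of a signed BIGINT)} \\
2^{13} \times 2^{10} &= 8\,192 \times 1\,024 \approx 8.4 \times 10^{6}\ \text{IDs/ms across all shards}
\end{aligned}
```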

Why this matters:

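IDs are time-sorted: ordering by ID orders by creation time to the millisecond, so newest-first queries need no separate timestamp index
IDs encode the shard: a request is routed by reading the shard bits out of the ID, with no lookup table
IDs are unique across shards with no coordination: each shard mints from its own sequence, and the shard bits keep the ranges disjoint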
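The same bit layout can be packed and unpacked outside the database. Below is a minimal TypeScript sketch, assuming IDs are handled as `bigint`, since a JavaScript `number` loses precision above 2^53. `encodeId`, `decodeId`, and `shardFor` are hypothetical helper names, not Instagram's code.

```typescript
// Bit layout from the instagram_id() function above:
// [41 bits: ms since custom epoch][13 bits: shard][10 bits: sequence]
const EPOCH_MS = 1314220021721n; // Same custom epoch as the PL/pgSQL function
const SHARD_BITS = 13n;
const SEQ_BITS = 10n;
const SHARD_MASK = (1n << SHARD_BITS) - 1n; // 8191
const SEQ_MASK = (1n << SEQ_BITS) - 1n; // 1023

interface DecodedId {
  createdAtMs: bigint; // Unix time in milliseconds
  shardId: number;
  sequence: number;
}

function encodeId(unixMs: bigint, shardId: number, sequence: number): bigint {
  if (shardId < 0 || BigInt(shardId) > SHARD_MASK) {
    throw new RangeError(`shard ${shardId} does not fit in 13 bits`);
  }
  return (
    ((unixMs - EPOCH_MS) << (SHARD_BITS + SEQ_BITS)) |
    (BigInt(shardId) << SEQ_BITS) |
    (BigInt(sequence) & SEQ_MASK) // Mirrors the "% 1024" in the SQL
  );
}

function decodeId(id: bigint): DecodedId {
  return {
    createdAtMs: (id >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS,
    shardId: Number((id >> SEQ_BITS) & SHARD_MASK),
    sequence: Number(id & SEQ_MASK),
  };
}

// Routing without a lookup: the shard is read straight out of the ID
function shardFor(id: bigint): number {
  return decodeId(id).shardId;
}
```

The timestamp occupies the high bits, so comparing two IDs compares their creation milliseconds first. The shard and sequence bits only break ties within the same millisecond.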

Infrastructure

Cloud-Agnostic Concepts

Component	Purpose	Requirements
Object Storage	Media files	High durability, CDN integration
Relational DB	Users, posts, metadata	ACID, sharding support
Wide-column DB	Social graph, activity	High write throughput, partition-local queries
Cache	Timeline, hot data	Sub-ms latency, cluster support
Message Queue	Async processing	At-least-once delivery, partitioning
Search Index	Discovery	Full-text, faceted search
CDN	Media delivery	Global PoPs, cache efficiency
Push Gateway	Real-time notifications	MQTT/WebSocket support

AWS Reference Architecture

Component	Service	Configuration
API Gateway	ALB + API Gateway	Auto-scaling, WAF protection
Compute	EKS (Kubernetes)	Spot instances for workers
Primary DB	RDS PostgreSQL	Multi-AZ, read replicas
Social Graph	Amazon Keyspaces or self-managed Cassandra	Multi-region
Cache	ElastiCache Redis Cluster	Cluster mode, 6+ nodes
Object Storage	S3 + CloudFront	Intelligent tiering
Message Queue	Amazon MSK (Kafka) or SQS	For fan-out workers
Search	OpenSearch Service	3+ data nodes
Push	Amazon MQ (MQTT) or IoT Core	Managed MQTT broker
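The fan-out workers in this table are where the hybrid fan-out runs. Below is a minimal in-memory sketch of both paths. The `FeedStore` class, its 100,000-follower default threshold, and the 500-entry timeline cap are hypothetical values chosen for illustration; the real threshold is unpublished. Plain `Map`s stand in for the Redis timeline caches and the graph store.

```typescript
// Hypothetical sketch of hybrid fan-out: push for most authors,
// pull-merge at read time for authors above a follower threshold.
interface Post {
  id: bigint; // Time-sortable 64-bit ID, so larger means newer
  authorId: bigint;
}

const TIMELINE_CAP = 500; // Hypothetical: newest N post IDs kept per timeline

class FeedStore {
  private timelines = new Map<bigint, bigint[]>(); // followerId -> pushed post IDs
  private authorPosts = new Map<bigint, bigint[]>(); // authorId -> own post IDs

  constructor(
    private followersOf: (authorId: bigint) => bigint[],
    private followingOf: (userId: bigint) => bigint[],
    private threshold = 100_000, // Hypothetical; tuned in practice
  ) {}

  private isHighFanOut(authorId: bigint): boolean {
    // A real system would read a cached follower count, not the full list
    return this.followersOf(authorId).length >= this.threshold;
  }

  private prepend(map: Map<bigint, bigint[]>, key: bigint, postId: bigint): void {
    map.set(key, [postId, ...(map.get(key) ?? [])].slice(0, TIMELINE_CAP));
  }

  // Write path: returns how many follower timelines this post touched
  onPostCreated(post: Post): number {
    this.prepend(this.authorPosts, post.authorId, post.id);
    if (this.isHighFanOut(post.authorId)) return 0; // Pulled at read time instead
    const followers = this.followersOf(post.authorId);
    for (const f of followers) this.prepend(this.timelines, f, post.id);
    return followers.length;
  }

  // Read path: pushed timeline merged with posts pulled from high-fan-out authors
  readFeed(userId: bigint, limit: number): bigint[] {
    const pulled = this.followingOf(userId)
      .filter((a) => this.isHighFanOut(a))
      .flatMap((a) => this.authorPosts.get(a) ?? []);
    const merged = new Set([...(this.timelines.get(userId) ?? []), ...pulled]); // Dedupe
    // Time-sortable IDs: descending ID order is newest-first
    return Array.from(merged)
      .sort((x, y) => (x > y ? -1 : x < y ? 1 : 0))
      .slice(0, limit);
  }
}
```

The write cost of a post is bounded by the threshold. The price is the extra merge on the read path, which is the "two read paths" limitation noted in the conclusion.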

Multi-Region Deployment

Instagram’s Migration (AWS → Facebook)

Instagram migrated its production stack from AWS to Facebook’s data centers in 2013–2014, an internal project nicknamed “Instagration”. The team started at 8 engineers and grew to ~20 over roughly a year, and used a custom networking layer (“Neti”) to bridge EC2/VPC and Facebook’s internal network.

Before (AWS):

A Postgres-on-EC2 sharded fleet with read replicas
Memcached for hot caching
S3 for media storage

After (Facebook infrastructure):

Higher per-server efficiency on Facebook hardware (Wired’s coverage cites a roughly 3:1 consolidation ratio versus the prior EC2 footprint).
Shared access to Facebook’s caching, monitoring, and storage stacks.
Private long-haul network between data centers.
No user-visible disruption during the cutover.

Frontend Considerations

Feed Virtualization

Instagram’s feed is an infinite scroll of variable-height items:

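```jsx
import React from 'react';
import { FlatList } from 'react-native';

// Virtualized list configuration. FlatList is React Native's wrapper around
// VirtualizedList; only items inside the render window stay mounted.
// (estimatedItemSize and overscanCount belong to other libraries, FlashList
// and react-window, and have no effect on VirtualizedList.)
const FeedList = ({ posts, loadMore }) => {
  return (
    <FlatList
      data={posts}
      keyExtractor={(item) => String(item.id)}
      renderItem={({ item }) => <PostCard post={item} />}  // PostCard defined elsewhere
      initialNumToRender={3}       // Posts rendered before the first scroll
      windowSize={5}               // Viewport plus ~2 screens above and below stay mounted
      onEndReached={loadMore}      // Fetch the next page...
      onEndReachedThreshold={0.5}  // ...when within half a screen of the end
    />
  );
};
```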

Why virtualization:

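A long session can scroll through 1,000+ posts
Each post carries heavy media (images, video)
Without virtualization, every post scrolled past stays mounted with its decoded media, so memory grows with scroll depth and layout work causes dropped frames (jank)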
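The core calculation behind virtualization can be sketched independently of any framework. Below is a minimal TypeScript version, assuming item heights are already known (measured or estimated) in pixels. `visibleRange` is a hypothetical helper; production list components add measurement and caching on top of this idea.

```typescript
interface IndexRange {
  start: number; // First index to render (inclusive)
  end: number; // Last index to render (inclusive); -1 when the list is empty
}

function visibleRange(
  heights: number[], // Height in px of each item, in list order
  scrollTop: number, // Current scroll offset in px
  viewportHeight: number, // Visible height in px
  overscan: number, // Extra items rendered above and below the viewport
): IndexRange {
  if (heights.length === 0) return { start: 0, end: -1 };

  // offsets[i] is the top edge of item i (prefix sum of heights)
  const offsets: number[] = [0];
  for (const h of heights) offsets.push(offsets[offsets.length - 1] + h);

  // Binary search for the last item whose top edge is at or above y
  const indexAt = (y: number): number => {
    let lo = 0;
    let hi = heights.length - 1;
    while (lo < hi) {
      const mid = Math.ceil((lo + hi) / 2);
      if (offsets[mid] <= y) lo = mid;
      else hi = mid - 1;
    }
    return lo;
  };

  const first = indexAt(scrollTop);
  const last = indexAt(scrollTop + viewportHeight - 1);
  return {
    start: Math.max(0, first - overscan),
    end: Math.min(heights.length - 1, last + overscan),
  };
}
```

Only items `start` through `end` are mounted, each positioned at its `offsets[i]`. The work per scroll event is therefore proportional to the window size rather than the feed length.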

Image Loading Strategy

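```jsx
import React, { useState } from 'react';

// Progressive image loading
const PostImage = ({ post }) => {
  const [loaded, setLoaded] = useState(false);

  return (
    <div className="post-image">
      {/* Blur placeholder: a ~10x10 image inlined as a base64 data URI */}
      <img
        src={post.thumbnail_blur}
        alt=""
        aria-hidden="true"
        className={loaded ? 'hidden' : 'blur'}
      />

      {/* Full image (lazy loaded). Style 'hidden' with opacity: 0, not
          display: none, so the element keeps a layout box and the lazy
          load can start when it nears the viewport. */}
      <img
        src={post.media_url}
        alt={post.alt_text ?? ''}
        loading="lazy"
        decoding="async"
        onLoad={() => setLoaded(true)}
        className={loaded ? 'visible' : 'hidden'}
      />
    </div>
  );
};
```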
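Which variant the full image points at matters as much as when it loads. The sketch below chooses among pre-generated widths. It assumes a hypothetical 320/640/1080 px variant set inside Instagram's supported 320 to 1080 px range, plus a hypothetical `?w=` URL parameter. `pickVariantWidth` and `buildSrcSet` are illustrative names.

```typescript
// Hypothetical variant widths inside the supported 320-1080 px range
const VARIANT_WIDTHS = [320, 640, 1080];

// Smallest variant that covers the slot at the device's pixel density,
// falling back to the largest variant when none is wide enough
function pickVariantWidth(
  cssWidth: number,
  devicePixelRatio: number,
  widths: number[] = VARIANT_WIDTHS,
): number {
  const needed = Math.ceil(cssWidth * devicePixelRatio);
  const sorted = [...widths].sort((a, b) => a - b);
  return sorted.find((w) => w >= needed) ?? sorted[sorted.length - 1];
}

// Declarative alternative: hand every variant to <img srcset> with a
// width descriptor and let the browser choose
function buildSrcSet(baseUrl: string, widths: number[] = VARIANT_WIDTHS): string {
  return widths.map((w) => `${baseUrl}?w=${w} ${w}w`).join(", ");
}
```

For example, a 375 CSS-px slot on a 2x screen needs 750 physical pixels, so it gets the 1080 px variant. A 150 CSS-px thumbnail slot gets the 320 px one.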

Stories Ring Interaction

```jsx
// Horizontal scroll with snap points
const StoriesRing = ({ stories }) => {
  return (
    <div className="stories-ring" style={{
      display: 'flex',
      overflowX: 'auto',                // Scrollbar only when the ring overflows
      scrollSnapType: 'x mandatory',
      WebkitOverflowScrolling: 'touch'  // Legacy: momentum scroll on iOS Safari < 13
    }}>
      {stories.map(story => (
        <div
          key={story.id}
          style={{ scrollSnapAlign: 'start', flexShrink: 0 }}  // Keep avatars full size
        >
          <StoryAvatar story={story} />
        </div>
      ))}
    </div>
  );
};
```
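The ring itself only lays out avatars. The aggressive prefetch behind the Stories load target happens once a viewer opens a tray. One hedged way to sketch it is to warm the cache for the next few stories. `prefetchUpcoming`, the `StoryItem` shape, and the lookahead value are hypothetical. The loader is injected so the logic is testable; in a browser it could be `(url) => { new Image().src = url; }`, which starts the fetch without attaching the image.

```typescript
interface StoryItem {
  id: string;
  mediaUrl: string;
}

// Start fetching media for up to `lookahead` stories after the current one,
// skipping URLs already warmed. Returns the URLs this call started.
function prefetchUpcoming(
  items: StoryItem[],
  currentIndex: number,
  lookahead: number,
  alreadyFetched: Set<string>,
  load: (url: string) => void,
): string[] {
  const started: string[] = [];
  for (let i = currentIndex + 1; i <= currentIndex + lookahead && i < items.length; i++) {
    const url = items[i].mediaUrl;
    if (alreadyFetched.has(url)) continue; // Don't refetch media already warmed
    alreadyFetched.add(url);
    load(url);
    started.push(url);
  }
  return started;
}
```

Calling this each time the viewer advances keeps a sliding window of upcoming media in flight. Tapping forward then usually hits the cache instead of the network.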

Optimistic Updates

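```jsx
import React, { useState } from 'react';

// Like button with optimistic UI
// (api, showError, HeartIcon and formatCount are defined elsewhere)
const LikeButton = ({ post }) => {
  const [optimisticLiked, setOptimisticLiked] = useState(post.viewer_has_liked);
  const [optimisticCount, setOptimisticCount] = useState(post.like_count);
  const [pending, setPending] = useState(false);

  const handleLike = async () => {
    if (pending) return;  // One toggle in flight at a time, so toggles can't race
    // Snapshot the state this click starts from, for rollback
    const wasLiked = optimisticLiked;
    const prevCount = optimisticCount;

    // Optimistic update (immediate feedback)
    setOptimisticLiked(!wasLiked);
    setOptimisticCount(wasLiked ? prevCount - 1 : prevCount + 1);
    setPending(true);

    try {
      await api.toggleLike(post.id);
    } catch (error) {
      // Roll back to the pre-click state, not the initial props,
      // which are stale after any earlier successful toggle
      setOptimisticLiked(wasLiked);
      setOptimisticCount(prevCount);
      showError(wasLiked ? 'Failed to unlike post' : 'Failed to like post');
    } finally {
      setPending(false);
    }
  };

  return (
    <button onClick={handleLike} aria-pressed={optimisticLiked}>
      <HeartIcon filled={optimisticLiked} />
      <span>{formatCount(optimisticCount)}</span>
    </button>
  );
};
```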

Conclusion

Instagram’s architecture demonstrates several principles that recur in any photo- or short-video sharing platform at this scale:

Architectural decisions:

Hybrid fan-out bounds per-post write amplification while keeping cached reads cheap. The follower threshold is a tunable knob — Instagram and Twitter both keep theirs unpublished — rather than a magic number.
Tiered blob storage matches the read distribution: a Haystack-shaped hot tier for fresh uploads, an f4-shaped warm tier (Reed-Solomon (10,4), ~2.1x effective replication) for the long tail, and a cold archive for everything else.
A purpose-built graph store (TAO-style objects and associations) keeps the dominant read shape — single-hop edge lookups — partition-local and cache-resident.
Separate storage strategies for ephemeral (Stories) and persistent (Posts) content match each surface’s access pattern and lifetime budget.
Three-stage recommendation funnel (retrieval → early ranking → late ranking, as described in Meta’s Explore architecture post) enables personalization across billions of items inside an inference budget measured in milliseconds.
MQTT for real-time keeps a persistent, low-overhead channel open on mobile, which has been Meta’s preferred shape for chat infrastructure since 2011.
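The ~2.1x effective replication figure for the f4-shaped warm tier decomposes as follows, per the f4 paper. Reed-Solomon (10,4) stores 4 parity blocks per 10 data blocks within a cell. Geo-replication XORs blocks from two data centers and stores the result in a third, instead of keeping full copies. The paper reports this as a reduction from Haystack's 3.6x.

```latex
\underbrace{\frac{10 + 4}{10}}_{\text{RS}(10,4)\ \text{within a cell}}
\times
\underbrace{\frac{3}{2}}_{\text{XOR of two data centers, stored in a third}}
= 1.4 \times 1.5 = 2.1
```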

Targets this design is built to meet:

Feed load: p99 < 500ms through cached timelines + celebrity merge
Stories load: p99 < 200ms through aggressive prefetch
Upload processing: < 10s for immediate user feedback
Global delivery: 95%+ CDN cache hit rate exploiting power-law distribution

Known limitations:

Hybrid fan-out requires maintaining two read paths and a merge step.
The high-fan-out follower threshold is a tunable heuristic; the right value drifts as social-graph distributions shift.
Continual / hourly fine-tuning of the ranker introduces a small staleness window around fast-moving content.
Multi-region replication is asynchronous, so reads that race a write can briefly see stale state.

Alternative approaches not chosen:

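Pure push (write amplification at celebrity scale: one post from an account with tens of millions of followers means tens of millions of timeline writes)
Pure pull (every feed read must fetch and merge recent posts from every followed account, so read latency grows with the number of accounts followed)
Single-region (cross-continent round trips for users far from the serving region)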

Appendix

Prerequisites

Distributed systems fundamentals (CAP theorem, eventual consistency)
Database concepts (sharding, replication, indexing)
Caching strategies (write-through, write-behind, cache invalidation)
CDN and content delivery concepts
Basic ML concepts (embeddings, neural networks)

Summary

Hybrid fan-out — push to follower timeline caches for the long tail of accounts; pull-merge above a tunable follower threshold to bound write amplification.
Image processing pipeline — variants generated for the 320–1080 px supported range; filters offloaded to GPU on-device.
Tiered photo storage — Haystack-style hot tier (in-memory needle index, ~1 disk seek per read), f4-style warm tier (Reed-Solomon (10,4), ~2.1x effective replication), cold archive for the long tail.
Social graph storage — TAO-style objects + associations layered over a sharded relational store, with leader/follower caches keeping cross-region reads fast.
Stories architecture — 24-hour TTL with aggressive prefetch targeting sub-200 ms perceived load.
Feed ranking — multi-task neural networks fed by tens-to-hundreds of thousands of features, fine-tuned continually on engagement events.
Explore recommendation — three-stage funnel (Two Towers retrieval → early ranking → late ranking) over 1,000+ production ML models.
MQTT for real-time — persistent low-overhead channel for DMs, notifications, and presence; Meta’s preferred mobile chat substrate since 2011.
Upload-time safety + privacy — strip GPS/serial EXIF before serving variants; compute PhotoDNA and PDQ perceptual hashes inline and match them against shared hash banks (NCMEC, GIFCT) before a CDN URL becomes addressable.

References

Finding a Needle in Haystack: Facebook’s Photo Storage — hot-tier blob store with in-memory needle index (OSDI 2010).
f4: Facebook’s Warm BLOB Storage System — warm-tier Reed-Solomon (10,4) erasure coding (OSDI 2014).
TAO: Facebook’s Distributed Data Store for the Social Graph — objects + associations graph store (USENIX ATC 2013).
Sharding & IDs at Instagram — original 64-bit ID generation design (Instagram Engineering).
What Powers Instagram — early architecture overview (Instagram Engineering).
Migrating from AWS to Facebook — the “Instagration” project (Instagram Engineering).
How Facebook Moved 20 Billion Instagram Photos Without You Noticing — secondary coverage of the migration (Wired, 2014).
Making Direct Messages Reliable and Fast — DM architecture and the Mutation Manager (Instagram Engineering).
Building Facebook Messenger — origin of MQTT use for Meta’s mobile chat (Engineering at Meta, 2011).
Scaling the Instagram Explore Recommendations System — three-stage funnel and Two Towers architecture (Engineering at Meta, 2023).
Journey to 1000 Models — ML infrastructure at scale (Engineering at Meta, 2025).
Powered by AI: Instagram’s Explore Recommender System — earlier deep-dive on Explore retrieval (Meta AI).
Instagram Video Processing and Encoding Reduction — video pipeline optimization (Engineering at Meta, 2022).
Introducing mcrouter — caching infrastructure (Engineering at Meta, 2014).
Image resolution of photos you share on Instagram — supported widths and resize behavior (Instagram Help Center).
Instagram now has 3 billion monthly active users — Sept 2025 announcement (CNBC).
MQTT Version 5.0 — protocol specification (OASIS).
Timelines at Scale — Raffi Krikorian on Twitter’s hybrid fan-out home timeline (QCon 2013, InfoQ).
Open-sourcing a 10x reduction in Apache Cassandra tail latency — Rocksandra: RocksDB-backed Cassandra storage engine (Instagram Engineering, 2018).
Open-sourcing photo- and video-matching technology — Meta’s PDQ image hash and TMK+PDQF video hash for safety scanning (2019).
PhotoDNA — Microsoft’s perceptual hash service for CSAM detection.
GIFCT Hash Sharing Database — industry-shared terror-content hash bank.
NCMEC Hash Sharing — National Center for Missing & Exploited Children hash list.