Design Instagram: Photo Sharing at Scale
Instagram crossed 3 billion monthly active users in September 2025, serving photo and short-video traffic at a scale where every architectural choice — fan-out shape, image variant count, TTL on ephemeral media, ranking model topology — is dictated by power-law follower distributions, mobile network constraints, and ML inference budgets measured in milliseconds. This design walks through the image upload pipeline, the hybrid fan-out that bounds write amplification, the Stories TTL architecture, the MQTT-based real-time messaging path, and the multi-stage Explore recommender, citing the Instagram and Meta engineering posts that document each subsystem.
Note
Meta stopped disclosing per-app daily active user (DAU) numbers in April 2024, so any DAU figure in this article is an industry estimate, not an official metric. Treat the order-of-magnitude as load-shaping context, not a published number.
Abstract
Instagram’s architecture is structured around three load-shaping problems that any photo-sharing platform faces once it crosses a few hundred million users:
- Write amplification vs. read latency. A post from a celebrity with tens of millions of followers would, under naive write-time fan-out, force tens of millions of timeline cache updates. A hybrid fan-out — push for “normal” accounts, pull-merge for high-fan-out accounts — bounds the per-post write cost while keeping cached reads in the low-millisecond range.
- Ephemeral vs. persistent content. Stories (24-hour TTL) and feed posts (permanent) have different read patterns and different lifetime budgets. Stories ride aggressive client prefetch and TTL-synced caching; posts ride tiered object storage with CDN caching.
- Cold start vs. engagement optimization. New sessions need value immediately; returning sessions need personalization. Meta’s Instagram Explore recommender uses Two Towers retrieval to narrow billions of candidates to thousands, then a multi-task neural ranker to produce the final ordering, and now spans over 1,000 production ML models.
Core mechanisms covered below:
- Hybrid fan-out — push to follower timeline caches for the long tail of accounts; pull-merge for high-fan-out accounts at read time.
- Multi-resolution image pipeline — variants generated for the 320–1080 px supported width range, with filtering offloaded to GPU on-device.
- Stories architecture — 24-hour TTL with aggressive prefetch targeting sub-200 ms perceived load.
- Feed ranking — multi-task neural networks fine-tuned continually on engagement events.
- MQTT for real-time — pioneered for Facebook Messenger in 2011 and now powers Instagram DMs, notifications, and presence over a 2-byte minimum-overhead binary protocol.
- Privacy + safety pipeline at upload time — EXIF metadata sanitization (drop GPS, serial numbers, raw timestamps before any public CDN URL exists) and perceptual-hash-based CSAM/violence/terror screening against industry banks (Microsoft PhotoDNA, Meta’s open-sourced PDQ + TMK+PDQF hashes, the GIFCT hash database).
Requirements
Functional Requirements
| Requirement | Priority | Notes |
|---|---|---|
| Photo/video upload | Core | Multiple resolutions, filters, up to 10 items per post |
| Feed (home timeline) | Core | Ranked, personalized, infinite scroll |
| Stories | Core | 24-hour ephemeral, ring UI, reply capability |
| Follow/unfollow | Core | Social graph management |
| Likes and comments | Core | Real-time counts, threaded comments |
| Direct messages | Core | Real-time chat, media sharing |
| Explore page | Core | Content discovery, personalized recommendations |
| Search | Core | Users, hashtags, locations |
| Notifications | Core | Likes, comments, follows, DM alerts |
| Reels | Extended | Short-form video (separate video pipeline) |
| Shopping | Out of scope | E-commerce integration |
| Ads | Out of scope | Separate ad-tech stack |
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Upload availability | 99.9% | Brief maintenance acceptable |
| Feed availability | 99.99% | Core engagement driver |
| Feed load latency | p99 < 500ms | User experience threshold |
| Stories load latency | p99 < 200ms | Tap-and-swipe UX requires instant response |
| Image processing time | < 10s | User waits for upload confirmation |
| DM delivery latency | p99 < 500ms | Real-time conversation expectation |
| Notification delivery | < 2s | Engagement driver |
| Feed freshness | < 30s for non-celebrities | Balance freshness vs. ranking quality |
Scale Estimation
These numbers blend Meta’s published headline figures (MAU, ML model count) with order-of-magnitude estimates that are common in system-design write-ups but not officially disclosed (DAU, uploads/day, average image size). Treat them as load-shaping context for capacity decisions, not as a published specification.
Monthly active users: 3,000M (Meta, Sept 2025 announcement)Daily active users: ~500M (industry estimate; Meta no longer discloses)Photos uploaded daily: ~100M (estimate; older "95M+" figures from circa 2018)Upload traffic (estimate) 100M uploads/day ≈ 1,150 uploads/second average Peak (3x) ≈ 3,500 uploads/second Avg compressed payload ≈ 2 MB per image Daily ingestion ≈ 200 TB/day at the wirePer-image storage (estimate) Original 2 MB Resolution variants 4 widths (1080 / 640 / 320 / 150 px) Aspect variants ×2 (square + original aspect) Total variants ≈ 8 derived files Total stored per image ≈ 5 MB across variants Daily storage growth ≈ 500 TB/dayFeed reads (estimate) 500M DAU × 20 sessions ≈ 10B feed reads/day Average rate ≈ 115K reads/second Peak ≈ 350K+ reads/secondSocial graph (estimate) Average followers/user ~150 (skewed by power law) Accounts with > 1M followers ~50,000 Graph edges ~10^11CDN efficiency. Photo and video traffic follows a strong power-law: a small fraction of media accounts for most reads. Assuming a 95% edge cache hit rate (which is what an Instagram-shape CDN should be tuned for), the origin only sees the cold tail:
Reads to origin ≈ 5% × 350K rps peak ≈ 17K rpsHot content served almost entirely from CDN edgeCold tail dominates origin egress and storage I/ODesign Paths
Path A: Push-First Fan-out
Best when:
- Smaller scale (<100M users)
- Read latency is critical
- Most users have similar follower counts (no extreme outliers)
Architecture:
On post creation, push the post ID to every follower’s timeline cache. Reads are O(1) cache lookups.
Trade-offs:
- ✅ Extremely fast reads (pre-computed timelines)
- ✅ Simple read path (single cache lookup)
- ❌ Massive write amplification for popular accounts
- ❌ Wasted writes for inactive followers
- ❌ Storage explosion (N copies per post where N = followers)
Real-world example: Early Twitter (~2010–2012) is the canonical illustration of push-only running into a celebrity wall: a single high-fan-out tweet had to be written into millions of follower timelines, which is why Twitter moved to a hybrid model later in that period.
Path B: Pull-Only Fan-out
Best when:
- Read latency tolerance is higher
- Storage cost is primary concern
- Feed freshness can be slightly stale
Architecture:
On feed request, query the social graph for followed users, fetch their recent posts, merge and rank.
Trade-offs:
- ✅ No write amplification
- ✅ Minimal storage (posts stored once)
- ✅ Always fresh (computed at read time)
- ❌ High read latency (multiple DB queries)
- ❌ Expensive computation per request
- ❌ Difficult to rank effectively (limited time for ML)
Real-world example: Early News Feed implementations on social platforms (pre-2010) used pull-only computation and abandoned it as following counts grew, since per-request graph traversal blew the read latency budget.
Path C: Hybrid Fan-out (Instagram Model)
Best when:
- Massive scale with power-law follower distribution
- Sub-second read latency required
- High-fan-out accounts exist (>1M followers)
Architecture:
- Long-tail accounts (low follower count, commonly modeled at < 5–10K followers in public write-ups): push the new post id into each follower’s timeline cache on write.
- High-fan-out accounts (above the threshold): keep posts in a per-author store and merge them in at read time.
- Inactive followers: skip fan-out for them and compute on demand if they return.
Note
Instagram and Twitter both use hybrid fan-out, but neither has officially published its production threshold. Raffi Krikorian’s Timelines at Scale (QCon 2013) is the canonical engineering talk on Twitter’s home-timeline architecture — push-on-write into Redis-backed materialised timelines for the long tail, pull-and-merge for high-fan-out accounts at read time — and the 5–10K follower band shows up consistently in community write-ups of the architecture. Treat the threshold as a tunable knob, not a magic number: the right value is the point where the marginal cost of a fan-out write exceeds the marginal cost of a read-time merge.
Trade-offs:
- ✅ Bounded write amplification (capped at the threshold)
- ✅ Fast reads for most users (cache hit + small merge)
- ✅ Handles celebrity scale without storage explosion
- ❌ Two code paths to maintain
- ❌ Merge logic adds complexity
- ❌ Posts from above-threshold accounts pay a small read-time merge cost
Path Comparison
| Factor | Push-First | Pull-Only | Hybrid |
|---|---|---|---|
| Read latency | O(1) | O(following × posts) | O(1) + O(celebrities) |
| Write amplification | O(followers) | O(1) | O(min(followers, threshold)) |
| Storage per post | O(followers) | O(1) | O(min(followers, threshold)) |
| Code complexity | Low | Low | Medium |
| Freshness | Immediate | Immediate | Immediate (regular), slight delay (celebrity merge) |
| Best scale | <100M users | <10M users | Billions |
This Article’s Focus
The rest of this article assumes Path C (Hybrid Fan-out) as the production design, because:
- At Instagram’s scale (3B MAU as of Sept 2025) the per-post write cost has to be bounded.
- The top of the follower distribution (Cristiano Ronaldo’s account is in the 650M-follower range) makes pure push impractical — a single post would dominate the cluster’s write capacity.
- The hybrid model is the design Instagram and Twitter both publicly describe, even if exact thresholds are not.
High-Level Design
Component Overview
Service Responsibilities
| Service | Responsibility | Data Store | Key Operations |
|---|---|---|---|
| Upload Service | Media ingestion, validation, processing | S3, PostgreSQL | Resize, filter, generate variants |
| Post Service | Post CRUD, metadata management | PostgreSQL | Create, update, delete, soft-delete |
| Feed Service | Timeline generation, ranking | Redis, PostgreSQL | Fan-out, merge, rank |
| Stories Service | Ephemeral content management | Redis (TTL), S3 | Create, expire, ring ordering |
| Social Service | Follow graph management | Cassandra | Follow, unfollow, follower lists |
| Search Service | User/hashtag/location search | Elasticsearch | Index, query, autocomplete |
| Explore Service | Content discovery, recommendations | ML platform | Candidate retrieval, ranking |
| DM Service | Real-time messaging | Cassandra, Redis | Send, receive, sync |
| Notification Service | Push and in-app notifications | PostgreSQL, Redis | Queue, dedupe, deliver |
Image Upload Service
Upload Flow
Image Processing Pipeline
Input validation mirrors what Instagram documents in its image resolution help center page:
- Supported feed widths land in the 320–1080 px band; sub-320 px uploads get upscaled to 320 px and >1080 px uploads get downscaled to 1080 px.
- Supported source formats: JPEG, PNG, HEIC (HEIC is normalized to JPEG/HEIF for delivery).
- Maximum upload size is generous on the wire (tens of MB), but client-side compression typically lands the post in the 2–5 MB range before the server sees it.
Metadata sanitization (EXIF stripping). Camera-generated JPEG/HEIC files carry EXIF blocks that frequently include precise GPS coordinates, device serial numbers, and capture timestamps. Any photo-sharing platform must strip privacy-sensitive EXIF tags before serving variants from the CDN — public-share variants typically retain only orientation, color profile, and a sanitized capture timestamp; GPS, serial numbers, and thumbnail-embedded EXIF are dropped. The original-resolution copy may keep the full EXIF in cold storage for the uploader’s own archive view, but it is never served to followers. Internally this is a small re-encode step that runs alongside variant generation; it is cheap relative to the resize ladder but mandatory for compliance.
Safety scanning (hash matching at the edge of the pipeline). Before a media id becomes addressable from a public CDN URL, every upload runs through perceptual-hash matching against the industry CSAM hash banks (Microsoft’s PhotoDNA, Meta’s PDQ image hash and TMK+PDQF video hash — both open-sourced in 2019 — and the NCMEC hash list) plus internal classifiers for nudity, violence, and policy-violating content. Hash matches block the post and trigger a NCMEC report; classifier hits route into human review queues. The hashing step adds milliseconds at the most (PDQ is a 256-bit hash with sub-millisecond compute), so the pipeline can run it inline with variant generation rather than as a deferred audit. Meta also runs perceptual-hash matching against terror content via the GIFCT shared hash database.
Resolution variants generated. The exact ladder is implementation-specific; the shape below is representative of what a power-law CDN footprint requires:
| Variant | Dimensions | Use Case |
|---|---|---|
| Original | Up to 1080px | Full-screen view |
| Large | 1080 × 1080 | Feed (high-DPI devices) |
| Medium | 640 × 640 | Feed (standard devices) |
| Small | 320 × 320 | Grid view, thumbnails |
| Thumbnail | 150 × 150 | Notifications, search results |
Filter processing. On modern devices the heavy lifting is done on-device via GPU shaders (Metal on iOS, OpenGL ES / Vulkan on Android), so the server pipeline only has to deal with already-baked pixels for filtered uploads. Server-side filtering is a fallback path (for example, web uploads). Conceptually a filter is:
output = Blend( Adjust( LUT(input) ) ) LUT 3D color look-up table (per-filter asset) Adjust brightness / contrast / saturation / warmth Blend vignette, frame, grain overlayProcessing time budget:
| Operation | Target | Notes |
|---|---|---|
| Upload to storage | < 2s | Depends on connection |
| Generate variants | < 3s | Parallel processing |
| Filter application | < 1s | GPU-accelerated |
| CDN propagation | < 5s | Edge cache warming |
| Total | < 10s | User-perceived upload time |
Storage Strategy
Object storage layout:
s3://instagram-media/ /{user_id}/ /{media_id}/ original.jpg # Raw upload 1080.jpg # Full resolution 640.jpg # Medium 320.jpg # Small 150.jpg # Thumbnail metadata.json # EXIF, dimensions, filter appliedCDN caching rules:
| Content Type | Cache Duration | Cache Key |
|---|---|---|
| Original | 1 year | {media_id}/original |
| Variants | 1 year | {media_id}/{size} |
| Profile pictures | 1 hour | {user_id}/profile |
| Stories media | 24 hours | {story_id}/media |
Storage tiering (power-law optimization): photo workloads at this scale follow a textbook hot/warm/cold split, and Meta has published the two papers that shape the canonical design — Haystack for hot blobs (OSDI 2010) and f4 for warm blobs (OSDI 2014). The conceptual ladder used by any photo-sharing platform of this shape is:
| Tier | Inspired by | Storage shape | Replication | Access pattern |
|---|---|---|---|---|
| Hot | Haystack | Append-only volume files; in-memory needle index; one disk seek per read | 3x replicated | Recent uploads, top of the long-tail |
| Warm | f4 | Reed-Solomon (10,4) erasure coding inside a cell; XOR coding across regions | Effective replication ~2.1x (vs. 3.6x in Haystack) | Older content, access rate has decayed |
| Cold | Archive | Deep archive on cheap media | Geo-redundant | Rarely accessed, kept for durability + recall |
Hot tier (SSD-backed Haystack-style volumes): Last 7 days of uploads, frequently accessedWarm tier (f4-style erasure-coded cells): 7 days - 1 year, moderate accessCold tier (deep archive): > 1 year, rare accessMigration policy:- Content accessed > 10x/day stays hot- Content accessed < 1x/week moves to warm- Content not accessed in 90 days moves to coldNote
Haystack’s central optimization is keeping the (volume, offset, size) index for every needle in main memory, so a hot read is at most one disk seek. f4 trades a small read-amplification penalty for a large storage win — its production Reed-Solomon (10,4) configuration tolerates four simultaneous failures inside a cell while cutting effective replication from 3.6x to 2.1x. The hot-vs-warm decision is a function of access rate and age, not file type.
Feed Generation Service
Hybrid Fan-out Implementation
Timeline Cache Structure
Redis data model:
# Timeline cache (sorted set by timestamp)ZADD timeline:{user_id} {timestamp} {post_id}# Keep last 800 posts per timelineZREMRANGEBYRANK timeline:{user_id} 0 -801# Post metadata cache (hash)HSET post:{post_id} author_id "{user_id}" media_url "{cdn_url}" caption "{text}" like_count {count} created_at {timestamp}# Celebrity posts (separate sorted set per celebrity)ZADD celebrity:{user_id}:posts {timestamp} {post_id}Timeline composition at read:
def get_feed(user_id, cursor=None, limit=20): # 1. Get cached timeline posts cached_posts = redis.zrevrange( f"timeline:{user_id}", start=cursor or 0, end=(cursor or 0) + limit * 2 # Fetch extra for ranking ) # 2. Get followed celebrities celebrities = get_followed_celebrities(user_id) # 3. Fetch recent celebrity posts (last 24h) celebrity_posts = [] for celeb_id in celebrities: posts = redis.zrevrangebyscore( f"celebrity:{celeb_id}:posts", max=now(), min=now() - 86400, # 24 hours limit=5 ) celebrity_posts.extend(posts) # 4. Merge and rank all_posts = cached_posts + celebrity_posts ranked_posts = ranking_service.rank(user_id, all_posts) return ranked_posts[:limit]Feed Ranking
Instagram’s ranking system uses deep neural networks fed by tens-to-hundreds of thousands of dense and sparse features (the Explore engineering post describes the same shape for the discovery surface).
Signal categories:
| Category | Signals | Weight (approx) |
|---|---|---|
| Relationship | DM history, profile visits, comments, tags | High |
| Interest | Content type engagement, hashtag affinity | High |
| Timeliness | Post age, time since last seen | Medium |
| Popularity | Like velocity, comment rate, share count | Medium |
| Creator | Posting frequency, content quality score | Low |
Ranking model architecture:
Input: User embeddings + Post embeddings + Context features ↓Feature extraction (10K–100K+ dense + sparse features) ↓Multi-task neural network ↓Outputs: - P(like) - P(comment) - P(save) - P(share) - P(time_spent > 10s) ↓Weighted combination → Final scoreModel training:
- Trained on billions of engagement events
- Fine-tuned hourly with recent interactions
- A/B tested continuously (10+ experiments running at any time)
Consistency and Pagination
Consistency model:
| Operation | Consistency | Rationale |
|---|---|---|
| Own post visibility | Strong (immediate) | User expects to see own post |
| Follower timeline update | Eventual (< 30s) | Acceptable delay for feed freshness |
| Like/comment counts | Eventual (< 5s) | Tolerable for social proof |
| Unfollow propagation | Strong (immediate) | Privacy expectation |
Cursor-based pagination:
// RequestGET /feed?cursor=eyJ0cyI6MTY0...&limit=20// Response{ "posts": [...], "next_cursor": "eyJ0cyI6MTY0...", "has_more": true}// Cursor structure (base64-encoded){ "ts": 1640000000, // Timestamp of last item "pid": "abc123", // Post ID (for tie-breaking) "v": 2 // Cursor version (for migrations)}Why cursor-based (not offset-based):
- Timeline changes between requests (new posts arrive)
- Offset pagination causes duplicates or missed posts
- Cursor is stable: “posts older than X” always returns consistent results
Stories Service
Architecture
Stories have fundamentally different requirements than feed posts:
| Property | Posts | Stories |
|---|---|---|
| Lifetime | Permanent | 24 hours |
| Load time target | < 500ms | < 200ms |
| Caching strategy | CDN + Redis | Aggressive prefetch |
| Ranking | Complex ML | Recency + engagement |
Story Ring Ordering
The “Stories ring” (horizontal tray at top) orders accounts by engagement signals:
Ordering factors:
- Accounts with unseen stories (always first)
- DM interaction frequency
- Profile visit frequency
- Comment/like history
- Story view history (accounts you consistently view)
Data model:
# Story metadata (expires with TTL)SETEX story:{story_id} 86400 '{ "author_id": "123", "media_url": "https://...", "created_at": 1640000000, "viewers": [], "reply_enabled": true}'# User's active stories (sorted set, auto-cleanup)ZADD user:{user_id}:stories {created_at} {story_id}ZREMRANGEBYSCORE user:{user_id}:stories -inf {now - 86400}# Story ring ordering per viewerZADD user:{viewer_id}:story_ring {engagement_score} {author_id}Prefetch Strategy
Client-side behavior:
On app open:1. Fetch story ring ordering (lightweight API call)2. Prefetch first 3 story authors' media (background)3. As user views stories, prefetch next 2 authors aheadOn story view:1. Preload all segments of current story2. Preload first segment of next story3. Mark current story as viewed (async)Why aggressive prefetch:
- Stories UX is tap-tap-tap: any loading spinner breaks flow
- Media is small (compressed images/short videos)
- Users view multiple stories in sequence: sequential access pattern
TTL and Expiration
Server-side:
- Redis keys set with 24-hour TTL
- Background job cleans up S3 media at TTL+1 hour (grace period for in-flight views)
Client-side:
- Local cache respiration synced with server TTL
- Client computes
ttl_remaining = story.created_at + 86400 - now() - Evict from local cache when TTL expires
Direct Messages Service
Architecture
Instagram DMs handle real-time messaging with E2E encryption support.
MQTT for Real-time
Meta has used MQTT for mobile messaging since Lucy Zhang’s 2011 “Building Facebook Messenger” post — the original argument was that a persistent, lightweight pub/sub session beats HTTP polling on both latency and battery on mobile networks. Instagram DMs ride the same family of infrastructure.
| Property | MQTT | WebSocket |
|---|---|---|
| Protocol overhead | 2-byte fixed header minimum | 2–14 bytes per frame |
| Power consumption | Lower on mobile vs. HTTP polling (Meta’s 2011 post argues the win comes from one persistent session and tiny keepalives; not separately quantified) | Higher (app must run its own keepalive cadence) |
| Reconnection | Built-in session resumption + persistent sessions | Application-defined |
| QoS levels | At-most-once / at-least-once / exactly-once | Application-defined |
MQTT topic structure:
# User's DM inbox (subscribe on connect)/u/{user_id}/inbox# Thread-specific updates/t/{thread_id}/messages# Typing indicators/t/{thread_id}/typingDirect’s Mutation Manager (DMM)
Instagram’s engineering team built a dedicated mutation manager (DMM) for Direct to handle:
- Optimistic UI: Show sent message immediately, reconcile with server response
- Offline support: Queue messages when offline, sync when reconnected
- Ordering guarantees: Preserve message order even with network jitter
- Retry logic: Automatic retry with exponential backoff
Client-side queue:
interface QueuedMessage { localId: string // Client-generated UUID threadId: string content: string timestamp: number status: "pending" | "sent" | "failed" retryCount: number}// Persisted to IndexedDB/SQLite// Survives app restartsCassandra Data Model
DMs use Cassandra for high write throughput and partition-local queries. Instagram has been a heavy Cassandra user since the early 2010s; in 2018 the team published Rocksandra, a RocksDB-backed pluggable storage engine for Cassandra that cut p99 read latency by ~10x by replacing the default LSM-tree implementation with RocksDB and avoiding JVM GC pauses on the read path. The Cassandra schema below describes the logical model; the physical engine in Instagram’s deployment is Rocksandra, not stock Cassandra.
-- Thread metadataCREATE TABLE threads ( thread_id UUID PRIMARY KEY, participant_ids SET<UUID>, created_at TIMESTAMP, last_message_at TIMESTAMP, last_message_preview TEXT);-- Messages partitioned by threadCREATE TABLE messages ( thread_id UUID, message_id TIMEUUID, sender_id UUID, content TEXT, media_url TEXT, created_at TIMESTAMP, PRIMARY KEY (thread_id, message_id)) WITH CLUSTERING ORDER BY (message_id DESC);-- User's inbox (materialized view for fast inbox loading)CREATE TABLE user_inbox ( user_id UUID, thread_id UUID, last_message_at TIMESTAMP, unread_count INT, PRIMARY KEY (user_id, last_message_at)) WITH CLUSTERING ORDER BY (last_message_at DESC);Explore and Recommendations
System Scale
Per Meta’s own engineering posts, Instagram’s Explore recommender:
- Serves hundreds of millions of daily visitors at sub-second latency.
- Selects from a candidate pool of billions of items.
- Runs 1,000+ production ML models across the surface, with the ranking funnel split into retrieval, early-stage ranking, and late-stage ranking.
Three-Stage Recommendation Pipeline
Two Towers Model (Retrieval)
Meta’s Explore architecture post describes the same two-tower retrieval shape: independent user and item towers produce embeddings that get compared via dot product, with item embeddings precomputed and indexed in an Approximate Nearest Neighbor service for online lookup.
Architecture:
User Tower: Item Tower:[User features] [Item features] ↓ ↓ Dense layers Dense layers ↓ ↓User embedding (128d) Item embedding (128d) ↓ ↓ └────── Dot product ──────────┘ ↓ Similarity scoreUser features:
- Account-level embeddings (topical interests)
- Recent engagement history
- Social graph signals
- Demographic signals (age bucket, region)
Item features:
- Content embeddings (visual + text)
- Creator features
- Engagement statistics
- Content category
Multi-Task Ranking
The late-stage ranker predicts multiple objectives simultaneously:
Outputs:- P(like) weight: 1.0- P(comment) weight: 2.0 (higher engagement)- P(save) weight: 3.0 (strong intent signal)- P(share) weight: 3.0- P(follow) weight: 5.0 (acquisition metric)- P(hide) weight: -10.0 (negative signal)Final score = Σ(weight × probability)Model Training
Continual learning:
- Models fine-tuned hourly with new engagement data
- Base model retrained weekly with full dataset
- Feature store updated in real-time
Scale:
- 1,000+ models running in production
- Custom ML infrastructure (PyTorch-based)
- GPU clusters for inference at <100ms p99
API Design
Photo Upload
POST /api/v1/media/uploadContent-Type: multipart/form-dataRequest:- file: <binary>- media_type: "image" | "video"- filter_id: "clarendon" | "gingham" | ... (optional)Response (200 OK):{ "media_id": "abc123", "urls": { "1080": "https://cdn.instagram.com/abc123/1080.jpg", "640": "https://cdn.instagram.com/abc123/640.jpg", "320": "https://cdn.instagram.com/abc123/320.jpg", "150": "https://cdn.instagram.com/abc123/150.jpg" }, "expires_at": "2024-01-02T00:00:00Z" // Media must be posted within 24h}Create Post
POST /api/v1/postsRequest:{ "media_ids": ["abc123", "def456"], // Up to 10 for carousel "caption": "Summer vibes 🌴", "location_id": "loc_789", // Optional "tagged_users": ["user_111"], // Optional "alt_text": "Beach sunset" // Accessibility}Response (201 Created):{ "post_id": "post_xyz", "permalink": "https://instagram.com/p/xyz", "created_at": "2024-01-01T12:00:00Z"}Errors:- 400: Invalid media_id (expired or not found)- 400: Caption too long (> 2200 characters)- 403: Tagged user has blocked you- 429: Rate limited (> 25 posts/day)Feed
GET /api/v1/feed?cursor={cursor}&limit=20Response (200 OK):{ "posts": [ { "post_id": "post_xyz", "author": { "user_id": "user_123", "username": "photographer", "profile_pic_url": "https://...", "is_verified": true }, "media": [ { "type": "image", "url": "https://cdn.instagram.com/...", "width": 1080, "height": 1080, "alt_text": "Beach sunset" } ], "caption": "Summer vibes 🌴", "like_count": 1234, "comment_count": 56, "created_at": "2024-01-01T12:00:00Z", "viewer_has_liked": false, "viewer_has_saved": false } ], "next_cursor": "eyJ0cyI6MTY0...", "has_more": true}Stories
GET /api/v1/stories/feedResponse (200 OK):{ "story_ring": [ { "user_id": "user_123", "username": "friend1", "profile_pic_url": "https://...", "has_unseen": true, "latest_story_ts": "2024-01-01T11:00:00Z" } ], "stories": { "user_123": [ { "story_id": "story_abc", "media_url": "https://...", "media_type": "image", "created_at": "2024-01-01T11:00:00Z", "expires_at": "2024-01-02T11:00:00Z", "seen": false, "reply_enabled": true } ] }}Direct Messages
POST /api/v1/direct/threads/{thread_id}/messagesRequest:{ "text": "Hey, nice photo!", "reply_to_story_id": "story_abc" // Optional}Response (201 Created):{ "message_id": "msg_xyz", "thread_id": "thread_123", "created_at": "2024-01-01T12:00:00Z", "status": "sent"}Data Modeling
PostgreSQL Schema (Core Entities)
-- UsersCREATE TABLE users ( id BIGINT PRIMARY KEY, username VARCHAR(30) UNIQUE NOT NULL, email VARCHAR(255) UNIQUE, phone VARCHAR(20) UNIQUE, full_name VARCHAR(100), bio TEXT, profile_pic_url TEXT, is_private BOOLEAN DEFAULT false, is_verified BOOLEAN DEFAULT false, follower_count INT DEFAULT 0, following_count INT DEFAULT 0, post_count INT DEFAULT 0, created_at TIMESTAMPTZ DEFAULT NOW(), updated_at TIMESTAMPTZ DEFAULT NOW());CREATE INDEX idx_users_username ON users(username);-- PostsCREATE TABLE posts ( id BIGINT PRIMARY KEY, author_id BIGINT NOT NULL REFERENCES users(id), caption TEXT, location_id BIGINT REFERENCES locations(id), like_count INT DEFAULT 0, comment_count INT DEFAULT 0, is_archived BOOLEAN DEFAULT false, created_at TIMESTAMPTZ DEFAULT NOW(), deleted_at TIMESTAMPTZ);CREATE INDEX idx_posts_author ON posts(author_id, created_at DESC);-- Post Media (supports carousel)CREATE TABLE post_media ( id BIGINT PRIMARY KEY, post_id BIGINT NOT NULL REFERENCES posts(id), media_type VARCHAR(10) NOT NULL, -- 'image', 'video' url TEXT NOT NULL, width INT, height INT, alt_text TEXT, position SMALLINT DEFAULT 0, created_at TIMESTAMPTZ DEFAULT NOW());CREATE INDEX idx_post_media_post ON post_media(post_id);-- CommentsCREATE TABLE comments ( id BIGINT PRIMARY KEY, post_id BIGINT NOT NULL REFERENCES posts(id), author_id BIGINT NOT NULL REFERENCES users(id), parent_id BIGINT REFERENCES comments(id), -- For replies content TEXT NOT NULL, like_count INT DEFAULT 0, created_at TIMESTAMPTZ DEFAULT NOW(), deleted_at TIMESTAMPTZ);CREATE INDEX idx_comments_post ON comments(post_id, created_at DESC);-- Likes (polymorphic)CREATE TABLE likes ( id BIGINT PRIMARY KEY, user_id BIGINT NOT NULL REFERENCES users(id), target_type VARCHAR(10) NOT NULL, -- 'post', 'comment', 'story' target_id BIGINT NOT NULL, created_at TIMESTAMPTZ DEFAULT NOW(), UNIQUE(user_id, target_type, target_id));CREATE INDEX idx_likes_target ON likes(target_type, target_id);Cassandra Schema (Social Graph)
The social graph is the read-heaviest surface in the system: every feed read, every notification, every Stories ring computation hits it. Meta describes its graph store as TAO (USENIX ATC 2013) — a read-optimized, geo-distributed graph cache layered over MySQL that exposes only two primitives: typed objects (nodes — users, posts, comments) and typed associations (directed edges — FOLLOWS, LIKES, AUTHORED, time-stamped and optionally carrying data). Most reads in TAO are single-edge or per-object lookups served from leader/follower cache tiers; writes go through the leader to keep cache invalidation correct. The Cassandra-backed schema below is the open-stack equivalent of the same shape: dual edge tables to make both directions of traversal partition-local, plus a per-user activity stream.
-- Follows (partitioned by follower for "who do I follow" queries)CREATE TABLE follows ( follower_id UUID, following_id UUID, created_at TIMESTAMP, PRIMARY KEY (follower_id, following_id));-- Followers (partitioned by following for "who follows me" queries)CREATE TABLE followers ( following_id UUID, follower_id UUID, created_at TIMESTAMP, PRIMARY KEY (following_id, follower_id));-- Activity feed (for "activity" tab)CREATE TABLE activity ( user_id UUID, activity_id TIMEUUID, actor_id UUID, activity_type TEXT, -- 'like', 'comment', 'follow', 'mention' target_type TEXT, target_id UUID, created_at TIMESTAMP, PRIMARY KEY (user_id, activity_id)) WITH CLUSTERING ORDER BY (activity_id DESC);ID Generation (Instagram’s Approach)
Instagram’s original sharding and ID generation post defines a 64-bit, time-sortable ID minted inside PL/pgSQL on each logical shard:
```sql
-- PL/pgSQL function for globally unique, time-sorted IDs
CREATE OR REPLACE FUNCTION instagram_id() RETURNS BIGINT AS $$
DECLARE
    epoch      BIGINT := 1314220021721; -- custom epoch (Aug 2011)
    seq_id     BIGINT;
    now_millis BIGINT;
    shard_id   INT := 1;                -- set per logical shard
    result     BIGINT;
BEGIN
    SELECT nextval('instagram_id_seq') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM NOW()) * 1000) INTO now_millis;
    result := (now_millis - epoch) << 23; -- 41 bits for timestamp
    result := result | (shard_id << 10);  -- 13 bits for shard
    result := result | (seq_id);          -- 10 bits for sequence
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

ID structure (64 bits):
| Bits | Purpose | Range |
|---|---|---|
| 41 | Milliseconds since epoch | ~69 years |
| 13 | Shard ID | 8,192 shards |
| 10 | Sequence | 1,024 IDs/ms/shard |
Why this matters:
- IDs are time-sorted: no separate timestamp index needed
- IDs encode shard: routing without lookup
- IDs are unique across shards: no coordination needed
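Those three properties are easy to sanity-check by packing and unpacking IDs in plain arithmetic. A Python sketch mirroring the 41/13/10 layout above (the helper names are ours, not Instagram's):

```python
# Illustrative pack/unpack for the 64-bit ID layout above (41/13/10 bits).
EPOCH_MS = 1314220021721  # custom epoch from the sharding post

def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    """Pack (timestamp, shard, sequence) the same way the PL/pgSQL function does."""
    return ((now_ms - EPOCH_MS) << 23) | (shard_id << 10) | (seq % 1024)

def split_id(post_id: int) -> tuple[int, int, int]:
    """Recover creation time (ms), shard, and sequence from an ID."""
    ts_ms = (post_id >> 23) + EPOCH_MS
    shard_id = (post_id >> 10) & 0x1FFF  # 13 bits
    seq = post_id & 0x3FF                # 10 bits
    return ts_ms, shard_id, seq

# IDs minted later compare higher, so ORDER BY id is ORDER BY created_at:
a = make_id(EPOCH_MS + 1_000, shard_id=7, seq=3)
b = make_id(EPOCH_MS + 2_000, shard_id=7, seq=0)
assert b > a
# And the shard is recoverable without any lookup:
assert split_id(a) == (EPOCH_MS + 1_000, 7, 3)
```

The `split_id` direction is what "routing without lookup" means in practice: given only a post ID, a router can extract the shard and dispatch the query.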
Infrastructure
Cloud-Agnostic Concepts
| Component | Purpose | Requirements |
|---|---|---|
| Object Storage | Media files | High durability, CDN integration |
| Relational DB | Users, posts, metadata | ACID, sharding support |
| Wide-column DB | Social graph, activity | High write throughput, partition-local queries |
| Cache | Timeline, hot data | Sub-ms latency, cluster support |
| Message Queue | Async processing | At-least-once delivery, partitioning |
| Search Index | Discovery | Full-text, faceted search |
| CDN | Media delivery | Global PoPs, cache efficiency |
| Push Gateway | Real-time notifications | MQTT/WebSocket support |
AWS Reference Architecture
| Component | Service | Configuration |
|---|---|---|
| API Gateway | ALB + API Gateway | Auto-scaling, WAF protection |
| Compute | EKS (Kubernetes) | Spot instances for workers |
| Primary DB | RDS PostgreSQL | Multi-AZ, read replicas |
| Social Graph | Amazon Keyspaces or self-managed Cassandra | Multi-region |
| Cache | ElastiCache Redis Cluster | Cluster mode, 6+ nodes |
| Object Storage | S3 + CloudFront | Intelligent tiering |
| Message Queue | Amazon MSK (Kafka) or SQS | For fan-out workers |
| Search | OpenSearch Service | 3+ data nodes |
| Push | Amazon MQ (MQTT) or IoT Core | Managed MQTT broker |
Multi-Region Deployment
Instagram’s Migration (AWS → Facebook)
Instagram migrated its production stack from AWS to Facebook’s data centers in 2013–2014, an internal project nicknamed “Instagration”. The team grew from 8 engineers to roughly 20 over about a year, and used a custom networking layer (“Neti”) to bridge EC2/VPC and Facebook’s internal network.
Before (AWS):
- A Postgres-on-EC2 sharded fleet with read replicas
- Memcached for hot caching
- S3 for media storage
After (Facebook infrastructure):
- Higher per-server efficiency on Facebook hardware (Wired’s coverage cites a roughly 3:1 consolidation ratio versus the prior EC2 footprint).
- Shared access to Facebook’s caching, monitoring, and storage stacks.
- Private long-haul network between data centers.
- No user-visible disruption during the cutover.
Frontend Considerations
Feed Virtualization
Instagram’s feed is an infinite scroll of variable-height items:
```jsx
// Virtualized list configuration
const FeedList = () => {
  return (
    <VirtualizedList
      data={posts}
      renderItem={({ item }) => <PostCard post={item} />}
      estimatedItemSize={600}      // average post height
      overscanCount={3}            // render 3 items above/below the viewport
      onEndReached={loadMore}
      onEndReachedThreshold={0.5}
    />
  );
};
```

Why virtualization:
- Feed can have 1000+ posts
- Each post has heavy media (images/video)
- Without virtualization: memory explosion, jank
Image Loading Strategy
```jsx
// Progressive image loading
const PostImage = ({ post }) => {
  const [loaded, setLoaded] = useState(false);

  return (
    <div className="post-image">
      {/* Blur placeholder (tiny, inline) */}
      <img
        src={post.thumbnail_blur} // 10x10 base64
        className={loaded ? 'hidden' : 'blur'}
      />
      {/* Full image (lazy loaded) */}
      <img
        src={post.media_url}
        loading="lazy"
        onLoad={() => setLoaded(true)}
        className={loaded ? 'visible' : 'hidden'}
      />
    </div>
  );
};
```

Stories Ring Interaction
```jsx
// Horizontal scroll with snap points
const StoriesRing = ({ stories }) => {
  return (
    <div
      className="stories-ring"
      style={{
        display: 'flex',
        overflowX: 'scroll',
        scrollSnapType: 'x mandatory',
        WebkitOverflowScrolling: 'touch' // smooth iOS scrolling
      }}
    >
      {stories.map(story => (
        <div key={story.id} style={{ scrollSnapAlign: 'start' }}>
          <StoryAvatar story={story} />
        </div>
      ))}
    </div>
  );
};
```

Optimistic Updates
```jsx
// Like button with optimistic UI
const LikeButton = ({ post }) => {
  const [optimisticLiked, setOptimisticLiked] = useState(post.viewer_has_liked);
  const [optimisticCount, setOptimisticCount] = useState(post.like_count);

  const handleLike = async () => {
    // Optimistic update (immediate feedback)
    setOptimisticLiked(!optimisticLiked);
    setOptimisticCount(prev => optimisticLiked ? prev - 1 : prev + 1);

    try {
      await api.toggleLike(post.id);
    } catch (error) {
      // Roll back on failure
      setOptimisticLiked(post.viewer_has_liked);
      setOptimisticCount(post.like_count);
      showError('Failed to like post');
    }
  };

  return (
    <button onClick={handleLike}>
      <HeartIcon filled={optimisticLiked} />
      <span>{formatCount(optimisticCount)}</span>
    </button>
  );
};
```

Conclusion
Instagram’s architecture demonstrates several principles that recur in any photo- or short-video sharing platform at this scale:
Architectural decisions:
- Hybrid fan-out bounds per-post write amplification while keeping cached reads cheap. The follower threshold is a tunable knob — Instagram and Twitter both keep theirs unpublished — rather than a magic number.
- Tiered blob storage matches the read distribution: a Haystack-shaped hot tier for fresh uploads, an f4-shaped warm tier (Reed-Solomon (10,4), ~2.1x effective replication) for the long tail, and a cold archive for everything else.
- A purpose-built graph store (TAO-style objects and associations) keeps the dominant read shape — single-hop edge lookups — partition-local and cache-resident.
- Separate storage strategies for ephemeral (Stories) and persistent (Posts) content match each surface’s access pattern and lifetime budget.
- Three-stage recommendation funnel (retrieval → early ranking → late ranking, as described in Meta’s Explore architecture post) enables personalization across billions of items inside an inference budget measured in milliseconds.
- MQTT for real-time keeps a persistent, low-overhead channel open on mobile, which has been Meta’s preferred shape for chat infrastructure since 2011.
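The pull-merge read path in the first bullet reduces to a lazy k-way merge by recency. A minimal sketch under assumed data shapes (the threshold constant and every name here are illustrative, not Instagram's):

```python
# Hedged sketch of the hybrid read path: merge the viewer's precomputed
# timeline cache with fresh pulls from followed high-fan-out accounts.
import heapq
import itertools

CELEB_THRESHOLD = 1_000_000  # tunable knob; the real value is unpublished

def read_feed(cached_timeline, celeb_posts_by_author, limit=20):
    """Each source is a list of (created_at, post_id) tuples, newest first.
    heapq.merge keeps the k-way merge lazy, so we only pay for `limit` items."""
    sources = [cached_timeline, *celeb_posts_by_author.values()]
    merged = heapq.merge(*sources, key=lambda p: p[0], reverse=True)
    return list(itertools.islice(merged, limit))

feed = read_feed(
    cached_timeline=[(50, "p1"), (30, "p2")],         # push-model cache
    celeb_posts_by_author={"celeb_a": [(40, "p3")]},  # pulled at read time
)
# Posts come out interleaved by recency: p1 (t=50), p3 (t=40), p2 (t=30).
```

The merge cost scales with the number of followed high-fan-out accounts, not with their follower counts — which is exactly the write amplification the hybrid scheme trades away.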
Optimizations this design achieves:
- Feed load: p99 < 500ms through cached timelines + celebrity merge
- Stories load: p99 < 200ms through aggressive prefetch
- Upload processing: < 10s for immediate user feedback
- Global delivery: 95%+ CDN cache hit rate exploiting power-law distribution
Known limitations:
- Hybrid fan-out requires maintaining two read paths and a merge step.
- The high-fan-out follower threshold is a tunable heuristic; the right value drifts as social-graph distributions shift.
- Continual / hourly fine-tuning of the ranker introduces a small staleness window around fast-moving content.
- Multi-region replication is asynchronous, so reads that race a write can briefly see stale state.
Alternative approaches not chosen:
- Pure push (write amplification at celebrity scale)
- Pure pull (read latency unacceptable)
- Single-region (latency for global users)
Appendix
Prerequisites
- Distributed systems fundamentals (CAP theorem, eventual consistency)
- Database concepts (sharding, replication, indexing)
- Caching strategies (write-through, write-behind, cache invalidation)
- CDN and content delivery concepts
- Basic ML concepts (embeddings, neural networks)
Summary
- Hybrid fan-out — push to follower timeline caches for the long tail of accounts; pull-merge above a tunable follower threshold to bound write amplification.
- Image processing pipeline — variants generated for the 320–1080 px supported range; filters offloaded to GPU on-device.
- Tiered photo storage — Haystack-style hot tier (in-memory needle index, ~1 disk seek per read), f4-style warm tier (Reed-Solomon (10,4), ~2.1x effective replication), cold archive for the long tail.
- Social graph storage — TAO-style objects + associations layered over a sharded relational store, with leader/follower caches keeping cross-region reads fast.
- Stories architecture — 24-hour TTL with aggressive prefetch targeting sub-200 ms perceived load.
- Feed ranking — multi-task neural networks fed by tens-to-hundreds of thousands of features, fine-tuned continually on engagement events.
- Explore recommendation — three-stage funnel (Two Towers retrieval → early ranking → late ranking) over 1,000+ production ML models.
- MQTT for real-time — persistent low-overhead channel for DMs, notifications, and presence; Meta’s preferred mobile chat substrate since 2011.
- Upload-time safety + privacy — strip GPS/serial EXIF before serving variants; run PhotoDNA / PDQ / GIFCT perceptual-hash matching inline before a CDN URL becomes addressable.
References
- Finding a Needle in Haystack: Facebook’s Photo Storage — hot-tier blob store with in-memory needle index (OSDI 2010).
- f4: Facebook’s Warm BLOB Storage System — warm-tier Reed-Solomon (10,4) erasure coding (OSDI 2014).
- TAO: Facebook’s Distributed Data Store for the Social Graph — objects + associations graph store (USENIX ATC 2013).
- Sharding & IDs at Instagram — original 64-bit ID generation design (Instagram Engineering).
- What Powers Instagram — early architecture overview (Instagram Engineering).
- Migrating from AWS to Facebook — the “Instagration” project (Instagram Engineering).
- How Facebook Moved 20 Billion Instagram Photos Without You Noticing — secondary coverage of the migration (Wired, 2014).
- Making Direct Messages Reliable and Fast — DM architecture and the Mutation Manager (Instagram Engineering).
- Building Facebook Messenger — origin of MQTT use for Meta’s mobile chat (Engineering at Meta, 2011).
- Scaling the Instagram Explore Recommendations System — three-stage funnel and Two Towers architecture (Engineering at Meta, 2023).
- Journey to 1000 Models — ML infrastructure at scale (Engineering at Meta, 2025).
- Powered by AI: Instagram’s Explore Recommender System — earlier deep-dive on Explore retrieval (Meta AI).
- Instagram Video Processing and Encoding Reduction — video pipeline optimization (Engineering at Meta, 2022).
- Introducing mcrouter — caching infrastructure (Engineering at Meta, 2014).
- Image resolution of photos you share on Instagram — supported widths and resize behavior (Instagram Help Center).
- Instagram now has 3 billion monthly active users — Sept 2025 announcement (CNBC).
- MQTT Version 5.0 — protocol specification (OASIS).
- Timelines at Scale — Raffi Krikorian on Twitter’s hybrid fan-out home timeline (QCon 2013, InfoQ).
- Open-sourcing a 10x reduction in Apache Cassandra tail latency — Rocksandra: RocksDB-backed Cassandra storage engine (Instagram Engineering, 2018).
- Open-sourcing photo- and video-matching technology — Meta’s PDQ image hash and TMK+PDQF video hash for safety scanning (2019).
- PhotoDNA — Microsoft’s perceptual hash service for CSAM detection.
- GIFCT Hash Sharing Database — industry-shared terror-content hash bank.
- NCMEC Hash Sharing — National Center for Missing & Exploited Children hash list.