Design a Social Feed (Facebook/Instagram) — Sujeet Jaiswal

A social feed at Facebook or Instagram scale is fundamentally three problems wedged together: distribute new posts to the right followers, rank the resulting candidate set so the most relevant items surface first, and serve the result in well under a second across hundreds of millions of concurrent sessions. None of the three is independently hard; the friction is that every reasonable solution to one makes the other two worse. This article walks through the production-tested decomposition that Facebook (Multifeed), Instagram (1000-model recommendation stack), and Twitter (Redis timeline cache) converged on, and the trade-offs that make it work.

High-level system overview: client → edge → API gateway → Multifeed (aggregator + leaf) → multi-stage ranking; post pipeline writes through TAO and fans out via Kafka into the feed cache.

Abstract

Three interconnected problems define the system: content distribution (getting posts into follower feeds), personalised ranking (ordering relevance), and low-latency retrieval (sub-second loads). The architecture below is a synthesis of public Meta, Instagram, and Twitter engineering descriptions; specific scale numbers are date-stamped throughout because production architectures evolve quickly.

Core architectural decisions:

Decision	Choice	Rationale
Fan-out strategy	Hybrid push/pull	Push pre-computes feeds for typical accounts; pull avoids write storms for high-follower accounts. The per-producer push-vs-pull decision was formalised in Yahoo Research’s Feeding Frenzy (SIGMOD 2010).
Content storage	TAO-style graph store	Objects + associations fit social data and admit single-shard queries on `(id1, atype)` association lists (TAO paper, USENIX ATC ‘13).
Feed ranking	Multi-stage ML funnel	Inventory → lightweight Pass 0 → heavy multitask NN → contextual rerank (News Feed ranking, 2021).
Caching	Two-tier (leader/follower) with leases	Followers absorb reads; leaders serialise invalidations; lease-get throttles thundering herds (Scaling Memcache, NSDI ‘13).
Consistency	Eventual (~minutes); strong for writer’s own writes	Acceptable for social content; Meta’s Polaris reports ~10 nines (99.99999999%) cache consistency within 5 minutes for TAO-class caches.

Key trade-offs accepted:

Eventual consistency for follower feeds — most users tolerate a few seconds to minutes of staleness; the writer always sees their own write through read-after-write routing.
Higher write amplification for typical users in exchange for O(1) reads.
A second code path for high-follower accounts in exchange for bounded write fan-out.
Periodic ranker retraining (model staleness between deploys) in exchange for training cost and stability.

What this design optimises:

p99 feed load well under one second globally.
No data loss for user-generated posts (durable persistence before fan-out).
Personalised ranking with thousands of features per candidate (transparency.meta.com).
Linear horizontal scaling of stateless tiers; sharded scaling of stateful ones.

Requirements

Functional Requirements

Requirement	Priority	Notes
Home feed generation	Core	Aggregated posts from followed accounts.
Post creation	Core	Text, images, video, with privacy scope.
Feed ranking	Core	Personalised relevance ordering.
Real-time feed updates	Core	New posts surface without full refresh.
Engagement actions	Core	Like, comment, share, save.
Following/followers	Core	Asymmetric directed social graph.
Feed pagination	Core	Cursor-based infinite scroll.
Post visibility	Extended	Public, friends-only, custom audience.
Stories / ephemeral content	Extended	24-hour expiring content.
Algorithmic vs. chronological toggle	Extended	User-selectable feed type.

Non-Functional Requirements

Requirement	Target	Rationale
Availability	99.99% (4 nines)	Revenue-critical; user retention.
Feed load latency	p99 < 500 ms	Above this, scroll fluidity degrades.
Post publish latency	p99 < 2 s	Acceptable for async fan-out.
Feed freshness	< 1 minute	Balance between freshness and cost.
Ranking model latency	p99 < 100 ms	Real-time personalisation budget.
Data durability	11 nines	No user content loss (mirrors S3 SLO).

Scale Estimation

These are illustrative numbers in the Facebook/Instagram class. Treat them as design inputs rather than published numbers from any specific platform.

Users:

Monthly Active Users (MAU): ≈ 2 billion.
Daily Active Users (DAU): ≈ 1 billion.
Peak concurrent users: ≈ 200 million.

Traffic:

Feed loads per user per day: ≈ 20.
Daily feed requests: 1 B × 20 = 20 B/day ≈ 230 K RPS average.
Peak multiplier (3×) → ≈ 700 K RPS.
Posts per user per day (mean, heavy power-law tail): ≈ 0.5.
Daily posts: 500 M/day ≈ 5.8 K posts/sec.

Storage:

Average post metadata: ≈ 2 KB.
Average media per post (after compression, CDN-served): ≈ 500 KB.
Daily metadata: 500 M × 2 KB ≈ 1 TB/day.
Daily media: 500 M × 500 KB ≈ 250 TB/day.
Social graph edges: 2 B users × 500 mean connections ≈ 1 trillion edges.
Graph storage at ≈ 100 B/edge: ≈ 100 TB.

Fan-out estimation:

Mean follower count: 500.
Posts requiring fan-out: 500 M/day.
Naive push writes: 500 M × 500 = 2.5 × 10¹¹ writes/day.
Without celebrity carve-out: top 1% (20 M users) × ≈ 50 K followers × 0.5 posts ≈ 5 × 10¹⁴ writes/day — operationally infeasible.

Important

The “infeasible” number is the load-bearing argument for hybrid fan-out. Any single-strategy design either burns RAM on dead writes or burns CPU on read-time aggregation. Hybrid is not an optimisation; it is the only design that survives the long tail.

Design Paths

Path A: Fan-out on Write (Push Model)

Best when:

Follower distribution is roughly uniform (no celebrities).
Reads dominate writes.
Real-time freshness is critical.
Operational simplicity is preferred over storage efficiency.

Architecture:

Push fan-out: persist the post, then for every follower write the post id into a pre-computed feed cache.

Key characteristics:

Pre-computed feeds in Redis or Memcached.
O(1) feed reads, O(followers) writes per post.
Feed cache stores post IDs in time- or score-sorted order.

Trade-offs:

Pro	Con
Single cache lookup on read.	Write amplification scales with follower count.
Predictable, low read latency.	Storage cost from duplicating post references across feeds.
Simple feed retrieval logic.	Delivery is delayed by fan-out lag for very high-follower accounts.

Production reference. Twitter’s home-timeline service is the canonical large push-fanout deployment. As of 2014, the Timeline Service ran across roughly 10,000 Redis instances totalling about 105 TB of RAM and serving ≈ 39 M QPS, with each user’s home timeline capped at 800 tweets to bound memory (Yao Yue talk, summarised on High Scalability; cap detail in antirez’s Twitter internals notes). To fit the timeline encoding into RAM, Twitter built a “Hybrid List” data structure that was upstreamed into Redis as the now-standard quicklist. By 2017, Twitter’s cache fleet had been re-segmented and reported ranges of 10–50 M QPS across hundreds-to-thousands of instances per cluster (Twitter Engineering, 2017). The full pipeline behind the cache: tweets are persisted to Twitter’s Manhattan KV store, then a Heron topology (the post-Storm streaming engine) consumes the write log and runs the fan-out, writing the new tweet id into each follower’s pre-computed Redis timeline.

Path B: Fan-out on Read (Pull Model)

Best when:

Many high-follower accounts exist (write storms would dominate).
Writes dominate reads.
Storage cost is the primary constraint.
Read latency budget can tolerate aggregation work.

Architecture:

Pull fan-out flow: on each request, fetch the followee set, query each followee's recent posts, merge and sort, then rank. — Pull fan-out: fetch followees, gather their recent posts, merge, then rank — all on the read path.

Key characteristics:

No write amplification per post.
Feed computed on demand from author indices.
Read complexity is O(followees × posts/followee) before pruning.

Trade-offs:

Pro	Con
No write amplification.	Higher read latency.
Always uses the freshest content.	More compute and IO per request.
Lower storage requirements.	Complex ranking is harder to amortise per read.

Production reference. Facebook’s Multifeed is a pull-style read path: when a user requests their feed, an aggregator queries every leaf in a replica, retrieves potential candidates, then ranks and filters down to roughly 45 items. The aggregator/leaf separation was operationalised by 2010 (the family is sometimes called “Aggregator-Leaf-Tailer”) and Facebook published the disaggregated redesign in March 2015, reporting ≈ 40 % reduction in total memory + CPU consumption versus the prior co-located layout.

Path C: Hybrid Model (Industry Standard)

Best when:

Follower distribution has a heavy tail (most users small, some accounts very large).
Both reads and writes need to be optimised.
Some additional system complexity is acceptable.
Facebook/Instagram/Twitter scale.

Architecture:

Hybrid fan-out: push for typical accounts; mark celebrity posts and merge them on read.

Key characteristics:

Threshold-based routing on follower count (commonly described in the 10 K – 100 K range; the actual number is per-platform and unpublished).
Pre-computed feeds plus on-demand celebrity merging at read time.
A graceful fallback: if the cached feed degrades, the read path can fall back toward pull.

Trade-offs:

Pro	Con
Optimal latency for the typical user.	Two fan-out code paths to build and operate.
Bounds write amplification for celebrities.	More complex feed-read logic (cache + pull merge).
Threshold can be tuned per cohort.	Edge cases when accounts cross the threshold.

Production reference. Twitter’s current home timeline is hybrid: regular accounts push through the Heron fan-out into per-follower Redis timelines; very high-follower accounts skip write fan-out and are merged into the requester’s timeline at read time, with Manhattan as the durable source of truth (Manhattan deployments at scale; Twitter infrastructure 2017). The theoretical underpinning is older: Yahoo’s Feeding Frenzy proved that the global cost-minimising policy is a per-producer/consumer decision driven by the producer’s update rate vs. the consumer’s view rate — exactly the celebrity-vs-typical split, generalised. Discord’s large-server architecture is structurally similar but uses a different substrate: each guild is a single Elixir process on the BEAM VM, and “relays” sit between the guild process and per-user session processes — each relay handling fan-out to up to ~15,000 sessions, with “passive” connections skipped to cut fan-out work by roughly 90 % when users are not viewing the guild (Maxjourney: Pushing Discord’s Limits). The shared idea across these systems is the same: avoid quadratic broadcast cost by routing the long tail through a pull-style merge.

Path Comparison

Factor	Fan-out Write	Fan-out Read	Hybrid
Read latency	Low (single cache lookup)	Higher (aggregate + rank)	Low for typical, moderate for heavy
Write cost	O(followers)	O(1)	O(followers) below threshold; O(1) above
Celebrity handling	Infeasible at scale	Native	Carved out
Storage cost	High (duplicated refs)	Low	Medium
Operational complexity	Low	Medium	High
Production examples	Twitter (push tier today)	Facebook Multifeed	Twitter overall, Instagram, Discord

This Article’s Focus

The rest of this article goes deep on Path C (Hybrid) because:

It is the architecture every modern high-scale feed has converged to.
Reasoning about why it exists is the best way to understand fan-out trade-offs.
The mechanics (graph store, ranker funnel, cache leasing) are reusable building blocks.

High-Level Design

Component Overview

Component tier flow: client → edge/CDN → API gateway → aggregator (Multifeed) + ranking pipeline; post path goes through post + graph + engagement services into a TAO/Redis/MySQL/Kafka/S3 data layer with a separate ML platform. — Component tier flow: a stateless edge fronts a Multifeed-style read tier; the write tier persists to TAO and fans out via Kafka.

Feed Generation Service (Multifeed)

The feed-generation layer follows Facebook’s Multifeed shape with disaggregated tiers:

Aggregator tier:

CPU-intensive query fan-out and ranking.
Stateless and horizontally scalable.
Issues parallel queries to leaf servers and merges results.
Calls into the ranking service / model server for scoring.

Leaf tier:

Memory-intensive; stores in-RAM indices of recent author actions.
Indexes posts by author, sorted by time.
Sharded by author user_id so that an author’s posts live on a single leaf.

Tailer:

Real-time pipeline (e.g. Kafka) that updates leaf indices as posts are created.
Owns index rebuild from persistent storage on startup or after loss.

Why disaggregate. In Multifeed’s pre-2015 layout, aggregator and leaf code shared a host. CPU-bound aggregation contended with memory-bound leaf indices; servers had to be over-provisioned on whichever resource was tightest. Splitting them into separate pools allowed independent scaling and reduced total memory + CPU consumption by ≈ 40 %, and let Facebook tune CPU-to-RAM ratios per tier instead of per host (2015 redesign post).

Design decisions:

Decision	Choice	Rationale
Tier separation	Disaggregated aggregator / leaf	Independent scaling, ≈ 40 % efficiency gain.
Leaf sharding	By author `user_id`	Co-locates an author’s posts for range queries.
Index structure	Time-sorted skiplist / sorted run	Fast time-range queries, O(log n) inserts.
Memory management	LRU eviction, flash overflow	Keep hot recent posts in RAM; spill cold to SSD.

The social graph (follows, friends, blocks) and content relationships sit behind a TAO-style API (TAO paper, USENIX ATC ‘13). TAO is a thin caching layer on top of sharded MySQL that exposes a graph API and aggressively caches association lists; the published deployment serves on the order of a billion reads per second and millions of writes per second.

Data model:

Objects — typed nodes identified by a 64-bit id: (id) → (otype, key→value). Examples: user, post, comment, page.
Associations — typed directed edges keyed on (id1, atype, id2): (id1, atype, id2) → (time, key→value). The 32-bit time field is the secondary sort key.

Why this shape.

Two storage shapes (objects, associations) keep the schema and the cache key space small.
Associations are stored on the shard owning id1, so the most common access pattern — “give me id1’s associations of type atype, ordered by recency” — is single-shard.
The time-ordered association list is exactly what a feed retrieval needs: latest posts by author, latest likes by user, etc.

TAO data model: Object table holds typed nodes with id, type, and a data blob; Association table holds (id1, atype, id2) edges with a time secondary sort key; both source and target are objects. — TAO data model: objects are typed nodes; associations are typed (id1, atype, id2) edges sorted by time on the id1 shard.

Fan-out Service

Owns post distribution to follower feeds.

Routing logic:

1interface FanoutRouter {2  routePost(post: Post, author: User): FanoutStrategy3}45type FanoutStrategy =6  | { type: "push"; followers: string[] }7  | { type: "pull"; authorId: string }8  | { type: "hybrid"; pushTo: string[]; markForPull: boolean }

Threshold-based routing (illustrative):

Authors with < 10 K followers: full push fan-out.
Authors with 10 K – 1 M followers: hybrid — push to active followers, mark post for read-time pull for the rest.
Authors with > 1 M followers: pull-only — index the post on the author’s timeline and merge into followers’ feeds at read time.

Note

The exact threshold values used at Facebook, Instagram, or Twitter are not publicly documented. Treat the numbers above as design defaults. What is documented: the shape — a single follower-count threshold is the standard switch.

Ranking Service

The ranker is structured as a multi-stage funnel that progressively narrows the candidate set to fit a tight serving budget. Meta has published two such funnels in detail:

Facebook News Feed (2021): Inventory → Pass 0 (lightweight model, ≈ 500 candidates) → Pass 1 (heavy multitask neural network) → Pass 2 (contextual reranking for diversity and integrity), evaluating thousands of signals per candidate (News Feed ranking, 2021; Transparency Center).
Instagram Explore (2023): Sourcing (retrieval) → early-stage ranking (lightweight) → late-stage ranking (multi-task multi-label, MTML) → final blending and policy filters, with retrieval relying on two-tower neural networks and approximate nearest-neighbour search (Scaling Instagram Explore, 2023). By 2025, the platform was operating ≈ 1,000 models across surfaces (Journey to 1000 models).
Pinterest Home Feed: Retrieval (graph-walk candidates from Pixie, an in-memory random-walk service over a 7 B-node Pin/board graph) → unified two-tower pre-ranker on a root-leaf inference fleet → heavy ranker (Scorpion) → final blend; per-user candidate queues are materialised in HBase by the Smartfeed worker so the read path stays cheap.
LinkedIn: many feed candidate sources (jobs, companies, people-you-may-know, profile-also-viewed) are powered by Browsemaps, a horizontal item-to-item collaborative-filtering platform — co-occurrences computed offline on Hadoop, served online via a key-value store. Heavy ranking on top is described in LiRank.

Multi-stage ranking funnel: candidate retrieval narrows billions of items to roughly five hundred via a lightweight model, then a heavy multitask neural network reranks them, and a final pass applies diversity and integrity rules. — Multi-stage ranking: each stage narrows the candidate set so the heaviest model only runs on a small slate.

Why a funnel. The most expensive model is the most accurate, but you cannot afford to run it on every candidate. A funnel lets you spend a sub-millisecond budget per item early (cheap features, simple model), reserve the milliseconds-per-item budget for a small, already-good slate, and apply the most expensive policy logic only on the final ≈ 50 items the user will actually see.

API Design

Feed Endpoints

Get Home Feed

Endpoint: GET /api/v1/feed

Query Parameters:

Parameter	Type	Description
cursor	string	Pagination cursor (opaque)
limit	int	Items per page (default: 20, max: 50)
refresh	boolean	Force fresh feed generation

Response (200 OK):

1{2  "posts": [3    {4      "id": "post_abc123",5      "author": {6        "id": "user_456",7        "username": "johndoe",8        "displayName": "John Doe",9        "avatarUrl": "https://cdn.example.com/avatars/456.jpg",10        "isVerified": true11      },12      "content": {13        "type": "image",14        "text": "Beautiful sunset!",15        "media": [16          {17            "id": "media_789",18            "type": "image",19            "url": "https://cdn.example.com/posts/789.jpg",20            "thumbnailUrl": "https://cdn.example.com/posts/789_thumb.jpg",21            "width": 1080,22            "height": 1350,23            "altText": "Sunset over the ocean"24          }25        ]26      },27      "engagement": {28        "likeCount": 1542,29        "commentCount": 89,30        "shareCount": 23,31        "viewCount": 12450,32        "isLiked": false,33        "isSaved": false34      },35      "ranking": {36        "score": 0.89,37        "reason": "friend_interaction"38      },39      "createdAt": "2024-02-03T10:30:00Z",40      "visibility": "public"41    }42  ],43  "pagination": {44    "nextCursor": "eyJ0IjoxNzA2ODg2NDAwfQ",45    "hasMore": true46  },47  "meta": {48    "feedType": "ranked",49    "generatedAt": "2024-02-03T12:00:00Z"50  }51}

Create Post

Endpoint: POST /api/v1/posts

Request:

1{2  "content": {3    "text": "Hello world!",4    "mediaIds": ["upload_123", "upload_456"]5  },6  "visibility": "public",7  "allowComments": true,8  "allowSharing": true,9  "location": {10    "latitude": 37.7749,11    "longitude": -122.4194,12    "name": "San Francisco, CA"13  }14}

Response (201 Created):

1{2  "id": "post_new789",3  "author": {4    "id": "user_123",5    "username": "currentuser"6  },7  "content": {8    "text": "Hello world!",9    "media": [...]10  },11  "createdAt": "2024-02-03T12:05:00Z",12  "visibility": "public",13  "fanoutStatus": "pending"14}

Real-time Feed Updates

For real-time updates of feed-level events (new post available, engagement counts), this design uses Server-Sent Events (SSE) because feed updates are unidirectional (server → client) and SSE inherits HTTP/2 multiplexing, native browser auto-reconnect, and proxy/CDN compatibility.

Endpoint: GET /api/v1/feed/stream

Event types:

1event: new_post2data: {"postId": "post_xyz", "authorId": "user_456", "preview": "Check out..."}34event: engagement_update5data: {"postId": "post_abc", "likeCount": 1543, "commentCount": 90}67event: post_removed8data: {"postId": "post_old", "reason": "author_deleted"}

Tip

Real-world Meta apps don’t use SSE for the long-lived mobile connection. They use MQTT over a persistent TCP connection for messaging and notification delivery — a binary publish/subscribe protocol designed for unreliable mobile networks and battery-constrained devices (Building Facebook Messenger, Engineering at Meta, 2011). For a feed-update fan-in into a web client, SSE is a pragmatic choice; for a mobile app, MQTT is what production sends down the wire.

Engagement Endpoints

Like / Unlike Post

Endpoint: POST /api/v1/posts/{id}/like

Response (200 OK):

1{2  "liked": true,3  "likeCount": 1543,4  "timestamp": "2024-02-03T12:10:00Z"5}

Get Comments

Endpoint: GET /api/v1/posts/{id}/comments

Query Parameters:

Parameter	Type	Description
cursor	string	Pagination cursor
limit	int	Comments per page (default: 20)
sort	string	”top” (engagement), “newest”, “oldest”

Response (200 OK):

1{2  "comments": [3    {4      "id": "comment_123",5      "author": {6        "id": "user_789",7        "username": "commenter",8        "avatarUrl": "..."9      },10      "text": "Great post!",11      "likeCount": 45,12      "replyCount": 3,13      "isLiked": false,14      "createdAt": "2024-02-03T10:35:00Z",15      "replies": []16    }17  ],18  "pagination": {19    "nextCursor": "...",20    "hasMore": true21  },22  "totalCount": 8923}

Error Responses

Code	Error	When
400	`INVALID_CONTENT`	Post content violates rules
401	`UNAUTHORIZED`	Missing or invalid token
403	`FORBIDDEN`	User blocked or content restricted
404	`POST_NOT_FOUND`	Post doesn’t exist or is deleted
429	`RATE_LIMITED`	Too many requests
503	`FEED_UNAVAILABLE`	Feed generation temporarily unavailable

Rate limits (illustrative):

Endpoint	Limit	Window
Feed load	60	per minute
Post creation	10	per hour
Like/comment	100	per minute
Media upload	50	per hour

Data Modelling

TAO Schema (Objects and Associations)

Objects Table

1CREATE TABLE objects (2    id BIGINT PRIMARY KEY,3    type VARCHAR(50) NOT NULL,4    data BLOB,5    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,6    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,7    INDEX idx_type_created (type, created_at)8);

Object types:

Type	Data fields
user	username, display_name, avatar_url, bio, follower_count
post	author_id, content, visibility, like_count, comment_count
comment	post_id, author_id, text, like_count
media	post_id, url, type, dimensions, alt_text

Associations Table

1CREATE TABLE associations (2    id1 BIGINT NOT NULL,3    assoc_type VARCHAR(50) NOT NULL,4    id2 BIGINT NOT NULL,5    time BIGINT NOT NULL,6    data BLOB,7    PRIMARY KEY (id1, assoc_type, id2),8    INDEX idx_id1_type_time (id1, assoc_type, time DESC)9);

Association types:

Type	id1	id2	data
follows	follower_id	followee_id	created_at
authored	user_id	post_id	-
liked	user_id	post_id	created_at
commented	user_id	comment_id	-
tagged	post_id	user_id	-

Sharding strategy:

Shard on id1 (the source object).
Co-locates a user’s follows, likes, and authored posts on a single shard.
Single-shard query for the most common pattern: “give me id1’s atype associations sorted by recency.”

Feed Cache Schema (Redis)

1ZADD feed:{user_id} {score} {post_id}23ZREMRANGEBYRANK feed:{user_id} 0 -50145HSET feed:meta:{user_id}6    last_generated 17068864000007    version 428    type "ranked"910ZADD celebrity_posts:{author_id} {timestamp} {post_id}1112ZADD user:engaged:{user_id} {timestamp} {post_id}13EXPIRE user:engaged:{user_id} 604800

Per-feed caps (the trim above keeps the last 500 entries) directly mirror Twitter’s published 800-tweet timeline cap; the precise number is a memory-vs-pagination-depth knob (antirez notes).

Post Index (Leaf Servers)

In-memory index structure on leaf servers:

1interface PostIndex {2  authorIndex: Map<UserId, SortedSet<PostId, Timestamp>>34  visibilityIndex: Map<Visibility, Set<PostId>>56  recentPosts: SortedSet<PostId, Timestamp>7}89interface SortedSet<K, S> {10  add(key: K, score: S): void11  range(start: S, end: S, limit: number): K[]12  remove(key: K): void13}

MySQL Persistent Storage

1CREATE TABLE users (2    id BIGINT PRIMARY KEY AUTO_INCREMENT,3    username VARCHAR(50) UNIQUE NOT NULL,4    display_name VARCHAR(100),5    email VARCHAR(255) UNIQUE NOT NULL,6    password_hash VARCHAR(255) NOT NULL,7    avatar_url TEXT,8    bio TEXT,9    follower_count INT DEFAULT 0,10    following_count INT DEFAULT 0,11    post_count INT DEFAULT 0,12    is_verified BOOLEAN DEFAULT FALSE,13    is_celebrity BOOLEAN DEFAULT FALSE,14    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,15    INDEX idx_username (username),16    INDEX idx_email (email)17) ENGINE=InnoDB;1819CREATE TABLE posts (20    id BIGINT PRIMARY KEY AUTO_INCREMENT,21    author_id BIGINT NOT NULL,22    content_text TEXT,23    content_type ENUM('text', 'image', 'video', 'link') NOT NULL,24    visibility ENUM('public', 'friends', 'private') DEFAULT 'public',25    like_count INT DEFAULT 0,26    comment_count INT DEFAULT 0,27    share_count INT DEFAULT 0,28    view_count INT DEFAULT 0,29    is_deleted BOOLEAN DEFAULT FALSE,30    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,31    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,32    INDEX idx_author_created (author_id, created_at DESC),33    INDEX idx_created (created_at DESC),34    FOREIGN KEY (author_id) REFERENCES users(id)35) ENGINE=InnoDB;3637CREATE TABLE follows (38    follower_id BIGINT NOT NULL,39    followee_id BIGINT NOT NULL,40    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,41    PRIMARY KEY (follower_id, followee_id),42    INDEX idx_followee (followee_id),43    FOREIGN KEY (follower_id) REFERENCES users(id),44    FOREIGN KEY (followee_id) REFERENCES users(id)45) ENGINE=InnoDB;4647CREATE TABLE post_likes (48    user_id BIGINT NOT NULL,49    post_id BIGINT NOT NULL,50    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,51    PRIMARY KEY (user_id, post_id),52    INDEX idx_post (post_id),53    FOREIGN KEY (user_id) REFERENCES users(id),54    FOREIGN KEY (post_id) REFERENCES posts(id)55) ENGINE=InnoDB;

Database Selection Matrix

Data type	Store	Rationale
Social graph	TAO (MySQL-backed)	Optimised for graph queries, association lists
User profiles	MySQL	ACID, moderate scale
Posts	MySQL + TAO cache	Durability + fast graph queries
Feed cache	Redis	Sub-ms reads; sorted sets fit ranked feeds
Post index	Leaf servers (memory)	Ultra-fast candidate retrieval
Media	S3 + CDN	Cost-effective, globally distributed
ML features	Feature store	Consistent features for training and serving
Analytics	ClickHouse	Time-series, large-aggregation queries

Low-Level Design

Feed Generation Algorithm

Candidate Retrieval

1class CandidateRetriever {2  private readonly leafClient: LeafClient3  private readonly taoClient: TAOClient4  private readonly redis: RedisCluster56  async getCandidates(userId: string): Promise<CandidateSet> {7    const followees = await this.taoClient.getAssociations(userId, "follows", { limit: 5000 })89    const postPromises = followees.map((followee) =>10      this.leafClient.getAuthorPosts(followee.id2, {11        since: Date.now() - 7 * 24 * 60 * 60 * 1000,12        limit: 50,13      }),14    )1516    const postsByAuthor = await Promise.all(postPromises)1718    const celebrityFollowees = followees.filter((f) => f.data.isCelebrity)19    const celebrityPosts = await this.getCelebrityPosts(celebrityFollowees.map((f) => f.id2))2021    const allPosts = [...postsByAuthor.flat(), ...celebrityPosts]2223    const eligible = await this.filterEligible(userId, allPosts)2425    return {26      candidates: eligible,27      source: {28        fromCache: postsByAuthor.length,29        fromCelebrity: celebrityPosts.length,30      },31    }32  }3334  private async filterEligible(userId: string, posts: Post[]): Promise<Post[]> {35    const [blocked, hidden, seen] = await Promise.all([36      this.taoClient.getAssociations(userId, "blocks"),37      this.redis.smembers(`hidden:${userId}`),38      this.redis.smembers(`seen:${userId}`),39    ])4041    const blockedSet = new Set(blocked.map((b) => b.id2))42    const hiddenSet = new Set(hidden)43    const seenSet = new Set(seen)4445    return posts.filter(46      (post) =>47        !blockedSet.has(post.authorId) &&48        !hiddenSet.has(post.id) &&49        !seenSet.has(post.id) &&50        this.checkVisibility(userId, post),51    )52  }53}

Multi-Stage Ranking

1class FeedRanker {2  private readonly featureStore: FeatureStore3  private readonly modelServer: ModelServer45  async rank(userId: string, candidates: Post[]): Promise<RankedPost[]> {6    const stage1 = await this.firstPassRank(userId, candidates)7    const top500 = stage1.slice(0, 500)89    const stage2 = await this.secondPassRank(userId, top500)10    const top100 = stage2.slice(0, 100)1112    const final = await this.finalRerank(userId, top100)1314    return final.slice(0, 50)15  }1617  private async firstPassRank(userId: string, candidates: Post[]): Promise<RankedPost[]> {18    const features = candidates.map((post) => ({19      postId: post.id,20      recency: this.computeRecency(post.createdAt),21      authorAffinity: this.getAuthorAffinity(userId, post.authorId),22      contentType: post.contentType,23      engagementVelocity: this.getEngagementVelocity(post),24    }))2526    const scores = features.map(27      (f) =>28        0.3 * f.recency +29        0.4 * f.authorAffinity +30        0.2 * f.engagementVelocity +31        0.1 * this.contentTypeBoost(f.contentType),32    )3334    return this.sortByScore(candidates, scores)35  }3637  private async secondPassRank(userId: string, candidates: Post[]): Promise<RankedPost[]> {38    const features = await this.featureStore.getBatch(39      candidates.map((c) => ({40        userId,41        postId: c.id,42        authorId: c.authorId,43      })),44    )4546    const scores = await this.modelServer.predict("feed_ranking_v2", features)4748    return this.sortByScore(candidates, scores)49  }5051  private async finalRerank(userId: string, candidates: RankedPost[]): Promise<RankedPost[]> {52    const diversified = this.applyDiversity(candidates, {53      maxPerAuthor: 2,54      contentTypeMix: { image: 0.4, video: 0.3, text: 0.3 },55      maxAds: 3,56    })5758    const filtered = await this.applyIntegrityFilters(diversified)5960    return filtered61  }6263  private applyDiversity(posts: RankedPost[], rules: DiversityRules): RankedPost[] {64    const result: RankedPost[] = []65    const authorCounts = new Map<string, number>()66    const typeCounts = new Map<string, number>()6768    for (const post of posts) {69      const authorCount = authorCounts.get(post.authorId) || 070      if (authorCount >= rules.maxPerAuthor) continue7172      const typeCount = typeCounts.get(post.contentType) || 073      const typeRatio = typeCount / (result.length + 1)74      const maxRatio = rules.contentTypeMix[post.contentType] || 0.575      if (typeRatio > maxRatio && result.length > 10) continue7677      result.push(post)78      authorCounts.set(post.authorId, authorCount + 1)79      typeCounts.set(post.contentType, typeCount + 1)80    }8182    return result83  }84}

The narrowing factors above (≈ 500 / ≈ 100 / ≈ 50) are deliberate proxies for the Meta-published structure — Pass 0 outputs roughly 500 candidates per request before the heavy multitask network runs (Engineering at Meta, 2021; Transparency Center).

Dedup, freshness, and integrity live in the rerank stage, not the heavy ranker. Three concerns share that budget:

Dedup — the seen set in candidate retrieval drops items already impressed in the last N days; applyDiversity caps per-author and per-content-type density so a single creator cannot monopolise the slate. Near-duplicate detection (same image hash, same article URL across reposts) is typically a separate retrieval-time filter against a rolling content fingerprint.
Freshness vs. ranking — pure recency wins on novelty but loses on relevance, while pure relevance starves new content. Production rerankers add a recency boost (decay function on createdAt) and reserve a small slate quota for unseen-and-recent items so the feed does not stagnate around the user’s strongest historical affinities.
Integrity / abuse — applyIntegrityFilters runs after the heavy ranker so spam, policy violations, and known low-quality patterns can never appear regardless of model score. Meta’s Transparency Center enumerates the categories applied at this stage; the engineering invariant is that integrity is a hard filter, not a soft signal mixed into the score.

Fan-out Pipeline

1class FanoutService {2  private readonly CELEBRITY_THRESHOLD = 10_0003  private readonly BATCH_SIZE = 100045  async fanoutPost(post: Post, author: User): Promise<FanoutResult> {6    const followerCount = author.followerCount78    if (followerCount < this.CELEBRITY_THRESHOLD) {9      return this.pushFanout(post, author)10    } else if (followerCount < 1_000_000) {11      return this.hybridFanout(post, author)12    } else {13      return this.pullOnly(post, author)14    }15  }1617  private async pushFanout(post: Post, author: User): Promise<FanoutResult> {18    const followers = await this.getFollowers(author.id)1920    const batches = this.chunk(followers, this.BATCH_SIZE)21    let written = 02223    for (const batch of batches) {24      const pipeline = this.redis.pipeline()2526      for (const followerId of batch) {27        pipeline.zadd(`feed:${followerId}`, post.createdAt.getTime(), post.id)28        pipeline.zremrangebyrank(`feed:${followerId}`, 0, -501)29      }3031      await pipeline.exec()32      written += batch.length3334      this.metrics.increment("fanout.writes", batch.length)35    }3637    return {38      strategy: "push",39      followersReached: written,40      duration: Date.now() - post.createdAt.getTime(),41    }42  }4344  private async hybridFanout(post: Post, author: User): Promise<FanoutResult> {45    const activeFollowers = await this.getActiveFollowers(author.id, 7)4647    const pushResult = await this.pushToFollowers(post, activeFollowers)4849    await this.redis.zadd(`celebrity_posts:${author.id}`, post.createdAt.getTime(), post.id)50    await this.redis.expire(`celebrity_posts:${author.id}`, 7 * 24 * 60 * 60)5152    return {53      strategy: "hybrid",54      pushed: pushResult.count,55      markedForPull: author.followerCount - activeFollowers.length,56    }57  }5859  private async pullOnly(post: Post, author: User): Promise<FanoutResult> {60    await this.redis.zadd(`celebrity_posts:${author.id}`, post.createdAt.getTime(), post.id)6162    await this.leafClient.indexPost(post)6364    return {65      strategy: "pull",66      indexed: true,67    }68  }69}

Cache Consistency (TAO + Lease Pattern)

TAO uses a two-tier cache (regional follower caches in front of a per-region leader cache) and serialises writes through the leader so invalidations can be ordered. Two operational hazards make the naive read-through dangerous:

Thundering herd. When a hot key expires or is evicted, every concurrent reader cache-misses simultaneously and stampedes the database.
Stale set. Two readers can race: reader A reads a stale value and starts to set it back into the cache; meanwhile a writer invalidates the key; reader A’s set arrives last and reinstates the stale value.

Facebook’s memcache lease mechanism addresses both with one tool — a 64-bit lease token returned with the miss:

Memcache hands out a lease for a key at most once every 10 seconds; the second concurrent miss receives a “wait” hint and retries instead of stampeding the DB.
A set only succeeds if the client presents a still-valid lease token. Any concurrent delete for that key invalidates outstanding leases, so a reader that raced a writer cannot reinstate stale data.

Cache lease sequence: client A misses, receives a lease token, and is alone responsible for refilling the cache; concurrent client B receives a wait hint and retries; client A sets value with token; on retry, B hits the now-populated cache. — Cache lease coordination: a single-flight reader fills the cache; concurrent readers back off and retry instead of stampeding the database.

1class TAOCache {2  private readonly leaseTimeout = 10_00034  async get(id1: string, assocType: string): Promise<Association[] | null> {5    const followerResult = await this.followerCache.get(this.cacheKey(id1, assocType))67    if (followerResult) {8      this.metrics.increment("cache.hit.follower")9      return followerResult10    }1112    const leaderResult = await this.leaderCache.get(this.cacheKey(id1, assocType))1314    if (leaderResult) {15      this.metrics.increment("cache.hit.leader")16      await this.followerCache.set(this.cacheKey(id1, assocType), leaderResult, { ttl: 300 })17      return leaderResult18    }1920    return this.fetchWithLease(id1, assocType)21  }2223  private async fetchWithLease(id1: string, assocType: string): Promise<Association[]> {24    const leaseKey = `lease:${id1}:${assocType}`2526    const acquired = await this.redis.set(leaseKey, "1", "NX", "PX", this.leaseTimeout)2728    if (!acquired) {29      await this.sleep(100)30      return this.get(id1, assocType)31    }3233    try {34      const data = await this.mysql.query(35        `SELECT * FROM associations36         WHERE id1 = ? AND assoc_type = ?37         ORDER BY time DESC`,38        [id1, assocType],39      )4041      await Promise.all([42        this.leaderCache.set(this.cacheKey(id1, assocType), data, { ttl: 3600 }),43        this.followerCache.set(this.cacheKey(id1, assocType), data, { ttl: 300 }),44      ])4546      return data47    } finally {48      await this.redis.del(leaseKey)49    }50  }5152  async invalidate(id1: string, assocType: string): Promise<void> {53    const key = this.cacheKey(id1, assocType)54    await Promise.all([this.followerCache.del(key), this.leaderCache.del(key)])55  }56}

Note

Meta’s Polaris service (2022) measures TAO-class cache consistency end-to-end and reports ≈ 10 nines (99.99999999%) of writes converging within five minutes. The number is best read as: leasing handles the common case, asynchronous monitoring catches the rare-but-real residual, and an out-of-band repair loop forces convergence.

Engagement Counter (Eventual Consistency)

Engagement counts (likes, comments) are the canonical hot-key write path: a single popular post can take tens of thousands of like increments per second across replicas. The standard pattern is write-behind with in-process aggregation — increment the counter immediately in Redis for read consistency, batch the durable updates to MySQL on a fixed flush interval (DDIA, Ch. 11 — Stream Processing, “Materialised views and write-behind”).

1class EngagementService {2  private readonly FLUSH_INTERVAL = 50003  private pendingUpdates = new Map<string, EngagementDelta>()45  async incrementLike(postId: string): Promise<void> {6    await this.redis.hincrby(`post:${postId}`, "like_count", 1)78    this.bufferUpdate(postId, { likes: 1 })9  }1011  private bufferUpdate(postId: string, delta: EngagementDelta): void {12    const existing = this.pendingUpdates.get(postId) || {13      likes: 0,14      comments: 0,15      shares: 0,16    }1718    this.pendingUpdates.set(postId, {19      likes: existing.likes + (delta.likes || 0),20      comments: existing.comments + (delta.comments || 0),21      shares: existing.shares + (delta.shares || 0),22    })23  }2425  @Scheduled(FLUSH_INTERVAL)26  async flushToMySQL(): Promise<void> {27    const updates = new Map(this.pendingUpdates)28    this.pendingUpdates.clear()2930    const queries = Array.from(updates.entries()).map(([postId, delta]) =>31      this.mysql.query(32        `UPDATE posts SET33          like_count = like_count + ?,34          comment_count = comment_count + ?,35          share_count = share_count + ?36         WHERE id = ?`,37        [delta.likes, delta.comments, delta.shares, postId],38      ),39    )4041    await Promise.all(queries)42  }43}

Warning

Write-behind makes counters approximate during incidents. If a process crashes between the Redis increment and the next flush, the in-memory delta is lost. Production deployments persist the delta to an append-only log (e.g. Kafka) before acknowledging the user’s request, then have the flusher consume from the log idempotently.

Frontend Considerations

Feed Virtualisation

Infinite scroll with rich media demands list virtualisation; a feed with thousands of in-DOM nodes will lose hit-test latency and tank scroll FPS.

1interface VirtualFeedConfig {2  containerHeight: number3  estimatedItemHeight: number4  overscan: number5}67class VirtualFeed {8  private heightCache = new Map<string, number>()9  private offsetCache: number[] = []1011  calculateVisibleRange(scrollTop: number, posts: Post[]): { start: number; end: number } {12    let start = this.binarySearchOffset(scrollTop - this.config.overscan * 300)1314    let accumulatedHeight = this.offsetCache[start] || 015    let end = start1617    while (18      end < posts.length &&19      accumulatedHeight < scrollTop + this.config.containerHeight + this.config.overscan * 30020    ) {21      accumulatedHeight += this.getItemHeight(posts[end])22      end++23    }2425    return { start, end }26  }2728  private getItemHeight(post: Post): number {29    if (this.heightCache.has(post.id)) {30      return this.heightCache.get(post.id)!31    }3233    let estimate = 10034    if (post.content.media?.length > 0) {35      estimate += 40036    }37    if (post.content.text?.length > 200) {38      estimate += 5039    }4041    return estimate42  }4344  onItemRendered(postId: string, actualHeight: number): void {45    this.heightCache.set(postId, actualHeight)46    this.rebuildOffsetCache()47  }48}

Optimistic Engagement Updates

1class FeedStore {2  private posts = new Map<string, Post>()3  private pendingLikes = new Set<string>()45  async likePost(postId: string): Promise<void> {6    const post = this.posts.get(postId)7    if (!post || this.pendingLikes.has(postId)) return89    this.pendingLikes.add(postId)10    this.updatePost(postId, {11      engagement: {12        ...post.engagement,13        likeCount: post.engagement.likeCount + 1,14        isLiked: true,15      },16    })1718    this.notifySubscribers(postId)1920    try {21      await this.api.likePost(postId)22    } catch (error) {23      this.updatePost(postId, {24        engagement: {25          ...post.engagement,26          likeCount: post.engagement.likeCount,27          isLiked: false,28        },29      })30      this.notifySubscribers(postId)31    } finally {32      this.pendingLikes.delete(postId)33    }34  }35}

Real-time Feed Updates

1class FeedStreamManager {2  private eventSource: EventSource | null = null3  private reconnectAttempt = 045  connect(userId: string): void {6    this.eventSource = new EventSource(`/api/v1/feed/stream?userId=${userId}`)78    this.eventSource.addEventListener("new_post", (event) => {9      const data = JSON.parse(event.data)10      this.onNewPost(data)11    })1213    this.eventSource.addEventListener("engagement_update", (event) => {14      const data = JSON.parse(event.data)15      this.onEngagementUpdate(data)16    })1718    this.eventSource.onerror = () => {19      this.scheduleReconnect()20    }21  }2223  private onNewPost(data: NewPostEvent): void {24    this.feedStore.setPendingPosts(data.count)25    this.ui.showNewPostsIndicator()26  }2728  private onEngagementUpdate(data: EngagementUpdateEvent): void {29    this.feedStore.updateEngagement(data.postId, data)30  }3132  loadNewPosts(): void {33    const pending = this.feedStore.getPendingPosts()34    this.feedStore.prependPosts(pending)35    this.feedStore.clearPendingPosts()36    this.ui.scrollToTop()37  }38}

Tip

Don’t auto-insert new posts into the visible scroll position. Surface a “New posts” indicator and prepend on user click — auto-insertion at the top causes layout shifts and breaks the user’s scroll context.

Prefetching Strategy

1class FeedPrefetcher {2  private readonly PREFETCH_THRESHOLD = 534  onScroll(visibleRange: { start: number; end: number }, totalPosts: number): void {5    const postsRemaining = totalPosts - visibleRange.end67    if (postsRemaining < this.PREFETCH_THRESHOLD && !this.loading) {8      this.prefetchNextPage()9    }10  }1112  private async prefetchNextPage(): Promise<void> {13    this.loading = true1415    try {16      const cursor = this.feedStore.getNextCursor()17      const response = await this.api.getFeed({ cursor, limit: 20 })1819      this.feedStore.appendPosts(response.posts)20      this.feedStore.setNextCursor(response.pagination.nextCursor)21    } finally {22      this.loading = false23    }24  }25}

Infrastructure

Cloud-Agnostic Components

Component	Purpose	Options
API Gateway	Auth, rate limiting	Kong, Envoy, Nginx
Graph Store	Social graph	Custom TAO-like, Neo4j, DGraph
KV Cache	Feed cache, sessions	Redis, KeyDB, Dragonfly
Message Queue	Fan-out, events	Kafka, Pulsar, NATS
Relational DB	Users, posts	MySQL, PostgreSQL, CockroachDB
Object Store	Media files	MinIO, Ceph, S3-compatible
ML Serving	Ranking models	TensorFlow Serving, Triton, Seldon
Feature Store	ML features	Feast, Tecton, custom

AWS Reference Architecture

Service configurations (illustrative):

Service	Configuration	Rationale
Aggregator	c5.4xlarge (16 vCPU, 32 GB)	CPU-bound ranking computation
Leaf	r5.8xlarge (32 vCPU, 256 GB)	Memory-intensive index storage
Fan-out workers	Fargate Spot	Cost-effective async processing
ElastiCache Redis	r6g.2xlarge cluster mode	Sub-ms feed cache reads
Aurora MySQL	db.r6g.4xlarge Multi-AZ	Durability; read replicas for scale
DynamoDB	On-demand mode	TAO-style graph store, auto-scaling
SageMaker	p3.2xlarge (1 GPU)	Neural network inference

Multi-Region Deployment

Multi-region considerations:

Single primary region for writes; reads served locally from per-region replicas.
Cross-region replication lag in the 50–200 ms range is normal for asynchronous replication and acceptable for social-feed staleness.
Read-after-write needs sticky routing of the writer back to the primary for a short window, otherwise the writer can read a stale follower replica.
Failover to a secondary region should be planned and rehearsed; the primary’s loss of write availability is the dominant tail risk.

Failure Modes

Failure	Detection	Mitigation
Fan-out worker lag (queue depth)	Kafka consumer lag, fan-out latency p99	Auto-scale workers; degrade to “feed slightly stale” instead of refusing.
Hot key on a celebrity’s post	Per-key request rate, leaf CPU	Lease-get to single-flight; replicate hot keys; rate-limit per-author.
Leaf shard loss	Leaf health checks, query failure rate	Replica failover; rebuild from MySQL via Tailer; serve degraded results.
Ranking model regression	Calibration / NE drift, online A/B metrics	Pin previous model; roll forward only when calibration recovers.
Feed cache stampede after eviction	Cache miss rate spike, DB load	Lease-get; staggered TTLs with jitter; serve stale during refresh.
Cross-region replication lag	Replica lag metric, read-after-write hits	Route writers and their reads to primary briefly; hide stale data in UI.
Engagement counter loss on crash	Reconciliation diff Redis ↔ MySQL	Persist deltas to Kafka before ack; idempotent flusher consumes the log.
Celebrity post visibility lag	Pull-merge latency at read time	Pre-warm celebrity post indices; serve while merging asynchronously.

Conclusion

Designing a social feed at this scale is less about discovering a clever algorithm and more about acknowledging which trade-off you cannot avoid:

Sub-second feed reads require pre-computation, which requires write amplification.
Bounded write amplification requires a pull-style escape valve for high-follower accounts.
Personalisation at this latency requires a multi-stage funnel so the heaviest model only sees a handful of items.
Cache correctness under invalidations and stampedes requires explicit coordination — TAO’s leasing is one production-grade answer.
Eventual consistency is the operational reality; the engineering work is bounding how eventual.

Known limitations of the design above:

Sub-minute eventual consistency for follower feeds.
Pull-merged celebrity posts pay a small additional read-time cost.
Periodic ranker retraining means model staleness between deploys.
Two fan-out paths and a multi-tier cache add operational surface.

Future enhancements:

Real-time ranker updates from streaming features (online learning) for short-feedback-loop signals.
Federated or on-device personalisation for privacy-sensitive ranking signals.
Graph neural networks for richer content/audience embeddings during retrieval.
Edge-resident feed assembly for further read-latency reductions on mobile.

Appendix

Prerequisites

Distributed systems fundamentals (caching, sharding, replication).
ML basics (training vs serving, feature engineering).
Graph storage concepts.
Message queue patterns (pub/sub, fan-out).

Terminology

Term	Definition
Fan-out	Distributing a post to multiple follower feeds.
TAO	Facebook’s distributed graph store (The Associations and Objects).
Aggregator	Service that queries and combines data from multiple sources.
Leaf server	Memory-intensive server holding indexed recent activity.
Tailer	Stream consumer that updates leaf indices as posts arrive.
Candidate retrieval	First stage of ranking — gathering potential items.
Affinity	Strength of relationship between user and content/author.
Lease (memcache)	A token issued on a cache miss that arbitrates concurrent writes and rate-limits stampedes.

Summary

Hybrid fan-out (push for typical, pull for high-follower accounts) is the only design that survives the long tail of follower distributions.
TAO-style graph storage with objects + associations admits efficient, single-shard recency queries on association lists.
Multi-stage ML ranking (≈ billions → ~500 → ~50) lets the heaviest model run on only the items that survive cheap pre-filters.
Two-tier regional caching with leasing (per Memcache NSDI ‘13) handles invalidations and stampedes; Polaris reports the residual at ≈ 10 nines.
Eventual consistency (~minutes) is acceptable for social feeds; engagement counts use write-behind with a durable log to bound loss.
The architecture absorbs ~700 K RPS feed loads through stateless aggregator scaling, sharded leaf storage, and cache-first reads.

References

Real-world implementations:

Serving Facebook Multifeed: Efficiency, Performance Gains Through Redesign (2015) — Disaggregated aggregator/leaf, ≈ 40 % efficiency gain.
TAO: The Power of the Graph (2013) — Facebook’s graph store overview.
News Feed ranking, powered by machine learning (2021) — ML ranking funnel and signals.
Cache made consistent — Polaris (2022) — 10-nines cache consistency monitoring.
Scaling Instagram Explore Recommendations (2023) — Two-tower retrieval + late-stage MTML.
Journey to 1000 models: Scaling Instagram’s recommendation system (2025) — Model registry + funnel at scale.
Building Facebook Messenger (2011) — MQTT for mobile real-time delivery.
Maxjourney: Pushing Discord’s Limits with a Million+ Online Users — Elixir/BEAM relays for large-server fan-out.
How Twitter Uses Redis to Scale (2014, summarised) — 105 TB / 39 M QPS / 10 K+ instances; 800-tweet timeline cap.
The Infrastructure Behind Twitter: Scale (2017) — Updated cache fleet sizing.

Academic / primary papers:

TAO: Facebook’s Distributed Data Store for the Social Graph — USENIX ATC ‘13
Scaling Memcache at Facebook — NSDI ‘13
Feeding Frenzy: Selectively Materializing Users’ Event Feeds — Silberstein et al., SIGMOD ‘10 — formal push/pull/hybrid analysis on Yahoo’s PNUTS.
Twitter Heron: Stream Processing at Scale — SIGMOD ‘15 — Twitter’s post-Storm streaming engine; powers timeline fan-out.
Strong consistency in Manhattan — Twitter’s KV store of record for tweets and timelines.
Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time — WWW ‘18 — Pinterest’s graph-walk candidate generator.
Building a smarter home feed (Smartfeed) — Pinterest Engineering — HBase-backed per-user candidate queues.
Modernizing Home Feed Pre-Ranking Stage — Pinterest Engineering — unified two-tower pre-ranker on a root-leaf inference fleet.
The Browsemaps: Collaborative Filtering at LinkedIn — RSWeb ‘14 — Hadoop-batch / KV-served item-to-item CF platform.
LiRank: Industrial Large Scale Ranking Models — LinkedIn’s ranking framework.

Books:

Kleppmann, Designing Data-Intensive Applications — Ch. 1 on home-timeline fan-out as the canonical example; Ch. 11 on stream-based write-behind.

Related articles in this site:

Design Real-Time Chat and Messaging — WebSocket / MQTT-based messaging.
Design a Notification System — Multi-channel notification delivery.