WhatsApp: 2 Million Connections Per Server with Erlang
How WhatsApp scaled from zero to 1 billion users on Erlang/BEAM and FreeBSD — with fewer than 50 engineers, ~800 servers, and a custom protocol that compressed messages to 20 bytes — by pushing per-server density to limits most teams never attempt.
Abstract
WhatsApp’s architecture is a case study in radical simplicity at extreme scale. The mental model:
- One Erlang process per connection, one connection per user. The BEAM VM’s lightweight processes (~300 bytes each) and per-process garbage collection made 2 million concurrent connections per server viable on commodity hardware. This is the foundation everything else builds on.
- Vertical density before horizontal sprawl. WhatsApp pushed each server to 2+ million connections before adding servers. Operational complexity scales with node count, not core count — so fewer, larger machines reduce the surface area for failure.
- Custom everything where it matters. ejabberd was rewritten. XMPP was replaced with FunXMPP (50-70% bandwidth reduction). The BEAM VM itself was patched (timer wheels, GC throttling, pg2 replacement). Standard components were kept only when they didn’t bottleneck.
- Store-and-forward with aggressive deletion. Messages are transient — deleted from servers after confirmed delivery. 98% cache hit rate; 50% of messages read within 60 seconds. The server is a router, not a database.
- Small team, high autonomy. 32 engineers served 450 million users at acquisition. Individual teams of 1-3 engineers owned entire subsystems. Erlang’s fault tolerance and OTP supervision reduced operational burden to the point where this ratio was sustainable.
Context
The System
WhatsApp is a mobile messaging platform that handles text, photos, video, voice messages, voice calls, and video calls. Every message passes through WhatsApp’s servers for routing and store-and-forward delivery, but messages are not retained after delivery confirmation.
Tech stack at time of Facebook acquisition (February 2014):
- Language: Erlang/OTP R16B01 (custom patched)
- OS: FreeBSD 9.2
- Database: Mnesia (in-memory, ~2TB across 16 partitions), MySQL (persistent user data)
- Protocol: FunXMPP (custom binary, evolved from XMPP)
- Web server: YAWS (Erlang-based, for multimedia)
- Hosting: SoftLayer bare metal, dual datacenter (California + Washington D.C.)
The Trigger
WhatsApp launched in 2009 as a status-broadcasting app, pivoting to messaging within months. Growth was organic — no advertising, no marketing budget. The trigger for each scaling phase was raw user growth on a fixed, small engineering team:
Key metrics at milestones:
| Metric | Oct 2011 | Aug 2012 | Dec 2013 | Mar 2014 | Nov 2014 |
|---|---|---|---|---|---|
| Monthly active users | ~100M | ~150M | 400M | 465M | 600M+ |
| Messages/day | 1B | 10B | 18B | ~40B | 50B+ |
| Total servers | ~100 | ~200 | ~400 | ~550 | ~800 |
| Total CPU cores | — | — | — | 11,000+ | — |
| Engineers | ~20 | ~25 | ~30 | 32 | ~35 |
| Concurrent connections | — | — | — | 140M | — |
Constraints
- Team size: Founders deliberately kept the team small. Jan Koum’s philosophy: “I want to do one thing and do it well.”
- Budget: Pre-acquisition, WhatsApp was venture-funded. SoftLayer hosting cost approximately $2 million/month for 700+ servers.
- Target devices: Must work on low-end phones over 2G networks in developing markets — bandwidth and battery efficiency were non-negotiable.
- Availability: Global 24/7 service with no maintenance windows. Any downtime affected hundreds of millions of users.
- Organizational: Individual engineering teams of 1-3 engineers with high autonomy. No dedicated operations staff.
The Starting Point: ejabberd and Erlang
Why Erlang
WhatsApp’s co-founders chose Erlang indirectly — they chose ejabberd, an open-source XMPP (Extensible Messaging and Presence Protocol) server written in Erlang. Eugene Fooksman, an early WhatsApp engineer, cited the reasons: “openness, great reviews by developers, ease of start and the promise of Erlang’s long-term suitability for large communication system.”
The choice proved prescient for reasons the team only fully appreciated later:
-
Lightweight processes: Each BEAM process costs ~300 bytes of initial overhead. A million connections require ~300MB of process memory — leaving the remaining 99+ GB of RAM for application state, message buffers, and caching.
-
Per-process garbage collection: Unlike JVM-based systems where GC pauses affect the entire runtime, BEAM collects garbage per-process. A single slow process doesn’t stall two million others. This is the architectural property that made WhatsApp’s per-server density possible.
-
Preemptive scheduling: BEAM’s reduction-based scheduler preempts long-running processes after a fixed number of function calls (reductions). No single process can starve others — critical when one user sends a 10MB video while another sends a 1-byte acknowledgment.
-
OTP supervision trees: When a process crashes, its supervisor restarts it according to a defined strategy. WhatsApp relied on this for self-healing: individual connection handlers crash and restart without affecting other users.
-
Hot code loading: Erlang supports loading new code modules without stopping the running system. WhatsApp used this for bug fixes and updates without downtime — essential for a 24/7 global service where “maintenance windows” don’t exist.
Rick Reed, who joined WhatsApp in 2011 after 12 years building high-performance messaging systems in C++ at Yahoo, described his initial reaction: Erlang achieved scalability goals on single hosts that his Yahoo team “only dreamed about.” Anton Lavrik, lead of WhatsApp’s Erlang platform team, put it more directly: with C++, “developers have to implement half of Erlang by themselves” to achieve similar reliability.
Why FreeBSD
The founders had extensive FreeBSD experience from Yahoo, where it was the standard server OS. WhatsApp benchmarked FreeBSD against Linux under realistic messaging load — FreeBSD’s networking stack handled their workload better.
FreeBSD kernel tuning for 2+ million connections (from WhatsApp’s 2012 blog post):
| Parameter | Value | Purpose |
|---|---|---|
kern.maxfiles | 3,000,000 | System-wide file descriptor limit |
kern.maxfilesperproc | 2,700,000 | Per-process file descriptor limit |
kern.ipc.maxsockets | 2,400,000 | Maximum socket count |
net.inet.tcp.tcbhashsize | 524,288 | TCP hash table entries for connection lookup |
Post-acquisition: WhatsApp migrated from FreeBSD on SoftLayer bare metal to Linux on Facebook’s data center infrastructure between 2017 and 2019. This was an organizational decision — Facebook’s container orchestration and monitoring tooling required Linux — not a technical judgment against FreeBSD.
The ejabberd Rewrite
WhatsApp didn’t just configure ejabberd — they spent years rewriting it. According to Fooksman, the team spent “the next few years re-writing and modifying quite a few parts of ejabberd,” including:
- Protocol replacement: Switched from standard XMPP to a proprietary binary protocol (FunXMPP)
- Codebase restructuring: Redesigned core components for their specific access patterns
- BEAM VM patches: Modified the Erlang runtime itself to eliminate bottlenecks
- Storage layer: Replaced ejabberd’s default storage with Mnesia-based and file system-based backends
By 2014, the codebase retained ejabberd’s ancestry but bore little resemblance to the original.
The 2 Million Connections Milestone
Server Specifications (January 2012)
WhatsApp published detailed server specifications when they achieved 2+ million concurrent TCP connections on a single server:
| Component | Specification |
|---|---|
| CPU | Intel Xeon X5675 @ 3.07GHz, 24 logical cores |
| RAM | 103 GB |
| OS | FreeBSD 8.2-STABLE (64-bit) |
| Erlang | R14B03, 24 SMP threads, kernel polling enabled |
Peak observed load: 2,277,845 open sockets at 37.9% user CPU, 13.6% system CPU, 41.9% idle. 35 GB active memory with 27 GB free. The server had significant headroom remaining.
Why 2 Million Mattered
WhatsApp’s approach to scaling was deliberately vertical-first. Rick Reed described this as a consequence of a key insight: operational complexity scales with the number of nodes, not the number of cores per node. A cluster of 100 servers at 2 million connections each is operationally simpler than 1,000 servers at 200,000 each — fewer failure domains, fewer network partitions, fewer inter-node messages.
The BEAM VM’s SMP (Symmetric Multi-Processing) scalability made this viable. Doubling cores on a single machine nearly doubled throughput because BEAM schedulers map 1:1 to cores with minimal cross-scheduler contention. WhatsApp confirmed this with benchmark data showing near-linear scaling up to 24 cores.
By 2014, WhatsApp reduced the target to ~1 million connections per server. This was deliberate — users were sending more messages, using more features (photos, video, voice), and each connection consumed more resources. The team chose headroom over density.
BEAM VM Tuning and Custom Patches
WhatsApp didn’t just use Erlang/OTP — they patched the runtime itself. Rick Reed detailed these modifications at Erlang Factory 2014. Several patches were later upstreamed to mainline OTP; others became obsolete as OTP evolved.
Timer Wheel Contention
Problem: Erlang’s timer system used a single timer wheel with a global lock. At millions of concurrent connections, each with keepalive timers and message timeouts, lock contention on the timer wheel became a CPU bottleneck.
Fix: WhatsApp implemented multiple timer wheels with independent locks, distributing timer operations across wheels to eliminate contention.
Garbage Collection Throttling
Problem: When a process’s message queue grows faster than it can process messages, Erlang’s default behavior triggers GC on each message receive. With a queue of millions of messages (during traffic spikes), GC overhead dominates CPU time.
Fix: Added GC throttling when the message queue exceeds a threshold. The process defers GC until it can make progress on the queue, trading temporary memory growth for CPU availability.
Distribution Buffer Sizing
Problem: The default inter-node distribution receive buffer was 4KB — far too small for WhatsApp’s inter-cluster message volumes. Small buffers caused frequent blocking on inter-node communication.
Fix: Increased the default to 256KB and made it configurable.
pg2 Replacement (The February 2014 Outage)
Problem: In February 2014 — days after the Facebook acquisition announcement — a backend router dropped a VLAN, causing mass node disconnects and reconnects across the cluster. The pg2 module (Erlang’s process group module) entered a state with n^3 messaging behavior during the reconnection storm. Message queues went from zero to 4 million within seconds. The outage lasted 2 hours and 10 minutes.
Fix: WhatsApp initially patched pg2 with denormalized group member lists, but eventually built an entirely new pg module from scratch. The new module was upstreamed to mainline Erlang/OTP, replacing pg2 entirely. This is one of the most significant contributions from WhatsApp back to the Erlang ecosystem.
Mnesia Transaction Manager
Problem: Mnesia’s mnesia_tm dispatched all async_dirty transactions through a single process, serializing operations that could safely run in parallel.
Fix: Dispatched async_dirty transactions to separate per-table processes, enabling concurrent record updates across different tables.
ETS Hash Improvements
Problem: Erlang Term Storage (ETS) used phash2 hashing that could produce collisions under WhatsApp’s workload, and the main/name tables didn’t scale well with thousands of ETS tables.
Fix: Modified hash seeding to avoid phash2 collisions, and improved table management scaling. The hash salt patch was contributed upstream via pull request #2979.
Scheduler Binding
WhatsApp enabled scheduler binding to CPU cores via the +stbt flag, which pins each BEAM scheduler to a specific CPU core. Reed reported this reduced context switching by approximately 4x, providing a significant throughput improvement with a one-line configuration change.
The gen_industry Dispatch Pattern
The Bottleneck
Standard Erlang gen_server is single-threaded — one process handles all incoming messages sequentially. For high-throughput services processing millions of operations per second, a single dispatch process becomes a bottleneck.
WhatsApp built a three-tier dispatch hierarchy:
gen_server → gen_factory → gen_industry
-
gen_server: Standard OTP behavior. Single process, sequential dispatch. Works for low-throughput services.
-
gen_factory: Custom behavior. A single dispatch process distributes work across multiple worker processes. Eliminates the processing bottleneck but the dispatch process itself can become saturated at high fan-in.
-
gen_industry: Custom behavior with multiple dispatch processes feeding multiple workers. Parallelizes both ingestion and dispatch. At extreme fan-in from many nodes, this was necessary to prevent the dispatch layer from becoming the bottleneck.
Data Partitioning
WhatsApp’s services partitioned data 2-32 ways (most services used 32-way partitioning):
- Consistent hashing maps a record to a partition
- Each partition maps to a Mnesia fragment
- Each fragment maps to a factory worker process
- All access to a single record routes to a single Erlang process
This design serializes access to individual records without transactions — the owning process is the lock. Maximum ~8 concurrent processes access any single ETS or Mnesia fragment, controlling lock contention at the storage layer.
FunXMPP: The Custom Binary Protocol
Why Replace XMPP
Standard XMPP is XML-based. A simple text message generates ~180 bytes of protocol overhead — XML open/close tags, namespace declarations, attributes. On 2G networks in developing markets, this overhead was unacceptable. WhatsApp needed a protocol optimized for resource-constrained mobile devices on slow, unreliable networks.
How FunXMPP Works
FunXMPP replaced XML structure with binary encoding:
-
Token replacement: Reserved XMPP words (message, from, body, etc.) are replaced by single bytes. For example, “message” becomes
0x59, “@s.whatsapp.net” becomes0x91. -
Structural compression: Instead of XML opening/closing tags, a single byte (e.g.,
0xF8) indicates a structure. The parser counts items rather than matching tag names. -
Result: A typical message shrinks from ~180 bytes to ~20 bytes — a 50-70% bandwidth reduction.
This made WhatsApp usable on feature phones over 2G connections where every byte mattered. The protocol efficiency was a direct contributor to WhatsApp’s dominance in developing markets.
Storage Architecture
Mnesia: The In-Memory Layer
WhatsApp used Mnesia, Erlang’s built-in distributed database, as its primary metadata store:
| Configuration | Value |
|---|---|
| Total RAM | ~2TB across 16 partitions |
| Total records | ~18 billion |
| Node topology | Island architecture: 2 nodes per island (primary/secondary) |
| Account table | 512 fragments |
| Cache hit rate | 98% |
What Mnesia stored:
- User-to-server routing tables (which server handles which user)
- Offline message queues (messages waiting for delivery)
- User profiles and group memberships
- Multimedia metadata
What Mnesia did not store:
- Message content (transient — deleted after delivery confirmation)
- Bulk media files (stored on filesystem)
- Persistent user data (MySQL shards)
Mnesia Limitations and Workarounds
Mnesia has known scalability constraints:
ram_copiesanddisc_copiestables require the full dataset in memorydisc_copiestables read the entire table from disk into memory on node startup — slow for large tablesdisc_only_copiestables are limited to 2GB each (DETS limitation)
WhatsApp worked around these with:
- Island architecture: Each 2-node island manages a subset of data, keeping per-island datasets manageable
- UFS2 filesystem on FreeBSD: Bulk data storage outside Mnesia
- MySQL shards: Persistent user data that doesn’t need Mnesia’s in-memory speed
- async_dirty transactions: Avoids Mnesia’s full transaction protocol (which couples nodes)
- Library directory across multiple drives: Increased I/O throughput for Mnesia’s disk operations
Store-and-Forward Message Flow
WhatsApp’s message delivery follows a store-and-forward model optimized for speed:
- Recipient online, same cluster: Direct Erlang process-to-process message delivery (sub-millisecond)
- Recipient online, different cluster: Inter-cluster forwarding via distribution connections (sub-second)
- Recipient offline: Message stored in Mnesia offline queue, replicated to backup node
- Recipient reconnects: Queued messages delivered in order
- Delivery confirmed: Message deleted from server
Messages are retained for up to 30 days if the recipient remains offline. Over 50% of messages are read within 60 seconds of storage, which explains the 98% cache hit rate — the working set is overwhelmingly recent messages.
Asynchronous Optimization
WhatsApp’s Erlang code favored asynchronous patterns to maximize throughput:
handle_castoverhandle_call: Fire-and-forget messages avoid blocking the sender- Timeouts instead of monitors: Reduces the number of inter-process links and associated overhead
nosuspendon casts: If the target process’s mailbox is full, the send fails immediately rather than blocking- Large distribution buffers: 256KB (up from 4KB default) absorbs network bursts without blocking sender processes
- Separate inter-node message queues: Isolated per destination node, preventing a slow remote node from blocking messages to healthy nodes
Multimedia System
Architecture
WhatsApp rebuilt its multimedia handling system in Erlang in 2012, replacing an earlier non-Erlang system. Rick Reed described this at Erlang Factory 2013.
The multimedia flow separates content from metadata:
- Client uploads media to HTTP server (YAWS, Erlang-based) over a separate connection
- HTTP server stores the file and returns a hash/unique ID
- Sender transmits the hash to the recipient via the messaging server
- Recipient downloads media from the HTTP server using the hash
This separation means the messaging server never handles large binary payloads — it only routes small metadata messages. Media servers scale independently of chat servers.
Scale (Late 2014)
| Metric | Daily Volume | Peak (New Year’s Eve 2014) |
|---|---|---|
| Photos | 600 million | 2 billion |
| Voice messages | 200 million | — |
| Videos | 100 million | 360 million |
| Peak outbound bandwidth | — | 146 Gb/s |
By late 2014, WhatsApp operated approximately 350 dedicated multimedia (MMS) servers — nearly half the total server fleet — reflecting the growing dominance of media in messaging traffic.
Multi-Cluster Architecture
Pre-Acquisition Topology
WhatsApp ran dual datacenters on SoftLayer bare metal:
- Main clusters: Multiple clusters in each datacenter handling chat connections
- MMS clusters: Dedicated multimedia clusters in each datacenter
- Shared global cluster: Cross-datacenter shared state
- wandist connections: Inter-cluster distribution links
Island Architecture
Backend nodes were organized into “islands” — small, redundant clusters:
- Each island manages a subset of data partitions
- 2+ nodes per island (primary and secondary)
- One-way replication: primary handles all reads/writes; secondary is passive failover
- Islands operate independently, isolating failures
This design meant a node failure affected only the users mapped to that island’s partitions. The blast radius of any single failure was a fraction of the total user base.
Post-Acquisition Evolution
After the Facebook acquisition, the architecture evolved significantly:
- Data center migration (2017-2019): 3-year migration from SoftLayer to Facebook-owned data centers. Presented at CodeBEAM SF 2019 by Igors Istocniks. Migration proceeded per phone number prefix (country code): make the prefix read-only, accelerate database replay repairs, move traffic, reconcile — under 5 minutes per prefix.
- OS migration: FreeBSD to Linux (required by Facebook’s container orchestration and monitoring infrastructure)
- Scale expansion: Backend split into 40+ distinct clusters by function, scaled to 40,000+ Erlang nodes
- Over 1 billion concurrent connections maintained across the expanded infrastructure
WARTS: WhatsApp’s Runtime System
Post-acquisition, WhatsApp formalized their custom Erlang/OTP fork as WARTS (WhatsApp’s Runtime System), open-sourced on GitHub. WARTS focuses on performance, security, and tooling enhancements for Linux, reflecting the post-migration environment.
End-to-End Encryption and Multi-Device
End-to-End Encryption (April 2016)
WhatsApp integrated the Signal Protocol for end-to-end encryption across all message types — text, photos, video, voice messages, voice calls, and video calls. After this change, WhatsApp’s servers could no longer read message content.
The server architecture remained Erlang/BEAM-based. The encryption added computational overhead on the client side but did not fundamentally change the server’s role as a message router and store-and-forward relay.
Multi-Device Support (July 2021)
Multi-device support required architectural changes to the server:
- Per-device identity keys: Previously, each user had one identity key. Now each device gets its own.
- Client-fanout: Messages are encrypted N times for N devices, with the sender performing the encryption
- Device mapping: Server maintains an account-to-device-identity mapping
- History sync: Encrypted message bundles transferred during device linking
- Voice/video calls: Generate random 32-byte SRTP (Secure Real-time Transport Protocol) master secrets per device
Options Considered
WhatsApp’s key architectural decisions involved explicit trade-offs. Understanding what they considered — and rejected — reveals why the final architecture took its shape.
Language Choice: Erlang vs Alternatives
Option 1: C++
Approach: Build a custom messaging server in C++, the standard choice for high-performance servers at the time (2009).
Pros:
- Maximum control over memory and CPU usage
- Mature networking libraries
- Founders had C++ experience (Yahoo)
Cons:
- “Developers have to implement half of Erlang by themselves” (Anton Lavrik) — supervision, hot code loading, per-connection isolation
- Manual memory management at scale of millions of connections is error-prone
- No built-in concurrency model matching one-process-per-connection
Why not chosen: The reliability and concurrency features that C++ developers must build from scratch are built into Erlang’s runtime. At WhatsApp’s team size, building these primitives was not feasible.
Option 2: Java/JVM
Approach: Use a JVM-based server framework.
Pros:
- Large ecosystem and talent pool
- Garbage collection handles memory management
Cons:
- JVM garbage collection is global — stop-the-world pauses affect all connections
- Per-connection memory overhead significantly higher than BEAM processes
- Thread-per-connection model doesn’t scale to millions without complex async frameworks
Why not chosen: JVM’s global GC was the dealbreaker. Discord’s later experience with Cassandra’s JVM GC pauses (covered in a separate case study) validates this concern.
Option 3: Erlang (Chosen)
Why chosen: ejabberd provided a working XMPP server to build upon, Erlang’s concurrency model matched the problem (one process per connection), and the runtime’s fault tolerance reduced operational burden for a tiny team.
Scaling Strategy: Vertical-First vs Horizontal-First
Option: Horizontal-first (many small servers)
Pros: Simpler per-server capacity planning; commodity hardware Cons: More nodes = more operational complexity, more network partitions, more inter-node coordination
Chosen: Vertical-first (fewer large servers)
Why chosen: Operational complexity scales with node count, not core count. BEAM’s SMP scalability meant doubling cores nearly doubled throughput. 100 servers at 2M connections was operationally simpler than 1,000 servers at 200K.
Architecture: Microservices vs Monolith
WhatsApp deliberately avoided the microservices pattern:
Why not microservices: At WhatsApp’s team size (32 engineers), microservices would have added deployment complexity, inter-service latency, and operational overhead that a small team couldn’t absorb. Erlang’s process model already provides service isolation within a single BEAM node — each subsystem runs in its own supervision tree, crashes independently, and can be upgraded independently via hot code loading.
Decision factor matrix:
| Factor | Erlang/ejabberd | C++ Custom | Java/JVM | Microservices |
|---|---|---|---|---|
| Time to working prototype | Weeks | Months | Months | N/A |
| Per-connection memory | ~300 bytes | Variable | ~KB range | N/A |
| GC impact on other connections | None (per-process) | N/A (manual) | Global STW pauses | Varies |
| Hot code reload | Built-in | Not viable | Possible (complex) | Via rolling deploy |
| Fault isolation | Per-process | Manual | Per-thread (limited) | Per-service |
| Operational complexity for 3 engineers | Low (OTP supervision) | Very High | High | Very High |
Outcome
Metrics at Key Milestones
| Metric | Early 2014 (Acquisition) | Late 2014 | 2016 | Current |
|---|---|---|---|---|
| Monthly active users | 450M | 600M+ | 1B | 3B+ |
| Messages/day | ~40B | 50B+ | — | 100B+ |
| Servers | ~550 | ~800 | — | 40,000+ nodes |
| Engineers | 32 | ~35 | ~50 | 1,000+ |
| Concurrent connections | 140M | — | — | 1B+ |
| Users per engineer | 14M | 17M | 20M | — |
Performance Numbers (2014)
| Metric | Value |
|---|---|
| Peak logins/second | 230,000 |
| Peak inbound messages/second | 342,000 |
| Peak outbound messages/second | 712,000 |
| Erlang inter-node messages/second | 70+ million |
| Connections per second | 440,000 |
Cost Efficiency
Pre-acquisition, WhatsApp ran ~700 servers on SoftLayer at approximately 0.004/user/month — orders of magnitude below competing platforms that required larger teams and more infrastructure.
Timeline
| Date | Event |
|---|---|
| Feb 2009 | WhatsApp Inc. incorporated |
| Aug 2009 | WhatsApp 2.0 launched (messaging pivot) |
| Jan 2011 | 1 million connections per server achieved |
| Jan 2012 | 2+ million connections per server (“1 million is so 2011”) |
| Oct 2011 | 1 billion messages/day |
| Aug 2012 | 10 billion messages/day |
| Dec 2013 | 18 billion messages/day, 400M MAU |
| Feb 2014 | Facebook acquisition ($19B); pg2 outage |
| Mar 2014 | Rick Reed’s “Billion with a B” talk; 465M MAU |
| Nov 2014 | 600M+ MAU, 800 servers |
| Feb 2016 | 1 billion MAU |
| Apr 2016 | End-to-end encryption (Signal Protocol) |
| 2017-2019 | Data center migration (SoftLayer → Facebook DCs) |
| Jul 2021 | Multi-device support |
| 2024 | 3B+ MAU, 40,000+ Erlang nodes, 100B+ messages/day |
Lessons Learned
Technical Lessons
1. Per-Process GC Changes the Scaling Game
The insight: BEAM’s per-process garbage collection is the single architectural property that enabled WhatsApp’s per-server density. On a JVM, 2 million connections sharing a single heap would produce catastrophic GC pauses. On BEAM, each connection’s GC is independent — a process handling a heavy media message doesn’t stall two million idle connections.
How it applies elsewhere: When evaluating runtimes for high-connection-count workloads (WebSocket servers, IoT gateways, chat systems), the GC model matters more than raw throughput benchmarks. A runtime that’s 2x slower per-request but has isolated GC may support 10x more connections.
Warning signs you need isolated GC:
- Latency spikes correlate with GC pauses, not load
- Tail latency (p99/p999) degrades non-linearly with connection count
- Adding connections degrades performance for existing connections
2. Vertical Density Reduces Operational Complexity
The insight: WhatsApp pushed each server to its connection limit before adding servers. This kept the cluster small — 550 servers for 450 million users. Fewer servers meant fewer failure domains, fewer inter-node messages, fewer network partitions to handle, and fewer servers for a team of 32 to monitor.
How it applies elsewhere: Before horizontally scaling to more nodes, verify that each node is fully utilized. Modern hardware with 128+ cores and 512+ GB RAM can serve workloads that many teams distribute across dozens of smaller instances. The operational cost of each additional node (monitoring, failover, network configuration) is often underestimated.
Warning signs you should scale vertically first:
- Average CPU utilization per node is below 40%
- Most operational incidents involve inter-node communication, not node overload
- Your team size is small relative to node count
3. Custom Protocols Unlock Markets
The insight: FunXMPP’s 50-70% bandwidth reduction over standard XMPP wasn’t a premature optimization — it was a market-access decision. WhatsApp dominated developing markets where 2G was prevalent and per-MB data costs were high. The protocol efficiency made the product viable where competitors couldn’t operate.
How it applies elsewhere: Protocol efficiency matters when your target environment is resource-constrained. This applies to IoT devices, mobile apps in bandwidth-limited regions, and any system where per-message cost (bandwidth, battery, compute) directly affects user experience or viability.
4. Erlang’s “Let It Crash” Reduces Team Size Requirements
The insight: OTP supervision trees meant that individual connection crashes were automatically recovered without operator intervention. This is why 32 engineers could serve 450 million users without a dedicated operations team. The runtime handles the class of failures that would otherwise require human intervention.
How it applies elsewhere: Fault-tolerant runtimes reduce the human operational burden. If your team spends significant time on “restart the process” or “reconnect the client” remediation, evaluate whether your runtime’s error recovery model is the bottleneck.
Process Lessons
1. Start with an Open-Source Base, Then Rewrite
What they learned: Starting with ejabberd gave WhatsApp a functional messaging server in weeks. Over years, they rewrote nearly every component. The initial choice provided time-to-market; the rewrites provided performance at scale.
What they’d do differently: The choice to start with ejabberd was vindicated. The key lesson is that the starting point doesn’t constrain the end state — as long as the underlying runtime (Erlang/BEAM) is sound, everything above it can be replaced incrementally.
2. Patch the Runtime When Necessary
What they learned: WhatsApp didn’t treat Erlang/OTP as a black box. When the timer wheel, GC, or process group modules became bottlenecks, they patched the runtime. Several of these patches were upstreamed, benefiting the entire Erlang ecosystem.
What it requires: Deep understanding of the runtime internals. WhatsApp hired engineers (like Rick Reed) who could profile and modify the BEAM VM. This capability — reading and modifying your runtime’s source code — is rare but transformative at extreme scale.
Organizational Lessons
1. Team Size as a Feature
The insight: WhatsApp’s small team wasn’t a constraint to overcome — it was a deliberate design choice that shaped the architecture. A 32-person team cannot operate microservices, maintain complex deployment pipelines, or staff a 24/7 operations center. So they chose Erlang (self-healing), monolithic deployment (one artifact), and vertical scaling (fewer nodes). Every architectural decision was filtered through “can 1-3 engineers own this?”
How it applies elsewhere: Team size should inform architecture, not the other way around. If your team is small, choose technologies and patterns that minimize operational surface area. The worst outcome is an architecture that requires more operators than you have.
2. Focus Enables Density
The insight: WhatsApp did one thing — messaging. No news feed, no advertising engine, no recommendation system, no content moderation pipeline (pre-acquisition). This single-product focus meant every engineering hour went toward making messaging work better at scale.
Applying This to Your System
When This Pattern Applies
You might face similar challenges if:
- You’re building a high-connection-count server (WebSocket, MQTT, chat, IoT gateway)
- Your connection count is growing faster than your team size
- You need predictable per-connection latency regardless of total connection count
- Your target environment includes resource-constrained clients (mobile, IoT, developing markets)
Checklist for Evaluation
- Does your runtime support per-connection (per-process/per-goroutine/per-fiber) garbage collection?
- Is your per-connection memory overhead measured in bytes or kilobytes? (WhatsApp: ~300 bytes)
- Can your servers handle 10x more connections by scaling vertically before adding nodes?
- Can your current team size sustain the operational load of your current node count?
- Have you measured your protocol overhead vs. payload size for typical messages?
Starting Points
- Measure per-connection overhead: Profile your server’s memory consumption per idle connection and per active connection. If idle connections cost >1KB each, investigate whether your runtime or framework adds unnecessary overhead.
- Profile GC impact on tail latency: Compare p99 latency under GC pressure vs. without. If GC pauses dominate your tail latency, evaluate runtimes with per-unit-of-work GC (BEAM, Go’s goroutine-aware GC).
- Audit operational complexity per node: Count the hours your team spends on per-node operational tasks (monitoring, failover, upgrades). Multiply by node count. If this exceeds available engineering hours, consider larger nodes or simpler deployment.
- Prototype vertical limits: Before adding nodes, push a single server to its limit in a load test. You may discover 5-10x headroom that eliminates the need for horizontal scaling — and the complexity it brings.
Conclusion
WhatsApp’s architecture is a study in what happens when a small team makes consistently correct bets on a runtime’s fundamental properties. Erlang’s per-process garbage collection, lightweight processes, and OTP supervision weren’t just nice features — they were the architectural foundation that made 2 million connections per server, 450 million users with 32 engineers, and a $19 billion valuation possible.
The counterintuitive lesson is that WhatsApp’s architecture was simple. No microservices. No container orchestration (pre-acquisition). No complex caching layers. One Erlang process per user, Mnesia for metadata, delete-after-delivery for messages, and a custom binary protocol for bandwidth efficiency. The complexity lived in the BEAM VM patches and the operational discipline of a small team — not in the system’s architecture.
For engineers building high-connection systems today, WhatsApp’s story suggests a specific investigation: before adding nodes, clusters, or architectural layers, check whether your runtime’s concurrency model and GC strategy are the actual bottleneck. If they are, changing the runtime — as dramatic as that sounds — may be simpler than working around its limitations at scale.
Appendix
Prerequisites
- Familiarity with message broker concepts (store-and-forward, delivery guarantees)
- Basic understanding of Erlang/BEAM concurrency model (processes, message passing, supervision trees)
- Knowledge of XMPP protocol basics (stanzas, presence, roster)
- Understanding of vertical vs. horizontal scaling trade-offs
Terminology
| Term | Definition |
|---|---|
| BEAM | Bogdan/Björn’s Erlang Abstract Machine — the virtual machine that executes Erlang and Elixir code. Provides preemptive scheduling, per-process GC, and hot code loading. |
| OTP | Open Telecom Platform — Erlang’s standard library of behaviors (gen_server, supervisor) and tools for building fault-tolerant systems. |
| ejabberd | Open-source XMPP server written in Erlang. WhatsApp’s starting point, heavily rewritten over subsequent years. |
| XMPP | Extensible Messaging and Presence Protocol — an open XML-based protocol for real-time messaging. WhatsApp’s original protocol before switching to FunXMPP. |
| FunXMPP | WhatsApp’s custom binary protocol evolved from XMPP, replacing XML structure with byte tokens for 50-70% bandwidth reduction. |
| Mnesia | Erlang’s built-in distributed database supporting in-memory and disk-based tables. WhatsApp used it for routing tables, offline queues, and user metadata. |
| ETS | Erlang Term Storage — in-memory key-value tables accessible by multiple Erlang processes. Used for fast concurrent lookups within a single node. |
| gen_server | Standard OTP behavior for implementing a server process that handles synchronous and asynchronous requests. |
| gen_factory | WhatsApp’s custom OTP behavior extending gen_server with worker pool dispatch for parallelized processing. |
| gen_industry | WhatsApp’s custom OTP behavior with multiple dispatch processes feeding multiple workers, parallelizing both ingestion and processing. |
| SMP | Symmetric Multi-Processing — hardware architecture where multiple CPUs share memory. BEAM’s SMP support maps one scheduler per core. |
| WARTS | WhatsApp’s Runtime System — a custom fork of Erlang/OTP focused on performance, security, and tooling for Linux environments. |
| SRTP | Secure Real-time Transport Protocol — encryption protocol for voice and video calls, used by WhatsApp for end-to-end encrypted media streams. |
| YAWS | Yet Another Web Server — Erlang-based HTTP server used by WhatsApp for multimedia upload/download handling. |
| SoftLayer | IBM’s bare-metal cloud hosting provider where WhatsApp ran pre-acquisition. Migrated to Facebook data centers 2017-2019. |
Summary
- WhatsApp started from ejabberd (open-source XMPP/Erlang server) in 2009, spending years rewriting it into a custom messaging platform that served 450 million users with 32 engineers and ~550 servers at the time of Facebook’s $19 billion acquisition
- BEAM’s per-process garbage collection (~300 bytes per process) enabled 2+ million concurrent TCP connections per server — operational complexity scales with node count, not core count, so fewer large servers was deliberately chosen over many small ones
- WhatsApp patched the BEAM VM itself (timer wheels, GC throttling, distribution buffers, pg2 replacement, ETS hash improvements), with several patches upstreamed to mainline Erlang/OTP
- FunXMPP replaced XML-based XMPP with a binary protocol achieving 50-70% bandwidth reduction, making WhatsApp viable on 2G feature phones in developing markets
- Mnesia provided in-memory metadata storage (~2TB, 18 billion records, 98% cache hit rate) while messages were transient — deleted after delivery confirmation, with 50% read within 60 seconds
- Post-acquisition, WhatsApp migrated from FreeBSD/SoftLayer to Linux/Facebook data centers (2017-2019), scaled to 40,000+ Erlang nodes and 1 billion+ concurrent connections, and formalized their runtime fork as WARTS (WhatsApp’s Runtime System)
References
- WhatsApp Blog: “1 million is so 2011” — WhatsApp Engineering, January 2012. Server specs and 2M connection milestone.
- Rick Reed: “That’s ‘Billion’ with a ‘B’: Scaling to the Next Level at WhatsApp” — Erlang Factory SF 2014. BEAM patches, meta-clustering, data storage architecture.
- Rick Reed: “WhatsApp: Half a Billion Unsuspecting FreeBSD Users” — MeetBSD California 2014. FreeBSD infrastructure at 600M+ users.
- Eugene Fooksman Interview — Paolo D’Incau, March 2013. Early architecture decisions and ejabberd origins.
- Anton Lavrik Interview: 20 Years of Open Source Erlang — Erlang Solutions. Erlang vs alternatives, developer tooling at scale.
- How WhatsApp Grew to Nearly 500 Million Users, 11,000 Cores, and 70 Million Messages a Second — High Scalability, 2014. Detailed technical architecture summary from Reed’s 2014 talk.
- The WhatsApp Architecture Facebook Bought for $19 Billion — High Scalability, 2014. Architecture overview at acquisition.
- Igors Istocniks: “How WhatsApp Moved 1.5 Billion Users Across Data Centers” — CodeBEAM SF 2019. Data center migration details.
- Meta Engineering: Introducing Multi-Device for WhatsApp — Meta Engineering Blog, July 2021. Multi-device architecture.
- WhatsApp/WARTS on GitHub — WhatsApp’s Runtime System (custom Erlang/OTP fork).
- Erlang Forums: Did the WhatsApp Patches Make It Into Mainstream Erlang? — Community discussion on upstream contributions.
- Rick Reed: “Scaling to Millions of Simultaneous Connections” — Erlang Factory SF 2012. Initial connection scaling work.
- Rick Reed: “JPGs and 3GPs and AMRs Oh My!” — Erlang Factory SF 2013. Erlang-based multimedia system rebuild.