
Design Real-Time Chat and Messaging

A comprehensive system design for real-time chat and messaging covering connection management, message delivery guarantees, ordering strategies, presence systems, group chat fan-out, and offline synchronization. The design targets sub-second message delivery at WhatsApp/Discord scale with strong delivery guarantees and mobile-first offline resilience; the capacity model below plans for 10B+ messages/day.

[Diagram: Clients (mobile apps, web client) → CDN (static assets) and load balancer → WebSocket Gateways 1..N → core services (Message, Presence, Sync, Fan-out) → Kafka cluster, Redis cluster (presence, sessions), ScyllaDB/Cassandra (messages), PostgreSQL (user, group metadata) → Push Service → APNs/FCM]
High-level architecture: WebSocket gateways handle persistent connections, Kafka provides message routing, and fan-out service distributes messages to recipients.

Real-time chat systems solve three interrelated problems: low-latency delivery (messages appear within milliseconds), reliable delivery (no message is ever lost), and ordering consistency (messages appear in the same order for all participants).

Core architectural decisions:

| Decision | Choice | Rationale |
|---|---|---|
| Transport | WebSocket | Full-duplex; as little as 2 bytes of framing overhead per message after the handshake |
| Delivery guarantee | At-least-once + client dedup | Simpler than exactly-once; idempotency at app layer |
| Message ordering | Server-assigned sequence numbers + timestamps | Single source of truth for order; avoids client clock skew |
| Fan-out model | Hybrid push/pull | Push for small groups, pull for large channels |
| Presence | Heartbeat + Redis pub/sub | Ephemeral data; no persistence needed |
| Offline sync | Client-side sequence tracking | Fetch missed messages on reconnect |

Key trade-offs accepted:

  • Server dependency for ordering (no P2P) in exchange for correctness guarantees
  • At-least-once delivery requiring client-side deduplication
  • Eventual consistency for presence (acceptable for UX)
  • Higher infrastructure cost for guaranteed delivery (message queue durability)

What this design optimizes:

  • Sub-500ms global message delivery
  • Zero message loss under network partitions
  • Seamless offline-to-online transitions
  • Horizontal scalability to billions of messages/day

Functional requirements:

| Requirement | Priority | Notes |
|---|---|---|
| 1:1 direct messaging | Core | Private conversations between two users |
| Group messaging | Core | Up to 1000 members per group |
| Message delivery receipts | Core | Sent, delivered, read indicators |
| Typing indicators | Core | Real-time "user is typing" display |
| Online/offline presence | Core | Show user availability status |
| Offline message delivery | Core | Queue and deliver when user reconnects |
| Message history sync | Core | Retrieve past messages across devices |
| Read receipts | Extended | Track who has read messages |
| Media attachments | Extended | Images, videos, files (out of detailed scope) |
| End-to-end encryption | Extended | Signal protocol (out of detailed scope) |

Non-functional requirements:

| Requirement | Target | Rationale |
|---|---|---|
| Availability | 99.99% (4 nines) | Communication is critical; 52 min/year downtime max |
| Message delivery latency | p99 < 500ms | Real-time feel requires sub-second delivery |
| Message durability | 99.9999% | No message should ever be lost |
| Offline sync time | < 5s for 1000 messages | Fast reconnection experience |
| Concurrent connections | 10M per region | Mobile-scale concurrent users |
| Message retention | 30 days default, configurable | Storage cost vs. user expectations |

Users:

  • Monthly Active Users (MAU): 500M
  • Daily Active Users (DAU): 200M (40% of MAU)
  • Peak concurrent connections: 50M (25% of DAU)

Traffic:

  • Messages per user per day: 50 (mix of 1:1 and group)
  • Daily messages: 200M × 50 = 10B messages/day
  • Peak messages per second: 10B / 86400 × 3 (peak multiplier) = 350K msgs/sec

Storage:

  • Average message size: 500 bytes (text + metadata)
  • Daily storage: 10B × 500B = 5TB/day
  • 30-day retention: 150TB
  • With replication (3x): 450TB

Connections:

  • WebSocket connections per gateway: 500K (Linux file descriptor limits)
  • Gateway servers needed: 50M / 500K = 100 servers minimum
  • With redundancy (2x): 200 gateway servers (these numbers are re-derived in the sketch below)
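
For a quick sanity check, the arithmetic above compresses into a few lines of TypeScript (numbers only, not a benchmark; the 3x peak multiplier, 500-byte average, and 500K connections/server are the assumptions stated above):

const MAU = 500e6
const DAU = 0.4 * MAU // 200M
const peakConcurrent = 0.25 * DAU // 50M
const dailyMessages = DAU * 50 // 10B
const peakMsgsPerSec = Math.round((dailyMessages / 86_400) * 3) // ≈347K
const dailyStorageTB = (dailyMessages * 500) / 1e12 // 5 TB/day
const retainedTB = dailyStorageTB * 30 * 3 // 450 TB (30-day retention, 3x replication)
const gatewayServers = Math.ceil(peakConcurrent / 500_000) * 2 // 200 (2x redundancy)
console.log({ peakMsgsPerSec, dailyStorageTB, retainedTB, gatewayServers })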

Path A: Connection-Centric

Best when:

  • Infrastructure team can maintain stateful WebSocket servers
  • Low latency is primary requirement
  • Moderate group sizes (< 500 members)
  • Strong consistency for message ordering needed

Architecture:

Message send sequence (User A → User B): User A's client sends the message to its gateway, which routes it to the message service. The service persists the message to the DB, then delivers it to User B's gateway, which pushes it over the WebSocket. User B's client returns a delivery ACK; the service updates the message status and sends the delivery confirmation (the double check mark) back to User A.

Key characteristics:

  • Each gateway maintains user-to-connection mapping
  • Message service routes directly to recipient’s gateway
  • Synchronous delivery with acknowledgment chain

Trade-offs:

  • Pros: lowest latency (direct routing); simple mental model; strong ordering guarantees
  • Cons: gateway state management complexity; user migration on gateway failure; group size limited by fan-out cost

Real-world example: WhatsApp uses this approach with persistent TCP connections; each server maintains ~1-2M connections, and messages route through the recipient's assigned gateway.

Path B: Queue-Centric

Best when:

  • Very large groups/channels (1000+ members)
  • Geographic distribution across regions
  • Tolerance for slightly higher latency (100-500ms)
  • Need for replay and audit capabilities

Architecture:

[Diagram: Sender → Gateway → Message Service publishes to Kafka (partitioned by conversation); Fan-out Workers 1..N subscribe and deliver through the gateway pool to recipients]

Key characteristics:

  • Messages published to Kafka partitioned by conversation
  • Fan-out workers consume and distribute to recipients
  • Decouples send from delivery for reliability

Trade-offs:

  • Pros: handles large fan-out efficiently; built-in replay capability; better failure isolation
  • Cons: higher latency (extra queue hop); more complex infrastructure; ordering requires a partition strategy

Real-world example: Slack uses this approach with Kafka handling 6.5 Gbps peak throughput across 10 clusters. Channel servers use consistent hashing to maintain per-channel ordering.

Path C: Hybrid

Best when:

  • Mix of 1:1, small groups, and large channels
  • Need to balance latency vs. resource usage
  • Celebrity/influencer use cases with massive follower counts

Architecture:

[Diagram: Message Service checks group size. Conversations with < 100 members take the direct real-time push path through a gateway to users; conversations with >= 100 members are queued to Kafka, and fan-out workers deliver asynchronously through the gateways to users]

Key characteristics:

  • Small groups use direct push for lowest latency
  • Large groups use async fan-out to avoid write amplification
  • Threshold typically 50-100 members (see the routing sketch below)
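
A minimal sketch of that routing decision, assuming handler objects shaped like the DirectPushHandler and KafkaFanoutHandler shown later (names here are illustrative):

const FANOUT_THRESHOLD = 100 // tune per workload; 50-100 is the typical range

async function routeMessage(message: Message, memberCount: number): Promise<void> {
  if (memberCount < FANOUT_THRESHOLD) {
    // Small conversation: synchronous push through the gateways (lowest latency)
    await directPushHandler.deliverMessage(message)
  } else {
    // Large conversation: enqueue to Kafka and let fan-out workers deliver
    await kafkaFanoutHandler.publishForFanout(message, memberCount)
  }
}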

Trade-offs:

  • Pros: optimal latency for the common case (1:1 and small groups); scales to large channels without overwhelming gateways; flexible resource allocation
  • Cons: two code paths to maintain; threshold tuning required; slightly inconsistent delivery characteristics

Real-world example: Discord uses this approach. Small servers get direct fan-out; large servers (100+ members) route through Kafka for distributed processing.

| Factor | Connection-Centric | Queue-Centric | Hybrid |
|---|---|---|---|
| Latency (p50) | 50-100ms | 100-300ms | 50-300ms |
| Max group size | ~500 | Unlimited | Unlimited |
| Complexity | Moderate | High | Highest |
| Failure isolation | Gateway-level | Topic-level | Mixed |
| Replay capability | Limited | Native | Mixed |
| Production examples | WhatsApp | Slack | Discord |

This article focuses on Path C (Hybrid) because:

  1. Covers the full spectrum of use cases (1:1 to large channels)
  2. Represents modern production architectures (Discord, Telegram)
  3. Demonstrates trade-off thinking expected in system design interviews
  4. Balances latency optimization with scalability

[Diagram: Client layer: mobile/web app with WebSocket client, SQLite/IndexedDB local cache, sync engine. Edge layer: Route 53 (GeoDNS), Layer-4 NLB. Gateway layer: stateful WebSocket gateways with connection manager and heartbeat handler. Core services: Message, User, Group, Presence, Sync, and Push Notification services. Message queue: Kafka cluster with fanout.messages and push.notifications topics. Storage layer: Redis cluster (sessions, presence), ScyllaDB (messages), PostgreSQL (users, groups), S3 (media)]

The WebSocket gateway manages persistent connections and routes messages between clients and services.

Responsibilities:

  • WebSocket connection lifecycle (connect, heartbeat, disconnect)
  • Authentication and session validation
  • Message routing to appropriate services
  • Presence event broadcasting
  • Graceful connection migration on shutdown

Design decisions:

| Decision | Choice | Rationale |
|---|---|---|
| Protocol | WebSocket over TLS | Full-duplex, minimal overhead, universal support |
| Session affinity | Consistent hashing by user_id | Predictable routing, simplifies state management |
| Heartbeat interval | 30 seconds | Balance between detection speed and overhead |
| Connection timeout | 90 seconds | 3 missed heartbeats before disconnect |
| Max connections/server | 500K | Linux file descriptor limits with tuning |
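
A sketch of how a gateway could enforce the 30s/90s policy from the table: track last-seen per connection and sweep periodically (closeConnection is assumed to be supplied by the connection manager):

const HEARTBEAT_INTERVAL_MS = 30_000
const CONNECTION_TIMEOUT_MS = 90_000 // 3 missed heartbeats

class HeartbeatWatchdog {
  private lastSeen = new Map<string, number>()

  onHeartbeat(connectionId: string): void {
    this.lastSeen.set(connectionId, Date.now())
  }

  // Call periodically (e.g. every HEARTBEAT_INTERVAL_MS) to reap dead connections.
  sweep(closeConnection: (connectionId: string) => void): void {
    const now = Date.now()
    for (const [id, seenAt] of this.lastSeen) {
      if (now - seenAt > CONNECTION_TIMEOUT_MS) {
        this.lastSeen.delete(id)
        closeConnection(id)
      }
    }
  }
}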

Scaling approach:

  • Horizontal scaling with consistent hashing (see the hash-ring sketch after this list)
  • User-to-gateway mapping stored in Redis
  • Graceful drain on shutdown (notify clients to reconnect)
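
To make the consistent-hashing idea concrete, here is a minimal hash ring for user-to-gateway assignment; virtual nodes smooth the distribution, and a production system would use an existing implementation (and binary search rather than a linear scan):

import { createHash } from "node:crypto"

class HashRing {
  private ring: Array<{ point: number; node: string }> = []

  constructor(nodes: string[], private vnodes = 100) {
    // Place each gateway on the ring at `vnodes` pseudo-random points
    for (const node of nodes)
      for (let i = 0; i < vnodes; i++)
        this.ring.push({ point: this.hash(`${node}#${i}`), node })
    this.ring.sort((a, b) => a.point - b.point)
  }

  private hash(key: string): number {
    return createHash("md5").update(key).digest().readUInt32BE(0)
  }

  // First ring point clockwise from the user's hash owns the connection
  nodeFor(userId: string): string {
    const h = this.hash(userId)
    const entry = this.ring.find((e) => e.point >= h) ?? this.ring[0]
    return entry.node
  }
}

const ring = new HashRing(["gw-1", "gw-2", "gw-3"])
console.log(ring.nodeFor("user_123")) // e.g. "gw-2"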

The message service is the core service for message processing, persistence, and routing.

State per message:

interface Message {
  messageId: string // UUID, client-generated for idempotency
  conversationId: string // 1:1 or group conversation
  senderId: string
  content: MessageContent
  timestamp: number // Server-assigned Unix timestamp
  sequenceNumber: bigint // Per-conversation monotonic sequence
  status: MessageStatus // PENDING | SENT | DELIVERED | READ
  expiresAt?: number // Optional TTL for ephemeral messages
}

interface MessageContent {
  type: "text" | "image" | "file" | "location"
  text?: string
  mediaUrl?: string
  metadata?: Record<string, any>
}

type MessageStatus = "PENDING" | "SENT" | "DELIVERED" | "READ"

Message flow:

  1. Receive: Gateway forwards message with client-generated messageId
  2. Deduplicate: Check messageId in recent message cache (idempotency)
  3. Validate: Verify sender membership in conversation, rate limits
  4. Persist: Write to ScyllaDB with server timestamp and sequence number
  5. Route: Determine delivery path (direct push vs. Kafka)
  6. Acknowledge: Return sequence number to sender
  7. Fan-out: Distribute to recipients via the appropriate channel (the skeleton below ties these steps together)
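
Sketched as glue code, the steps line up roughly as follows; claimMessageId (defined under the dedup cache below), authz, rateLimiter, members, lookupExistingAck, and routeMessage are hypothetical collaborators standing in for components covered elsewhere in this article:

async function handleIncoming(msg: IncomingMessage): Promise<MessageAck> {
  // Step 2: deduplicate - only the first writer proceeds (Redis dedup cache)
  const firstSeen = await claimMessageId(msg.messageId)
  if (!firstSeen) return lookupExistingAck(msg.messageId) // idempotent replay

  // Step 3: validate sender membership and rate limits
  await authz.assertMember(msg.senderId, msg.conversationId)
  await rateLimiter.check(msg.senderId)

  // Step 4: persist with server-assigned sequence number and timestamp
  const processed = await messageProcessor.processIncoming(msg.conversationId, msg)

  // Steps 5 and 7: route and fan out asynchronously (direct push vs. Kafka)
  void routeMessage(processed, await members.count(msg.conversationId))

  // Step 6: acknowledge to the sender with the assigned ordering info
  return {
    messageId: msg.messageId,
    sequenceNumber: processed.sequenceNumber,
    timestamp: processed.timestamp,
  }
}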

The presence service handles online/offline status, typing indicators, and last-seen timestamps.

Design decisions:

  • No persistence: Presence reconstructed from heartbeats
  • TTL-based: Status expires automatically on disconnect
  • Pub/Sub distribution: Redis pub/sub for real-time updates
  • Throttled updates: Max 1 presence update per second per user (see the throttle sketch below)
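
The throttle can be as simple as a per-user last-published map (a sketch; a production version might coalesce updates and flush the latest state after the window instead of dropping):

class PresenceThrottle {
  private lastPublished = new Map<string, number>()

  // Allow at most one presence broadcast per user per second; drop the rest.
  shouldPublish(userId: string, now = Date.now()): boolean {
    const last = this.lastPublished.get(userId) ?? 0
    if (now - last < 1_000) return false
    this.lastPublished.set(userId, now)
    return true
  }
}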

Data structures:

interface UserPresence {
  userId: string
  status: "online" | "away" | "offline"
  lastSeen: number // Unix timestamp
  deviceType: "mobile" | "web" | "desktop"
  typingIn?: string // conversationId if typing
  typingExpires?: number // Auto-clear after 5 seconds
}

Presence subscription model:

  • Users subscribe to presence of their contacts on connect
  • Changes broadcast via Redis pub/sub to subscribed gateways
  • Gateways filter and forward to relevant connected clients

The sync service handles message history retrieval and offline synchronization.

Sync protocol:

  • Client maintains lastSequenceNumber per conversation
  • On reconnect, client sends list of (conversationId, lastSeq) pairs
  • Server returns all messages with sequence > lastSeq
  • Client merges into local database, deduplicating by messageId

Pagination strategy:

  • Default page size: 50 messages
  • Cursor-based pagination using (conversationId, sequenceNumber)
  • Supports forward (newer) and backward (older) fetching
Clients connect over a single authenticated WebSocket:

wss://chat.example.com/ws?token={jwt}&device_id={uuid}

Initial connection message (server → client):

{
  "type": "connected",
  "connectionId": "conn_abc123",
  "serverTime": 1706886400000,
  "heartbeatInterval": 30000,
  "resumeToken": "resume_xyz789"
}

Send Message:

{
  "type": "message.send",
  "id": "req_001",
  "payload": {
    "messageId": "msg_uuid_client_generated",
    "conversationId": "conv_abc123",
    "content": {
      "type": "text",
      "text": "Hello, world!"
    }
  }
}

Typing Indicator:

{
  "type": "typing.start",
  "payload": {
    "conversationId": "conv_abc123"
  }
}

Mark Read:

{
  "type": "message.read",
  "payload": {
    "conversationId": "conv_abc123",
    "upToSequence": 1542
  }
}

Heartbeat:

{
  "type": "heartbeat",
  "timestamp": 1706886400000
}

Message Acknowledgment:

{
  "type": "message.ack",
  "id": "req_001",
  "payload": {
    "messageId": "msg_uuid_client_generated",
    "sequenceNumber": 1543,
    "timestamp": 1706886400123
  }
}

New Message (from another user):

{
  "type": "message.new",
  "payload": {
    "messageId": "msg_xyz789",
    "conversationId": "conv_abc123",
    "senderId": "user_456",
    "content": {
      "type": "text",
      "text": "Hi there!"
    },
    "sequenceNumber": 1544,
    "timestamp": 1706886401000
  }
}

Delivery Receipt:

{
  "type": "message.delivered",
  "payload": {
    "conversationId": "conv_abc123",
    "messageIds": ["msg_uuid_1", "msg_uuid_2"],
    "deliveredTo": "user_456",
    "timestamp": 1706886402000
  }
}

Read Receipt:

{
  "type": "message.read_receipt",
  "payload": {
    "conversationId": "conv_abc123",
    "readUpToSequence": 1544,
    "readBy": "user_456",
    "timestamp": 1706886403000
  }
}

Presence Update:

{
  "type": "presence.update",
  "payload": {
    "userId": "user_456",
    "status": "online",
    "typingIn": null
  }
}

Typing Indicator:

{
  "type": "typing.update",
  "payload": {
    "conversationId": "conv_abc123",
    "userId": "user_456",
    "isTyping": true
  }
}

Endpoint: POST /api/v1/sync

Request:

{
  "conversations": [
    { "conversationId": "conv_abc", "lastSequence": 1500 },
    { "conversationId": "conv_xyz", "lastSequence": 2300 }
  ],
  "limit": 100
}

Response (200 OK):

{
  "conversations": [
    {
      "conversationId": "conv_abc",
      "messages": [
        {
          "messageId": "msg_001",
          "senderId": "user_123",
          "content": { "type": "text", "text": "Hello" },
          "sequenceNumber": 1501,
          "timestamp": 1706886400000,
          "status": "DELIVERED"
        }
      ],
      "hasMore": false
    }
  ],
  "serverTime": 1706886500000
}

Endpoint: POST /api/v1/conversations

Request:

{
  "type": "direct",
  "participantIds": ["user_456"]
}

Response (201 Created):

{
  "conversationId": "conv_new123",
  "type": "direct",
  "participants": [
    { "userId": "user_123", "role": "member" },
    { "userId": "user_456", "role": "member" }
  ],
  "createdAt": "2024-02-03T10:00:00Z"
}

Endpoint: POST /api/v1/groups

Request:

{
  "name": "Project Team",
  "participantIds": ["user_456", "user_789"],
  "settings": {
    "onlyAdminsCanPost": false,
    "allowMemberInvites": true
  }
}

Response (201 Created):

{
  "conversationId": "conv_group_abc",
  "type": "group",
  "name": "Project Team",
  "participants": [
    { "userId": "user_123", "role": "admin" },
    { "userId": "user_456", "role": "member" },
    { "userId": "user_789", "role": "member" }
  ],
  "memberCount": 3,
  "createdAt": "2024-02-03T10:00:00Z"
}

Endpoint: GET /api/v1/conversations/{id}/messages?before={sequence}&limit=50

Response (200 OK):

{
  "messages": [
    {
      "messageId": "msg_xyz",
      "senderId": "user_456",
      "content": { "type": "text", "text": "Earlier message" },
      "sequenceNumber": 1450,
      "timestamp": 1706880000000
    }
  ],
  "hasMore": true,
  "nextCursor": "seq_1449"
}

Error responses:

| Code | Error | When |
|---|---|---|
| 400 | INVALID_MESSAGE | Message format invalid |
| 401 | UNAUTHORIZED | Invalid or expired token |
| 403 | FORBIDDEN | Not a member of the conversation |
| 404 | CONVERSATION_NOT_FOUND | Conversation doesn't exist |
| 409 | DUPLICATE_MESSAGE | messageId already processed |
| 429 | RATE_LIMITED | Too many messages |

Rate limit response:

{
  "error": "RATE_LIMITED",
  "message": "Message rate limit exceeded",
  "retryAfter": 5,
  "limit": "100 messages per minute"
}

Table design for time-series message access:

CREATE TABLE messages (
    conversation_id UUID,
    sequence_number BIGINT,
    message_id UUID,
    sender_id UUID,
    content_type TEXT,
    content_text TEXT,
    content_media_url TEXT,
    timestamp TIMESTAMP,
    status TEXT,
    expires_at TIMESTAMP,
    PRIMARY KEY ((conversation_id), sequence_number)
) WITH CLUSTERING ORDER BY (sequence_number DESC);
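
For illustration, the sync and history reads against this table might look like the following with the Node.js cassandra-driver (contact points and keyspace are placeholders; sequences beyond Number.MAX_SAFE_INTEGER should use the driver's Long type):

import { Client, types } from "cassandra-driver"

const client = new Client({
  contactPoints: ["scylla-1.internal"],
  localDataCenter: "dc1",
  keyspace: "chat",
})

// Sync path: everything after the client's last known sequence, oldest-first.
async function getMessagesAfter(conversationId: string, lastSeq: number, limit = 100) {
  const query = `
    SELECT * FROM messages
    WHERE conversation_id = ? AND sequence_number > ?
    ORDER BY sequence_number ASC
    LIMIT ?`
  const rs = await client.execute(query, [types.Uuid.fromString(conversationId), lastSeq, limit], { prepare: true })
  return rs.rows
}

// History path: page backward from a cursor; matches the DESC clustering order.
async function getMessagesBefore(conversationId: string, beforeSeq: number, limit = 50) {
  const query = `
    SELECT * FROM messages
    WHERE conversation_id = ? AND sequence_number < ?
    LIMIT ?`
  const rs = await client.execute(query, [types.Uuid.fromString(conversationId), beforeSeq, limit], { prepare: true })
  return rs.rows
}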

Why ScyllaDB:

  • Optimized for time-series data (messages ordered by sequence)
  • Partition per conversation enables efficient range queries
  • No garbage collection pauses (C++ implementation)
  • Linear horizontal scaling

Performance characteristics:

  • Read latency p99: 15ms (vs. 40-125ms for Cassandra)
  • Write latency p99: 5ms
  • Partition size recommendation: < 100MB (~200K messages per conversation)

Partition hot-spot mitigation:

  • Very active conversations split into time-bucketed partitions
  • Partition key: (conversation_id, time_bucket)
  • Time bucket: daily for active chats, monthly for archives (see the sketch below)
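
A sketch of how the write path might derive the bucketed partition key (daily buckets shown; the hot flag would come from per-conversation traffic stats):

// Hypothetical helper: derive the partition key for a message write.
function messagePartitionKey(conversationId: string, timestampMs: number, isHot: boolean): string {
  if (!isHot) return conversationId // default: one partition per conversation
  const day = Math.floor(timestampMs / 86_400_000) // days since Unix epoch
  return `${conversationId}:${day}` // daily bucket caps partition growth
}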

Purpose: Prevent duplicate message processing (idempotency)

# Claim messageId with a 24-hour TTL, only if absent (atomic check-and-set)
SET msg:dedup:{message_id} 1 EX 86400 NX
# "OK" -> first time this id is seen, process it; nil -> duplicate, skip
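
In application code the claim should stay atomic; with ioredis, a hypothetical claimMessageId helper is a single SET call:

import Redis from "ioredis"

const redis = new Redis() // connection details omitted

// Returns true exactly once per messageId within the 24-hour window.
// One SET ... EX ... NX is atomic, avoiding an EXISTS-then-SETEX race
// between two gateways processing the same retry.
async function claimMessageId(messageId: string): Promise<boolean> {
  const result = await redis.set(`msg:dedup:${messageId}`, "1", "EX", 86_400, "NX")
  return result === "OK"
}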

User and group metadata lives in PostgreSQL:

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username VARCHAR(50) UNIQUE NOT NULL,
    display_name VARCHAR(100),
    avatar_url TEXT,
    phone_hash VARCHAR(64) UNIQUE,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    last_active_at TIMESTAMPTZ
);

CREATE TABLE conversations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    type VARCHAR(20) NOT NULL, -- 'direct' or 'group'
    name VARCHAR(100), -- NULL for direct
    avatar_url TEXT,
    created_by UUID REFERENCES users(id),
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    last_message_at TIMESTAMPTZ,
    last_sequence BIGINT DEFAULT 0,
    member_count INT DEFAULT 0,
    settings JSONB DEFAULT '{}'
);

CREATE TABLE conversation_members (
    conversation_id UUID REFERENCES conversations(id) ON DELETE CASCADE,
    user_id UUID REFERENCES users(id) ON DELETE CASCADE,
    role VARCHAR(20) DEFAULT 'member', -- 'admin', 'member'
    joined_at TIMESTAMPTZ DEFAULT NOW(),
    last_read_sequence BIGINT DEFAULT 0,
    muted_until TIMESTAMPTZ,
    PRIMARY KEY (conversation_id, user_id)
);

-- Indexes
CREATE INDEX idx_members_user ON conversation_members(user_id);
CREATE INDEX idx_conversations_updated ON conversations(updated_at DESC);

Redis key layout for sessions and presence:

# User's active connections (set)
SADD user:conn:{user_id} {connection_id}
SREM user:conn:{user_id} {connection_id}

# Connection → Gateway mapping (hash)
HSET conn:{connection_id} gateway "gateway-1.us-east-1" user_id "user_123" device_id "device_abc" connected_at 1706886400000

# User presence (hash with TTL)
HSET presence:{user_id} status "online" last_seen 1706886400000 device_type "mobile"
EXPIRE presence:{user_id} 120

# Typing indicators (auto-expire after 5 seconds)
SETEX typing:{conversation_id}:{user_id} 5 1

| Data Type | Store | Rationale |
|---|---|---|
| Messages | ScyllaDB | Time-series optimized, low latency, horizontal scale |
| User profiles | PostgreSQL | ACID, complex queries, moderate scale |
| Conversation metadata | PostgreSQL | Relational queries, ACL management |
| Sessions, presence | Redis Cluster | Sub-ms latency, TTL support, pub/sub |
| Message dedup cache | Redis | Fast lookups, automatic expiry |
| Media files | S3 | Object storage, CDN integration |
| Analytics events | Kafka → ClickHouse | High-volume time-series analytics |

For conversations with < 100 members:

class DirectPushHandler {
  private readonly redis: RedisCluster
  private readonly messageStore: MessageStore
  private readonly gatewayClient: GatewayClient // internal RPC client to the gateways

  async deliverMessage(message: Message): Promise<void> {
    // 1. Get all members of the conversation
    const members = await this.getConversationMembers(message.conversationId)

    // 2. For each member, find their active connections
    const deliveryTasks = members
      .filter((m) => m.userId !== message.senderId)
      .map(async (member) => {
        const connections = await this.redis.smembers(`user:conn:${member.userId}`)
        if (connections.length > 0) {
          // User is online - push directly
          await Promise.all(connections.map((connId) => this.pushToConnection(connId, message)))
          return { userId: member.userId, status: "pushed" }
        } else {
          // User is offline - queue for push notification
          await this.queuePushNotification(member.userId, message)
          return { userId: member.userId, status: "queued" }
        }
      })
    const results = await Promise.all(deliveryTasks)

    // 3. Update delivery status
    const deliveredTo = results.filter((r) => r.status === "pushed").map((r) => r.userId)
    if (deliveredTo.length > 0) {
      await this.notifyDeliveryReceipt(message, deliveredTo)
    }
  }

  private async pushToConnection(connId: string, message: Message): Promise<void> {
    const connInfo = await this.redis.hgetall(`conn:${connId}`)
    const gateway = connInfo.gateway
    // Send via internal RPC to the gateway that owns this connection
    await this.gatewayClient.send(gateway, {
      type: "deliver",
      connectionId: connId,
      message,
    })
  }
}

For conversations with >= 100 members:

class KafkaFanoutHandler {
  private readonly kafka: KafkaProducer
  private readonly FANOUT_TOPIC = "messages.fanout"

  async publishForFanout(message: Message, memberCount: number): Promise<void> {
    // Partition by conversation for per-conversation ordering
    await this.kafka.send({
      topic: this.FANOUT_TOPIC,
      messages: [
        {
          key: message.conversationId,
          value: JSON.stringify({
            message,
            memberCount,
            publishedAt: Date.now(),
          }),
        },
      ],
    })
  }
}

// Fan-out consumer (multiple instances share the consumer group)
class FanoutConsumer {
  private readonly BATCH_SIZE = 100

  async processMessage(record: KafkaRecord): Promise<void> {
    const { message, memberCount } = JSON.parse(record.value)

    // Process members in batches to avoid memory pressure
    let offset = 0
    while (offset < memberCount) {
      const memberBatch = await this.getMemberBatch(message.conversationId, offset, this.BATCH_SIZE)
      await Promise.all(memberBatch.map((member) => this.deliverToMember(member, message)))
      offset += this.BATCH_SIZE
    }
  }
}
Server-side sequence assignment:

class SequenceGenerator {
  private readonly redis: RedisCluster

  async getNextSequence(conversationId: string): Promise<bigint> {
    // Atomic increment in Redis. The counter must survive failover
    // (AOF persistence or a durable fallback), or sequences could regress.
    const sequence = await this.redis.incr(`seq:${conversationId}`)
    return BigInt(sequence)
  }
}

class MessageProcessor {
  async processIncoming(conversationId: string, message: IncomingMessage): Promise<ProcessedMessage> {
    // Acquire a short per-conversation lock so sequence assignment and
    // persistence happen in the same order
    const lock = await this.acquireLock(`lock:msg:${conversationId}`, 5000)
    try {
      // Assign sequence number
      const sequenceNumber = await this.sequenceGenerator.getNextSequence(conversationId)
      // Assign server timestamp
      const timestamp = Date.now()
      const processed: ProcessedMessage = {
        ...message,
        sequenceNumber,
        timestamp,
        status: "SENT",
      }
      // Persist with sequence number
      await this.messageStore.insert(processed)
      return processed
    } finally {
      await lock.release()
    }
  }
}
Client-side buffer that releases messages to the UI only in sequence order:

class ClientMessageBuffer {
  private pendingMessages: Map<string, Message[]> = new Map()
  private lastSequence: Map<string, bigint> = new Map()

  onMessageReceived(message: Message): void {
    const expected = (this.lastSequence.get(message.conversationId) || 0n) + 1n
    if (message.sequenceNumber === expected) {
      // In order - deliver immediately
      this.deliverToUI(message)
      this.lastSequence.set(message.conversationId, message.sequenceNumber)
      // Check for buffered messages that can now be delivered
      this.flushBuffer(message.conversationId)
    } else if (message.sequenceNumber > expected) {
      // Out of order - buffer and request the missing range
      this.bufferMessage(message)
      this.requestMissing(message.conversationId, expected, message.sequenceNumber)
    }
    // If sequence < expected, it's a duplicate - ignore
  }

  private bufferMessage(message: Message): void {
    const buffer = this.pendingMessages.get(message.conversationId) || []
    buffer.push(message)
    this.pendingMessages.set(message.conversationId, buffer)
  }

  private flushBuffer(conversationId: string): void {
    const buffer = this.pendingMessages.get(conversationId) || []
    buffer.sort((a, b) => Number(a.sequenceNumber - b.sequenceNumber))
    let expected = (this.lastSequence.get(conversationId) || 0n) + 1n
    while (buffer.length > 0 && buffer[0].sequenceNumber === expected) {
      const msg = buffer.shift()!
      this.deliverToUI(msg)
      this.lastSequence.set(conversationId, msg.sequenceNumber)
      expected++
    }
    this.pendingMessages.set(conversationId, buffer)
  }

  // App-provided hooks
  private deliverToUI(message: Message): void { /* render into the conversation view */ }
  private requestMissing(conversationId: string, from: bigint, to: bigint): void { /* fetch the gap via sync */ }
}
Server-side presence lifecycle (heartbeats, disconnects, typing):

class PresenceManager {
  private readonly redis: RedisCluster
  private readonly PRESENCE_TTL = 120 // seconds
  private readonly TYPING_TTL = 5 // seconds

  async handleHeartbeat(userId: string, deviceType: string): Promise<void> {
    const now = Date.now()
    // Update presence with TTL
    await this.redis
      .multi()
      .hset(`presence:${userId}`, {
        status: "online",
        last_seen: now,
        device_type: deviceType,
      })
      .expire(`presence:${userId}`, this.PRESENCE_TTL)
      .exec()
    // Publish presence change to subscribers
    await this.redis.publish(
      `presence:changes`,
      JSON.stringify({ userId, status: "online", timestamp: now }),
    )
  }

  async handleDisconnect(userId: string): Promise<void> {
    // Check if the user has other active connections
    const connections = await this.redis.smembers(`user:conn:${userId}`)
    if (connections.length === 0) {
      // No more connections - mark offline
      const now = Date.now()
      await this.redis.hset(`presence:${userId}`, {
        status: "offline",
        last_seen: now,
      })
      await this.redis.publish(
        `presence:changes`,
        JSON.stringify({ userId, status: "offline", lastSeen: now }),
      )
    }
  }

  async setTyping(userId: string, conversationId: string): Promise<void> {
    await this.redis.setex(`typing:${conversationId}:${userId}`, this.TYPING_TTL, "1")
    // Notify conversation members
    await this.redis.publish(
      `typing:${conversationId}`,
      JSON.stringify({ userId, isTyping: true }),
    )
  }
}
Gateway-side subscription that forwards presence changes to connected clients:

class PresenceSubscriber {
  private readonly redis: RedisCluster
  private subscribedUsers: Set<string> = new Set()

  async subscribeToContacts(userId: string, contactIds: string[]): Promise<void> {
    // Get the current status of all contacts in one round trip
    const pipeline = this.redis.pipeline()
    contactIds.forEach((id) => {
      pipeline.hgetall(`presence:${id}`)
    })
    const results = await pipeline.exec()
    // Send initial presence state (results are [err, value] tuples, ioredis-style)
    const presences = contactIds.map((id, i) => ({
      userId: id,
      ...(results[i][1] || { status: "offline" }),
    }))
    this.sendToClient({ type: "presence.bulk", payload: { presences } })
    // Subscribe to changes
    contactIds.forEach((id) => this.subscribedUsers.add(id))
  }

  onPresenceChange(change: PresenceChange): void {
    if (this.subscribedUsers.has(change.userId)) {
      this.sendToClient({
        type: "presence.update",
        payload: change,
      })
    }
  }

  private sendToClient(event: unknown): void { /* write to this user's WebSocket */ }
}
Server-side sync implementation:

class SyncService {
  async syncConversations(
    userId: string,
    syncState: Array<{ conversationId: string; lastSequence: bigint }>,
  ): Promise<SyncResponse> {
    const results: ConversationSync[] = []
    for (const { conversationId, lastSequence } of syncState) {
      // Verify the user is a member
      const isMember = await this.checkMembership(userId, conversationId)
      if (!isMember) continue
      // Fetch missed messages
      const messages = await this.messageStore.getMessagesAfter(
        conversationId,
        lastSequence,
        100, // limit per conversation
      )
      // Get conversation metadata if changed
      const conversation = await this.conversationStore.get(conversationId)
      results.push({
        conversationId,
        messages,
        hasMore: messages.length === 100,
        lastSequence: messages.length > 0 ? messages[messages.length - 1].sequenceNumber : lastSequence,
        unreadCount: await this.getUnreadCount(userId, conversationId),
      })
    }
    // Also check for conversations created since the last sync
    const newConversations = await this.getNewConversations(userId, syncState)
    return {
      conversations: results,
      newConversations,
      serverTime: Date.now(),
    }
  }
}

Reconnection with exponential backoff:

class WebSocketManager {
  private ws: WebSocket | null = null
  private reconnectAttempt = 0
  private readonly MAX_RECONNECT_DELAY = 30000
  private readonly BASE_DELAY = 1000

  connect(): void {
    this.ws = new WebSocket(this.buildUrl())
    this.ws.onopen = () => {
      this.reconnectAttempt = 0
      this.onConnected()
    }
    this.ws.onclose = (event) => {
      if (!event.wasClean) {
        this.scheduleReconnect()
      }
    }
    this.ws.onerror = () => {
      // Errors surface through onclose; reconnect is scheduled there
    }
  }

  private scheduleReconnect(): void {
    const delay = Math.min(
      this.BASE_DELAY * Math.pow(2, this.reconnectAttempt) + Math.random() * 1000, // jitter
      this.MAX_RECONNECT_DELAY,
    )
    this.reconnectAttempt++
    setTimeout(() => this.connect(), delay)
  }
}
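
On reconnect, the client then reconciles missed messages through the sync endpoint; a sketch using the LocalMessageStore from the next section (getAllSyncState is a hypothetical helper returning the stored (conversationId, lastSequence) pairs):

async function resumeAfterReconnect(localDb: LocalMessageStore): Promise<void> {
  // Gather per-conversation sync state from local storage
  const syncState = await localDb.getAllSyncState() // hypothetical helper
  const res = await fetch("/api/v1/sync", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ conversations: syncState, limit: 100 }),
  })
  const { conversations } = await res.json()
  for (const conv of conversations) {
    for (const msg of conv.messages) {
      // Keyed by [conversationId, sequenceNumber], so replays are harmless
      await localDb.saveMessage(msg)
    }
  }
}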

IndexedDB schema for offline support:

interface LocalDBSchema {
  messages: {
    key: [string, number] // [conversationId, sequenceNumber]
    value: Message
    indexes: {
      "by-conversation": string
      "by-timestamp": number
      "by-status": string
    }
  }
  conversations: {
    key: string // conversationId
    value: ConversationMeta
    indexes: {
      "by-updated": number
    }
  }
  syncState: {
    key: string // conversationId
    value: { lastSequence: number; lastSync: number }
  }
}

// Assumes a promise-based IndexedDB wrapper (e.g. the `idb` library);
// raw IndexedDB exposes event callbacks, not awaitable cursors.
class LocalMessageStore {
  private db: IDBDatabase

  async saveMessage(message: Message): Promise<void> {
    const tx = this.db.transaction("messages", "readwrite")
    await tx.objectStore("messages").put(message)
  }

  async getMessages(conversationId: string, options: { before?: number; limit: number }): Promise<Message[]> {
    const tx = this.db.transaction("messages", "readonly")
    const index = tx.objectStore("messages").index("by-conversation")
    const range = IDBKeyRange.bound([conversationId, 0], [conversationId, options.before || Number.MAX_SAFE_INTEGER])
    const messages: Message[] = []
    let cursor = await index.openCursor(range, "prev") // newest first
    while (cursor && messages.length < options.limit) {
      messages.push(cursor.value)
      cursor = await cursor.continue()
    }
    return messages
  }
}
Optimistic send: the message is rendered and stored locally before the server acknowledges it, then reconciled with the server-assigned sequence number:

class MessageSender {
  async sendMessage(conversationId: string, content: MessageContent): Promise<void> {
    const clientMessageId = crypto.randomUUID()
    const optimisticMessage: Message = {
      messageId: clientMessageId,
      conversationId,
      senderId: this.currentUserId,
      content,
      timestamp: Date.now(),
      sequenceNumber: -1n, // pending: no server sequence yet
      status: "PENDING",
    }

    // 1. Show immediately in UI
    this.messageStore.addOptimistic(optimisticMessage)
    this.ui.appendMessage(optimisticMessage)

    // 2. Persist to local DB
    await this.localDb.saveMessage(optimisticMessage)

    // 3. Send to server
    try {
      const ack = await this.ws.sendAndWait({
        type: "message.send",
        payload: {
          messageId: clientMessageId,
          conversationId,
          content,
        },
      })

      // 4. Update with server-assigned values
      const confirmedMessage = {
        ...optimisticMessage,
        sequenceNumber: ack.sequenceNumber,
        timestamp: ack.timestamp,
        status: "SENT",
      }
      this.messageStore.updateOptimistic(clientMessageId, confirmedMessage)
      await this.localDb.saveMessage(confirmedMessage)
    } catch (error) {
      // 5. Mark as failed and let the user retry
      this.messageStore.markFailed(clientMessageId)
      this.ui.showRetryOption(clientMessageId)
    }
  }
}
Virtualized rendering keeps long conversations responsive by mounting only the visible window of messages:

interface VirtualListConfig {
  containerHeight: number
  itemHeight: number // estimated; variable heights supported via the cache
  overscan: number // extra items to render above/below the viewport
}

class VirtualMessageList {
  private visibleRange = { start: 0, end: 0 }
  private heightCache = new Map<string, number>()

  constructor(private config: VirtualListConfig) {}

  calculateVisibleRange(scrollTop: number): { start: number; end: number } {
    const messages = this.getMessages()
    let accumulatedHeight = 0
    let start = 0
    let end = messages.length
    // Find start index
    for (let i = 0; i < messages.length; i++) {
      const height = this.getItemHeight(messages[i])
      if (accumulatedHeight + height > scrollTop - this.config.overscan * 50) {
        start = i
        break
      }
      accumulatedHeight += height
    }
    // Find end index
    accumulatedHeight = 0
    for (let i = start; i < messages.length; i++) {
      accumulatedHeight += this.getItemHeight(messages[i])
      if (accumulatedHeight > this.config.containerHeight + this.config.overscan * 50) {
        end = i + 1
        break
      }
    }
    return { start, end }
  }

  // Render only visible messages
  render(): MessageItem[] {
    const { start, end } = this.visibleRange
    return this.getMessages().slice(start, end)
  }

  private getItemHeight(m: Message): number {
    // Fall back to the estimate until the real height is measured
    return this.heightCache.get(m.messageId) ?? this.config.itemHeight
  }

  private getMessages(): Message[] { /* backed by the local store */ return [] }
}

For self-hosted deployments, the building blocks map to these options:

| Component | Purpose | Options |
|---|---|---|
| WebSocket Gateway | Persistent connections | Nginx (ws), HAProxy, Envoy |
| Message Queue | Async delivery, ordering | Kafka, Pulsar, NATS JetStream |
| KV Store | Sessions, presence | Redis, KeyDB, Dragonfly |
| Message Store | Message persistence | ScyllaDB, Cassandra, DynamoDB |
| Relational DB | User/group metadata | PostgreSQL, CockroachDB |
| Object Store | Media files | MinIO, Ceph, S3-compatible |
| Push Gateway | Mobile notifications | Self-hosted or APNs/FCM proxy |

[Diagram: Edge: Route 53 (GeoDNS), CloudFront (static + WebSocket). EKS cluster: gateway tier (WebSocket pods, Fargate 4 vCPU/8 GB), service tier (Message and Sync services, Fargate), worker tier (fan-out and push workers, Fargate Spot). Data: ElastiCache Redis (cluster mode), Keyspaces (Cassandra-compatible), RDS PostgreSQL (Multi-AZ), S3 (media). Streaming: Amazon MSK (Kafka). Push notifications: SNS (mobile push)]

Service configurations:

| Service | Configuration | Rationale |
|---|---|---|
| WebSocket (Fargate) | 4 vCPU, 8GB, 500K conn/pod | Memory for connection state |
| Message Service | 2 vCPU, 4GB | Stateless, CPU-bound |
| Fan-out Workers | 2 vCPU, 4GB, Spot | Cost optimization for async work |
| ElastiCache Redis | r6g.2xlarge, cluster mode | Sub-ms presence lookups |
| Keyspaces | On-demand | Serverless Cassandra for messages |
| RDS PostgreSQL | db.r6g.xlarge, Multi-AZ | Metadata, moderate write load |
| MSK | kafka.m5.large, 3 brokers | 6.5 Gbps throughput capacity |

WebSocket connection limits:

  • Linux default: 1024 file descriptors per process
  • Tuned: 1M+ with sysctl adjustments
  • Practical per pod: 500K (memory constrained)
  • 50M concurrent users → 100 gateway pods minimum

Kafka partitioning:

  • Partition by conversation_id for ordering
  • Minimum partitions: 100 (allows 100 parallel consumers)
  • Hot partition handling: re-partition extremely active conversations

Message storage partitioning:

  • ScyllaDB partition key: conversation_id
  • Max partition size: 100MB (~200K messages)
  • Very active conversations: add time bucket to partition key

Presence fan-out:

  • Redis pub/sub scales to ~10K subscribers per channel
  • For users with 10K+ contacts: use hierarchical pub/sub or dedicated presence servers

This design provides real-time chat and messaging with:

  1. Sub-500ms message delivery via WebSocket with hybrid push/pull fan-out
  2. At-least-once delivery with client-side deduplication for reliability
  3. Strong per-conversation ordering through server-assigned sequence numbers
  4. Seamless offline support via local storage and sync protocol
  5. Scalable presence using Redis pub/sub with heartbeat-based status

Key architectural decisions:

  • Hybrid fan-out balances latency (direct push) with scalability (Kafka for large groups)
  • Server-assigned timestamps eliminate clock skew issues
  • Client-generated message IDs enable idempotent retry
  • Per-conversation partitioning ensures ordering without global coordination

Known limitations:

  • Server dependency for message ordering (no P2P)
  • At-least-once delivery requires client deduplication logic
  • Presence accuracy limited by heartbeat interval (30s staleness possible)
  • Large group delivery latency higher than 1:1 messages

Future enhancements:

  • End-to-end encryption with Signal protocol
  • Reactions and threaded replies
  • Message editing and deletion with tombstones
  • Voice and video calling integration (WebRTC)

Related concepts worth reviewing:

  • Distributed systems fundamentals (consistency models, partitioning)
  • Real-time communication patterns (WebSocket, pub/sub)
  • Message queue concepts (Kafka partitions, consumer groups)
  • Database selection trade-offs (SQL vs. NoSQL)

Glossary:

| Term | Definition |
|---|---|
| Fan-out | Distributing a message to multiple recipients |
| Sequence number | Monotonically increasing identifier for ordering messages within a conversation |
| Presence | User's online/offline status and activity indicators |
| Idempotency | Property ensuring duplicate requests produce the same result |
| Heartbeat | Periodic signal from client to server indicating the connection is alive |
| ACK | Acknowledgment message confirming receipt |
| TTL | Time-to-live; automatic expiration of data after a specified duration |

Key takeaways:

  • Real-time chat requires persistent connections (WebSocket), reliable delivery (at-least-once with dedup), and ordering guarantees (server-assigned sequence numbers)
  • Hybrid fan-out (direct push for small groups, Kafka for large) balances latency with scalability
  • ScyllaDB provides time-series optimized storage with sub-15ms p99 read latency
  • Redis handles ephemeral state: sessions, presence, typing indicators with pub/sub distribution
  • Client-side sync protocol with sequence tracking enables seamless offline-to-online transitions
  • Scale to 350K+ messages/second with horizontal gateway scaling and partitioned message queues
