Design Real-Time Chat and Messaging
A comprehensive system design for real-time chat and messaging covering connection management, message delivery guarantees, ordering strategies, presence systems, group chat fan-out, and offline synchronization. This design addresses sub-second message delivery at WhatsApp/Discord scale (100B+ messages/day) with strong delivery guarantees and mobile-first offline resilience.
Abstract
Real-time chat systems solve three interrelated problems: low-latency delivery (messages appear within milliseconds), reliable delivery (no message is ever lost), and ordering consistency (messages appear in the same order for all participants).
Core architectural decisions:
| Decision | Choice | Rationale |
|---|---|---|
| Transport | WebSocket | Full-duplex; as little as 2 bytes of framing overhead per message after the handshake |
| Delivery guarantee | At-least-once + client dedup | Simpler than exactly-once; idempotency at app layer |
| Message ordering | Server-assigned timestamps | Single source of truth; avoids clock skew issues |
| Fan-out model | Hybrid push/pull | Push for small groups, pull for large channels |
| Presence | Heartbeat + Redis pub/sub | Ephemeral data; no persistence needed |
| Offline sync | Client-side sequence tracking | Fetch missed messages on reconnect |
Key trade-offs accepted:
- Server dependency for ordering (no P2P) in exchange for correctness guarantees
- At-least-once delivery requiring client-side deduplication
- Eventual consistency for presence (acceptable for UX)
- Higher infrastructure cost for guaranteed delivery (message queue durability)
What this design optimizes:
- Sub-500ms global message delivery
- Zero message loss under network partitions
- Seamless offline-to-online transitions
- Horizontal scalability to billions of messages/day
Requirements
Functional Requirements
| Requirement | Priority | Notes |
|---|---|---|
| 1:1 direct messaging | Core | Private conversations between two users |
| Group messaging | Core | Up to 1000 members per group |
| Message delivery receipts | Core | Sent, delivered, read indicators |
| Typing indicators | Core | Real-time “user is typing” display |
| Online/offline presence | Core | Show user availability status |
| Offline message delivery | Core | Queue and deliver when user reconnects |
| Message history sync | Core | Retrieve past messages across devices |
| Read receipts | Extended | Track who has read messages |
| Media attachments | Extended | Images, videos, files (out of detailed scope) |
| End-to-end encryption | Extended | Signal protocol (out of detailed scope) |
Non-Functional Requirements
| Requirement | Target | Rationale |
|---|---|---|
| Availability | 99.99% (4 nines) | Communication is critical; 52 min/year downtime max |
| Message delivery latency | p99 < 500ms | Real-time feel requires sub-second |
| Message durability | 99.9999% | No message should ever be lost |
| Offline sync time | < 5s for 1000 messages | Fast reconnection experience |
| Concurrent connections | 10M per region | Mobile-scale concurrent users |
| Message retention | 30 days default, configurable | Storage cost vs. user expectations |
Scale Estimation
Users:
- Monthly Active Users (MAU): 500M
- Daily Active Users (DAU): 200M (40% of MAU)
- Peak concurrent connections: 50M (25% of DAU)
Traffic:
- Messages per user per day: 50 (mix of 1:1 and group)
- Daily messages: 200M × 50 = 10B messages/day
- Peak messages per second: 10B / 86,400 × 3 (peak multiplier) ≈ 350K msgs/sec
Storage:
- Average message size: 500 bytes (text + metadata)
- Daily storage: 10B × 500B = 5TB/day
- 30-day retention: 150TB
- With replication (3x): 450TB
Connections:
- WebSocket connections per gateway: 500K (Linux file descriptor limits)
- Gateway servers needed: 50M / 500K = 100 servers minimum
- With redundancy (2x): 200 gateway servers
Design Paths
Path A: Connection-Centric (Server-Routed)
Best when:
- Infrastructure team can maintain stateful WebSocket servers
- Low latency is primary requirement
- Moderate group sizes (< 500 members)
- Strong consistency for message ordering needed
Architecture:
Key characteristics:
- Each gateway maintains user-to-connection mapping
- Message service routes directly to recipient’s gateway
- Synchronous delivery with acknowledgment chain
Trade-offs:
- ✅ Lowest latency (direct routing)
- ✅ Simple mental model
- ✅ Strong ordering guarantees
- ❌ Gateway state management complexity
- ❌ User migration on gateway failure
- ❌ Limited group size due to fan-out cost
Real-world example: WhatsApp uses this approach with TCP persistent connections. Each server maintains ~1-2M connections. Messages route through servers using recipient’s assigned gateway.
Path B: Queue-Centric (Async Fan-out)
Best when:
- Very large groups/channels (1000+ members)
- Geographic distribution across regions
- Tolerance for slightly higher latency (100-500ms)
- Need for replay and audit capabilities
Architecture:
Key characteristics:
- Messages published to Kafka partitioned by conversation
- Fan-out workers consume and distribute to recipients
- Decouples send from delivery for reliability
Trade-offs:
- ✅ Handles large fan-out efficiently
- ✅ Built-in replay capability
- ✅ Better failure isolation
- ❌ Higher latency (queue hop)
- ❌ More complex infrastructure
- ❌ Ordering requires partition strategy
Real-world example: Slack uses this approach with Kafka handling 6.5 Gbps peak throughput across 10 clusters. Channel servers use consistent hashing to maintain per-channel ordering.
Path C: Hybrid (Push for Small, Pull for Large)
Best when:
- Mix of 1:1, small groups, and large channels
- Need to balance latency vs. resource usage
- Celebrity/influencer use cases with massive follower counts
Architecture:
Key characteristics:
- Small groups use direct push for lowest latency
- Large groups use async fan-out to avoid write amplification
- Threshold typically 50-100 members
Trade-offs:
- ✅ Optimal latency for common case (1:1 and small groups)
- ✅ Scales to large channels without overwhelming gateways
- ✅ Flexible resource allocation
- ❌ Two code paths to maintain
- ❌ Threshold tuning required
- ❌ Slightly inconsistent delivery characteristics
Real-world example: Discord uses this approach. Small servers get direct fan-out; large servers (100+ members) route through Kafka for distributed processing.
Path Comparison
| Factor | Connection-Centric | Queue-Centric | Hybrid |
|---|---|---|---|
| Latency (p50) | 50-100ms | 100-300ms | 50-300ms |
| Max group size | ~500 | Unlimited | Unlimited |
| Complexity | Moderate | High | Highest |
| Failure isolation | Gateway-level | Topic-level | Mixed |
| Replay capability | Limited | Native | Mixed |
| Production examples | WhatsApp | Slack | Discord |
This Article’s Focus
This article focuses on Path C (Hybrid) because:
- Covers the full spectrum of use cases (1:1 to large channels)
- Represents modern production architectures (Discord, Telegram)
- Demonstrates trade-off thinking expected in system design interviews
- Balances latency optimization with scalability
High-Level Design
Component Overview
WebSocket Gateway
Manages persistent connections and routes messages between clients and services.
Responsibilities:
- WebSocket connection lifecycle (connect, heartbeat, disconnect)
- Authentication and session validation
- Message routing to appropriate services
- Presence event broadcasting
- Graceful connection migration on shutdown
Design decisions:
| Decision | Choice | Rationale |
|---|---|---|
| Protocol | WebSocket over TLS | Full-duplex, minimal overhead, universal support |
| Session affinity | Consistent hashing by user_id | Predictable routing, simplifies state management |
| Heartbeat interval | 30 seconds | Balance between detection speed and overhead |
| Connection timeout | 90 seconds | 3 missed heartbeats before disconnect |
| Max connections/server | 500K | Linux file descriptor limits with tuning |
Scaling approach:
- Horizontal scaling with consistent hashing (see the lookup sketch below)
- User-to-gateway mapping stored in Redis
- Graceful drain on shutdown (notify clients to reconnect)
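A minimal sketch of that lookup path follows. The Redis key name, the rendezvous-hash variant of consistent hashing, and the `GatewayDirectory` shape are illustrative assumptions, not prescribed by this design:

```typescript
import { createHash } from "crypto"

// Hypothetical lookup: which gateway owns a user's connection? Redis holds
// the authoritative mapping written at connect time; the hash ring is the
// fallback used when assigning a fresh connection.
class GatewayDirectory {
  constructor(
    private readonly gateways: string[], // e.g. ["gw-1", "gw-2", ...]
    private readonly redis: { get(k: string): Promise<string | null> },
  ) {}

  // Rendezvous hashing (a consistent-hashing variant): stable assignment
  // with minimal movement when gateways are added or removed
  assignGateway(userId: string): string {
    let best = this.gateways[0]
    let bestScore = -1
    for (const gw of this.gateways) {
      const score = createHash("md5").update(`${userId}:${gw}`).digest().readUInt32BE(0)
      if (score > bestScore) {
        bestScore = score
        best = gw
      }
    }
    return best
  }

  async resolve(userId: string): Promise<string> {
    // Prefer the live mapping; fall back to deterministic assignment
    const cached = await this.redis.get(`user:gateway:${userId}`)
    return cached ?? this.assignGateway(userId)
  }
}
```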
Message Service
Core service for message processing, persistence, and routing.
State per message:
```typescript
interface Message {
  messageId: string       // UUID, client-generated for idempotency
  conversationId: string  // 1:1 or group conversation
  senderId: string
  content: MessageContent
  timestamp: number       // Server-assigned Unix timestamp
  sequenceNumber: bigint  // Per-conversation monotonic sequence
  status: MessageStatus   // PENDING | SENT | DELIVERED | READ
  expiresAt?: number      // Optional TTL for ephemeral messages
}

interface MessageContent {
  type: "text" | "image" | "file" | "location"
  text?: string
  mediaUrl?: string
  metadata?: Record<string, any>
}

type MessageStatus = "PENDING" | "SENT" | "DELIVERED" | "READ"
```

Message flow:
- Receive: Gateway forwards message with client-generated messageId
- Deduplicate: Check messageId in recent message cache (idempotency)
- Validate: Verify sender membership in conversation, rate limits
- Persist: Write to ScyllaDB with server timestamp and sequence number
- Route: Determine delivery path (direct push vs. Kafka)
- Acknowledge: Return sequence number to sender
- Fan-out: Distribute to recipients via the appropriate channel (a pipeline skeleton follows below)
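A skeleton of this pipeline, under stated assumptions: the collaborator interfaces (`dedupCache`, `membership`, `sequencer`, and so on) are illustrative names, and concrete versions of the dedup, sequencing, and fan-out steps appear in the Low-Level Design:

```typescript
// Illustrative skeleton of the seven-step flow above; each collaborator
// is an assumed interface, fleshed out later in the article.
class MessageIngress {
  constructor(
    private readonly dedupCache: { claim(id: string): Promise<boolean> },
    private readonly membership: { isMember(userId: string, convId: string): Promise<boolean> },
    private readonly sequencer: { next(convId: string): Promise<bigint> },
    private readonly store: { insert(m: Message): Promise<void> },
    private readonly router: { route(m: Message): Promise<void> },
  ) {}

  async handle(incoming: {
    messageId: string
    conversationId: string
    senderId: string
    content: MessageContent
  }): Promise<{ sequenceNumber: bigint }> {
    // 2. Deduplicate: first delivery wins; retries of the same messageId are rejected
    if (!(await this.dedupCache.claim(incoming.messageId))) {
      throw new Error("DUPLICATE_MESSAGE") // surfaced as 409 in the REST API
    }
    // 3. Validate sender membership (rate limiting omitted in this sketch)
    if (!(await this.membership.isMember(incoming.senderId, incoming.conversationId))) {
      throw new Error("FORBIDDEN")
    }
    // 4. Persist with server timestamp + per-conversation sequence
    const sequenceNumber = await this.sequencer.next(incoming.conversationId)
    const message: Message = { ...incoming, sequenceNumber, timestamp: Date.now(), status: "SENT" }
    await this.store.insert(message)
    // 5./7. Route and fan out; in practice this can run asynchronously so
    // the ack is not delayed by slow recipients
    await this.router.route(message)
    // 6. Acknowledge: the gateway returns the assigned sequence number to the sender
    return { sequenceNumber }
  }
}
```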
Presence Service
Handles online/offline status, typing indicators, and last-seen timestamps.
Design decisions:
- No persistence: Presence reconstructed from heartbeats
- TTL-based: Status expires automatically on disconnect
- Pub/Sub distribution: Redis pub/sub for real-time updates
- Throttled updates: Max 1 presence update per second per user
Data structures:
```typescript
interface UserPresence {
  userId: string
  status: "online" | "away" | "offline"
  lastSeen: number        // Unix timestamp
  deviceType: "mobile" | "web" | "desktop"
  typingIn?: string       // conversationId if typing
  typingExpires?: number  // Auto-clear after 5 seconds
}
```

Presence subscription model:
- Users subscribe to presence of their contacts on connect
- Changes broadcast via Redis pub/sub to subscribed gateways
- Gateways filter and forward to relevant connected clients
Sync Service
Handles message history retrieval and offline synchronization.
Sync protocol:
- Client maintains `lastSequenceNumber` per conversation
- On reconnect, client sends a list of (conversationId, lastSeq) pairs
- Server returns all messages with sequence > lastSeq
- Client merges into local database, deduplicating by messageId
Pagination strategy:
- Default page size: 50 messages
- Cursor-based pagination using (conversationId, sequenceNumber)
- Supports forward (newer) and backward (older) fetching (see the sketch below)
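A sketch of the backward (older) fetch, assuming a generic `query` helper standing in for a CQL driver call against the messages table defined later in Data Modeling:

```typescript
// Hypothetical cursor-based history fetch: the cursor is simply the lowest
// sequence number already loaded, passed back as "before" for the next page.
interface Page {
  messages: Message[]
  nextCursor: bigint | null
}

async function fetchOlderMessages(
  db: { query(cql: string, params: unknown[]): Promise<Message[]> },
  conversationId: string,
  before: bigint,
  limit = 50,
): Promise<Page> {
  // Clustering order is sequence_number DESC, so this reads newest-first
  const rows = await db.query(
    `SELECT * FROM messages
      WHERE conversation_id = ? AND sequence_number < ?
      LIMIT ?`,
    [conversationId, before, limit],
  )
  // A full page implies there may be older messages behind the cursor
  const nextCursor = rows.length === limit ? rows[rows.length - 1].sequenceNumber : null
  return { messages: rows, nextCursor }
}
```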
API Design
WebSocket Protocol
Connection Handshake
```
wss://chat.example.com/ws?token={jwt}&device_id={uuid}
```

Initial connection message (server → client):

```json
{
  "type": "connected",
  "connectionId": "conn_abc123",
  "serverTime": 1706886400000,
  "heartbeatInterval": 30000,
  "resumeToken": "resume_xyz789"
}
```

Client → Server Messages
Send Message:
{ "type": "message.send", "id": "req_001", "payload": { "messageId": "msg_uuid_client_generated", "conversationId": "conv_abc123", "content": { "type": "text", "text": "Hello, world!" } }}Typing Indicator:
{ "type": "typing.start", "payload": { "conversationId": "conv_abc123" }}Mark Read:
{ "type": "message.read", "payload": { "conversationId": "conv_abc123", "upToSequence": 1542 }}Heartbeat:
{ "type": "heartbeat", "timestamp": 1706886400000}Server → Client Messages
Message Acknowledgment:
{ "type": "message.ack", "id": "req_001", "payload": { "messageId": "msg_uuid_client_generated", "sequenceNumber": 1543, "timestamp": 1706886400123 }}New Message (from another user):
{ "type": "message.new", "payload": { "messageId": "msg_xyz789", "conversationId": "conv_abc123", "senderId": "user_456", "content": { "type": "text", "text": "Hi there!" }, "sequenceNumber": 1544, "timestamp": 1706886401000 }}Delivery Receipt:
{ "type": "message.delivered", "payload": { "conversationId": "conv_abc123", "messageIds": ["msg_uuid_1", "msg_uuid_2"], "deliveredTo": "user_456", "timestamp": 1706886402000 }}Read Receipt:
{ "type": "message.read_receipt", "payload": { "conversationId": "conv_abc123", "readUpToSequence": 1544, "readBy": "user_456", "timestamp": 1706886403000 }}Presence Update:
{ "type": "presence.update", "payload": { "userId": "user_456", "status": "online", "typingIn": null }}Typing Indicator:
{ "type": "typing.update", "payload": { "conversationId": "conv_abc123", "userId": "user_456", "isTyping": true }}REST API
Sync Messages (Offline Recovery)
Endpoint: POST /api/v1/sync
Request:
{ "conversations": [ { "conversationId": "conv_abc", "lastSequence": 1500 }, { "conversationId": "conv_xyz", "lastSequence": 2300 } ], "limit": 100}Response (200 OK):
{ "conversations": [ { "conversationId": "conv_abc", "messages": [ { "messageId": "msg_001", "senderId": "user_123", "content": { "type": "text", "text": "Hello" }, "sequenceNumber": 1501, "timestamp": 1706886400000, "status": "DELIVERED" } ], "hasMore": false } ], "serverTime": 1706886500000}Create Conversation
Endpoint: POST /api/v1/conversations
Request:
{ "type": "direct", "participantIds": ["user_456"]}Response (201 Created):
{ "conversationId": "conv_new123", "type": "direct", "participants": [ { "userId": "user_123", "role": "member" }, { "userId": "user_456", "role": "member" } ], "createdAt": "2024-02-03T10:00:00Z"}Create Group
Endpoint: POST /api/v1/groups
Request:
{ "name": "Project Team", "participantIds": ["user_456", "user_789"], "settings": { "onlyAdminsCanPost": false, "allowMemberInvites": true }}Response (201 Created):
{ "conversationId": "conv_group_abc", "type": "group", "name": "Project Team", "participants": [ { "userId": "user_123", "role": "admin" }, { "userId": "user_456", "role": "member" }, { "userId": "user_789", "role": "member" } ], "memberCount": 3, "createdAt": "2024-02-03T10:00:00Z"}Message History
Endpoint: GET /api/v1/conversations/{id}/messages?before={sequence}&limit=50
Response (200 OK):
{ "messages": [ { "messageId": "msg_xyz", "senderId": "user_456", "content": { "type": "text", "text": "Earlier message" }, "sequenceNumber": 1450, "timestamp": 1706880000000 } ], "hasMore": true, "nextCursor": "seq_1449"}Error Responses
| Code | Error | When |
|---|---|---|
| 400 | INVALID_MESSAGE | Message format invalid |
| 401 | UNAUTHORIZED | Invalid or expired token |
| 403 | FORBIDDEN | Not a member of conversation |
| 404 | CONVERSATION_NOT_FOUND | Conversation doesn’t exist |
| 409 | DUPLICATE_MESSAGE | messageId already processed |
| 429 | RATE_LIMITED | Too many messages |
Rate limit response:
{ "error": "RATE_LIMITED", "message": "Message rate limit exceeded", "retryAfter": 5, "limit": "100 messages per minute"}Data Modeling
Message Storage (ScyllaDB)
Table design for time-series message access:
```sql
CREATE TABLE messages (
    conversation_id   UUID,
    sequence_number   BIGINT,
    message_id        UUID,
    sender_id         UUID,
    content_type      TEXT,
    content_text      TEXT,
    content_media_url TEXT,
    timestamp         TIMESTAMP,
    status            TEXT,
    expires_at        TIMESTAMP,
    PRIMARY KEY ((conversation_id), sequence_number)
) WITH CLUSTERING ORDER BY (sequence_number DESC);
```

Why ScyllaDB:
- Optimized for time-series data (messages ordered by sequence)
- Partition per conversation enables efficient range queries
- No garbage collection pauses (C++ implementation)
- Linear horizontal scaling
Performance characteristics:
- Read latency p99: 15ms (vs. 40-125ms for Cassandra)
- Write latency p99: 5ms
- Partition size recommendation: < 100MB (~200K messages per conversation)
Partition hot-spot mitigation:
- Very active conversations split into time-bucketed partitions
- Partition key: (conversation_id, time_bucket)
- Time bucket: daily for active chats, monthly for archives (key derivation sketched below)
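A sketch of how a writer might derive the bucketed partition key; the bucket string formats and the `partitionKey` helper are illustrative assumptions:

```typescript
// Hypothetical helper: derive the time-bucketed partition key used for
// very active conversations (daily buckets) vs. archival ones (monthly).
type Bucketing = "daily" | "monthly"

function partitionKey(conversationId: string, timestampMs: number, mode: Bucketing): string {
  const d = new Date(timestampMs)
  const pad = (n: number) => String(n).padStart(2, "0")
  const bucket =
    mode === "daily"
      ? `${d.getUTCFullYear()}${pad(d.getUTCMonth() + 1)}${pad(d.getUTCDate())}`
      : `${d.getUTCFullYear()}${pad(d.getUTCMonth() + 1)}`
  // Maps onto the composite partition key: (conversation_id, time_bucket)
  return `${conversationId}:${bucket}`
}

// All of today's writes for a hot conversation land in one partition; a
// history reader walks buckets newest-to-oldest until its page fills.
const key = partitionKey("conv_abc123", Date.now(), "daily")
```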
Message Deduplication Cache (Redis)
Purpose: Prevent duplicate message processing (idempotency)
```
# Set messageId with 24-hour TTL
SETEX msg:dedup:{message_id} 86400 1

# Check before processing
EXISTS msg:dedup:{message_id}
```
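Note that a separate EXISTS check followed by SETEX leaves a small race window if two deliveries of the same message arrive concurrently; a single atomic SET with NX and EX closes it. A sketch, assuming an ioredis-style client:

```typescript
import Redis from "ioredis"

const redis = new Redis()

// Returns true exactly once per messageId: SET ... EX ... NX is atomic,
// so concurrent duplicate deliveries cannot both win the check.
async function claimMessage(messageId: string): Promise<boolean> {
  const result = await redis.set(`msg:dedup:${messageId}`, "1", "EX", 86_400, "NX")
  return result === "OK" // null means the key already existed → duplicate
}
```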
User and Conversation Metadata (PostgreSQL)

```sql
CREATE TABLE users (
    id             UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username       VARCHAR(50) UNIQUE NOT NULL,
    display_name   VARCHAR(100),
    avatar_url     TEXT,
    phone_hash     VARCHAR(64) UNIQUE,
    created_at     TIMESTAMPTZ DEFAULT NOW(),
    last_active_at TIMESTAMPTZ
);

CREATE TABLE conversations (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    type            VARCHAR(20) NOT NULL,  -- 'direct' or 'group'
    name            VARCHAR(100),          -- NULL for direct
    avatar_url      TEXT,
    created_by      UUID REFERENCES users(id),
    created_at      TIMESTAMPTZ DEFAULT NOW(),
    updated_at      TIMESTAMPTZ DEFAULT NOW(),
    last_message_at TIMESTAMPTZ,
    last_sequence   BIGINT DEFAULT 0,
    member_count    INT DEFAULT 0,
    settings        JSONB DEFAULT '{}'
);

CREATE TABLE conversation_members (
    conversation_id    UUID REFERENCES conversations(id) ON DELETE CASCADE,
    user_id            UUID REFERENCES users(id) ON DELETE CASCADE,
    role               VARCHAR(20) DEFAULT 'member',  -- 'admin', 'member'
    joined_at          TIMESTAMPTZ DEFAULT NOW(),
    last_read_sequence BIGINT DEFAULT 0,
    muted_until        TIMESTAMPTZ,
    PRIMARY KEY (conversation_id, user_id)
);

-- Indexes
CREATE INDEX idx_members_user ON conversation_members(user_id);
CREATE INDEX idx_conversations_updated ON conversations(updated_at DESC);
```

Session and Connection State (Redis)
```
# User's active connections (set)
SADD user:conn:{user_id} {connection_id}
SREM user:conn:{user_id} {connection_id}

# Connection → Gateway mapping (hash)
HSET conn:{connection_id} gateway "gateway-1.us-east-1" user_id "user_123" device_id "device_abc" connected_at 1706886400000

# User presence (hash with TTL)
HSET presence:{user_id} status "online" last_seen 1706886400000 device_type "mobile"
EXPIRE presence:{user_id} 120

# Typing indicators (with auto-expire)
SETEX typing:{conversation_id}:{user_id} 5 1
```

Database Selection Matrix
| Data Type | Store | Rationale |
|---|---|---|
| Messages | ScyllaDB | Time-series optimized, low latency, horizontal scale |
| User profiles | PostgreSQL | ACID, complex queries, moderate scale |
| Conversation metadata | PostgreSQL | Relational queries, ACL management |
| Sessions, presence | Redis Cluster | Sub-ms latency, TTL support, pub/sub |
| Message dedup cache | Redis | Fast lookups, automatic expiry |
| Media files | S3 | Object storage, CDN integration |
| Analytics events | Kafka → ClickHouse | High-volume time-series analytics |
Low-Level Design
Message Delivery Pipeline
Direct Push (Small Groups)
For conversations with < 100 members:
```typescript
class DirectPushHandler {
  private readonly redis: RedisCluster
  private readonly messageStore: MessageStore

  async deliverMessage(message: Message): Promise<void> {
    // 1. Get all members of conversation
    const members = await this.getConversationMembers(message.conversationId)

    // 2. For each member, find their active connections
    const deliveryTasks = members
      .filter((m) => m.userId !== message.senderId)
      .map(async (member) => {
        const connections = await this.redis.smembers(`user:conn:${member.userId}`)

        if (connections.length > 0) {
          // User is online - push directly
          await Promise.all(connections.map((connId) => this.pushToConnection(connId, message)))
          return { userId: member.userId, status: "pushed" }
        } else {
          // User is offline - queue for push notification
          await this.queuePushNotification(member.userId, message)
          return { userId: member.userId, status: "queued" }
        }
      })

    const results = await Promise.all(deliveryTasks)

    // 3. Update delivery status
    const deliveredTo = results.filter((r) => r.status === "pushed").map((r) => r.userId)

    if (deliveredTo.length > 0) {
      await this.notifyDeliveryReceipt(message, deliveredTo)
    }
  }

  private async pushToConnection(connId: string, message: Message): Promise<void> {
    const connInfo = await this.redis.hgetall(`conn:${connId}`)
    const gateway = connInfo.gateway

    // Send via internal RPC to gateway
    await this.gatewayClient.send(gateway, {
      type: "deliver",
      connectionId: connId,
      message,
    })
  }
}
```

Kafka Fan-out (Large Groups)
For conversations with >= 100 members:
```typescript
class KafkaFanoutHandler {
  private readonly kafka: KafkaProducer
  private readonly FANOUT_TOPIC = "messages.fanout"

  async publishForFanout(message: Message, memberCount: number): Promise<void> {
    // Partition by conversation for ordering guarantee
    await this.kafka.send({
      topic: this.FANOUT_TOPIC,
      messages: [
        {
          key: message.conversationId,
          value: JSON.stringify({
            message,
            memberCount,
            publishedAt: Date.now(),
          }),
        },
      ],
    })
  }
}

// Fan-out consumer (multiple instances)
class FanoutConsumer {
  private readonly BATCH_SIZE = 100

  async processMessage(record: KafkaRecord): Promise<void> {
    const { message, memberCount } = JSON.parse(record.value)

    // Process members in batches to avoid memory pressure
    let offset = 0
    while (offset < memberCount) {
      const memberBatch = await this.getMemberBatch(message.conversationId, offset, this.BATCH_SIZE)

      await Promise.all(memberBatch.map((member) => this.deliverToMember(member, message)))

      offset += this.BATCH_SIZE
    }
  }
}
```
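The two handlers are selected per conversation. A minimal routing sketch tying them together; the 100-member threshold mirrors the hybrid path discussion above, while the `DeliveryRouter` wiring is an illustrative assumption:

```typescript
// Hypothetical dispatcher choosing the delivery path per conversation.
const FANOUT_THRESHOLD = 100 // members; tune per the hybrid path trade-offs

class DeliveryRouter {
  constructor(
    private readonly directPush: DirectPushHandler,
    private readonly kafkaFanout: KafkaFanoutHandler,
  ) {}

  async route(message: Message, memberCount: number): Promise<void> {
    if (memberCount < FANOUT_THRESHOLD) {
      // Small group: synchronous push for lowest latency
      await this.directPush.deliverMessage(message)
    } else {
      // Large group: enqueue; fan-out workers absorb the write amplification
      await this.kafkaFanout.publishForFanout(message, memberCount)
    }
  }
}
```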
Message Ordering and Sequencing
Sequence Number Assignment
```typescript
class SequenceGenerator {
  private readonly redis: RedisCluster

  async getNextSequence(conversationId: string): Promise<bigint> {
    // Atomic increment in Redis
    const sequence = await this.redis.incr(`seq:${conversationId}`)
    return BigInt(sequence)
  }
}

class MessageProcessor {
  async processIncoming(conversationId: string, message: IncomingMessage): Promise<ProcessedMessage> {
    // Acquire conversation lock for ordering
    const lock = await this.acquireLock(`lock:msg:${conversationId}`, 5000)

    try {
      // Assign sequence number
      const sequenceNumber = await this.sequenceGenerator.getNextSequence(conversationId)

      // Assign server timestamp
      const timestamp = Date.now()

      const processed: ProcessedMessage = {
        ...message,
        sequenceNumber,
        timestamp,
        status: "SENT",
      }

      // Persist with sequence number
      await this.messageStore.insert(processed)

      return processed
    } finally {
      await lock.release()
    }
  }
}
```

Client-Side Ordering
```typescript
class ClientMessageBuffer {
  private pendingMessages: Map<string, Message[]> = new Map()
  private lastSequence: Map<string, bigint> = new Map()

  onMessageReceived(message: Message): void {
    const expected = (this.lastSequence.get(message.conversationId) || 0n) + 1n

    if (message.sequenceNumber === expected) {
      // In order - deliver immediately
      this.deliverToUI(message)
      this.lastSequence.set(message.conversationId, message.sequenceNumber)

      // Check for buffered messages that can now be delivered
      this.flushBuffer(message.conversationId)
    } else if (message.sequenceNumber > expected) {
      // Out of order - buffer and request missing
      this.bufferMessage(message)
      this.requestMissing(message.conversationId, expected, message.sequenceNumber)
    }
    // If sequence < expected, it's a duplicate - ignore
  }

  private flushBuffer(conversationId: string): void {
    const buffer = this.pendingMessages.get(conversationId) || []
    buffer.sort((a, b) => Number(a.sequenceNumber - b.sequenceNumber))

    let expected = (this.lastSequence.get(conversationId) || 0n) + 1n
    while (buffer.length > 0 && buffer[0].sequenceNumber === expected) {
      const msg = buffer.shift()!
      this.deliverToUI(msg)
      this.lastSequence.set(conversationId, msg.sequenceNumber)
      expected++
    }

    this.pendingMessages.set(conversationId, buffer)
  }
}
```

Presence System
Heartbeat Processing
```typescript
class PresenceManager {
  private readonly PRESENCE_TTL = 120 // seconds
  private readonly TYPING_TTL = 5 // seconds

  async handleHeartbeat(userId: string, deviceType: string): Promise<void> {
    const now = Date.now()

    // Update presence with TTL
    await this.redis
      .multi()
      .hset(`presence:${userId}`, {
        status: "online",
        last_seen: now,
        device_type: deviceType,
      })
      .expire(`presence:${userId}`, this.PRESENCE_TTL)
      .exec()

    // Publish presence change to subscribers
    await this.redis.publish(
      `presence:changes`,
      JSON.stringify({
        userId,
        status: "online",
        timestamp: now,
      }),
    )
  }

  async handleDisconnect(userId: string): Promise<void> {
    // Check if user has other active connections
    const connections = await this.redis.smembers(`user:conn:${userId}`)

    if (connections.length === 0) {
      // No more connections - mark offline
      const now = Date.now()

      await this.redis.hset(`presence:${userId}`, {
        status: "offline",
        last_seen: now,
      })

      await this.redis.publish(
        `presence:changes`,
        JSON.stringify({
          userId,
          status: "offline",
          lastSeen: now,
        }),
      )
    }
  }

  async setTyping(userId: string, conversationId: string): Promise<void> {
    await this.redis.setex(`typing:${conversationId}:${userId}`, this.TYPING_TTL, "1")

    // Notify conversation members
    await this.redis.publish(
      `typing:${conversationId}`,
      JSON.stringify({
        userId,
        isTyping: true,
      }),
    )
  }
}
```

Presence Subscription
```typescript
class PresenceSubscriber {
  private subscribedUsers: Set<string> = new Set()

  async subscribeToContacts(userId: string, contactIds: string[]): Promise<void> {
    // Get current status of all contacts
    const pipeline = this.redis.pipeline()
    contactIds.forEach((id) => {
      pipeline.hgetall(`presence:${id}`)
    })
    const results = await pipeline.exec()

    // Send initial presence state
    const presences = contactIds.map((id, i) => ({
      userId: id,
      ...(results[i][1] || { status: "offline" }),
    }))

    this.sendToClient({ type: "presence.bulk", payload: { presences } })

    // Subscribe to changes
    contactIds.forEach((id) => this.subscribedUsers.add(id))
  }

  onPresenceChange(change: PresenceChange): void {
    if (this.subscribedUsers.has(change.userId)) {
      this.sendToClient({
        type: "presence.update",
        payload: change,
      })
    }
  }
}
```

Offline Sync Protocol
```typescript
class SyncService {
  async syncConversations(
    userId: string,
    syncState: Array<{ conversationId: string; lastSequence: bigint }>,
  ): Promise<SyncResponse> {
    const results: ConversationSync[] = []

    for (const { conversationId, lastSequence } of syncState) {
      // Verify user is member
      const isMember = await this.checkMembership(userId, conversationId)
      if (!isMember) continue

      // Fetch missed messages
      const messages = await this.messageStore.getMessagesAfter(
        conversationId,
        lastSequence,
        100, // limit per conversation
      )

      // Get conversation metadata if changed
      const conversation = await this.conversationStore.get(conversationId)

      results.push({
        conversationId,
        messages,
        hasMore: messages.length === 100,
        lastSequence: messages.length > 0 ? messages[messages.length - 1].sequenceNumber : lastSequence,
        unreadCount: await this.getUnreadCount(userId, conversationId),
      })
    }

    // Also check for new conversations
    const newConversations = await this.getNewConversations(userId, syncState)

    return {
      conversations: results,
      newConversations,
      serverTime: Date.now(),
    }
  }
}
```

Frontend Considerations
Connection Management
Reconnection with exponential backoff:
```typescript
class WebSocketManager {
  private ws: WebSocket | null = null
  private reconnectAttempt = 0
  private readonly MAX_RECONNECT_DELAY = 30000
  private readonly BASE_DELAY = 1000

  connect(): void {
    this.ws = new WebSocket(this.buildUrl())

    this.ws.onopen = () => {
      this.reconnectAttempt = 0
      this.onConnected()
    }

    this.ws.onclose = (event) => {
      if (!event.wasClean) {
        this.scheduleReconnect()
      }
    }

    this.ws.onerror = () => {
      // Will trigger onclose
    }
  }

  private scheduleReconnect(): void {
    const delay = Math.min(
      this.BASE_DELAY * Math.pow(2, this.reconnectAttempt) + Math.random() * 1000, // Jitter
      this.MAX_RECONNECT_DELAY,
    )

    this.reconnectAttempt++

    setTimeout(() => this.connect(), delay)
  }
}
```

Local Message Storage
IndexedDB schema for offline support:
```typescript
interface LocalDBSchema {
  messages: {
    key: [string, number] // [conversationId, sequenceNumber]
    value: Message
    indexes: {
      "by-conversation": string
      "by-timestamp": number
      "by-status": string
    }
  }
  conversations: {
    key: string // conversationId
    value: ConversationMeta
    indexes: { "by-updated": number }
  }
  syncState: {
    key: string // conversationId
    value: { lastSequence: number; lastSync: number }
  }
}

class LocalMessageStore {
  private db: IDBDatabase

  async saveMessage(message: Message): Promise<void> {
    const tx = this.db.transaction("messages", "readwrite")
    await tx.objectStore("messages").put(message)
  }

  async getMessages(conversationId: string, options: { before?: number; limit: number }): Promise<Message[]> {
    const tx = this.db.transaction("messages", "readonly")
    const index = tx.objectStore("messages").index("by-conversation")

    const range = IDBKeyRange.bound([conversationId, 0], [conversationId, options.before || Number.MAX_SAFE_INTEGER])

    const messages: Message[] = []
    let cursor = await index.openCursor(range, "prev")

    while (cursor && messages.length < options.limit) {
      messages.push(cursor.value)
      cursor = await cursor.continue()
    }

    return messages
  }
}
```

Optimistic Updates
```typescript
class MessageSender {
  async sendMessage(conversationId: string, content: MessageContent): Promise<void> {
    const clientMessageId = crypto.randomUUID()
    const optimisticMessage: Message = {
      messageId: clientMessageId,
      conversationId,
      senderId: this.currentUserId,
      content,
      timestamp: Date.now(),
      sequenceNumber: -1n, // Pending
      status: "PENDING",
    }

    // 1. Show immediately in UI
    this.messageStore.addOptimistic(optimisticMessage)
    this.ui.appendMessage(optimisticMessage)

    // 2. Persist to local DB
    await this.localDb.saveMessage(optimisticMessage)

    // 3. Send to server
    try {
      const ack = await this.ws.sendAndWait({
        type: "message.send",
        payload: {
          messageId: clientMessageId,
          conversationId,
          content,
        },
      })

      // 4. Update with server-assigned values
      const confirmedMessage = {
        ...optimisticMessage,
        sequenceNumber: ack.sequenceNumber,
        timestamp: ack.timestamp,
        status: "SENT",
      }

      this.messageStore.updateOptimistic(clientMessageId, confirmedMessage)
      await this.localDb.saveMessage(confirmedMessage)
    } catch (error) {
      // 5. Mark as failed
      this.messageStore.markFailed(clientMessageId)
      this.ui.showRetryOption(clientMessageId)
    }
  }
}
```
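The `sendAndWait` call above assumes a request/response correlation layer over the socket, matching `message.ack` frames to pending requests by their `id` field (see API Design). A sketch; the timeout value and class shape are illustrative assumptions:

```typescript
// Hypothetical request/ack correlation over a single WebSocket.
class AckingSocket {
  private nextId = 0
  private pending = new Map<string, { resolve: (ack: any) => void; reject: (err: Error) => void }>()

  constructor(private readonly ws: WebSocket) {
    ws.onmessage = (event) => {
      const frame = JSON.parse(event.data)
      // Server echoes the request "id" on message.ack frames
      const waiter = frame.id != null ? this.pending.get(frame.id) : undefined
      if (waiter) {
        this.pending.delete(frame.id)
        waiter.resolve(frame.payload)
      }
    }
  }

  sendAndWait(frame: { type: string; payload: unknown }, timeoutMs = 10_000): Promise<any> {
    const id = `req_${++this.nextId}`
    this.ws.send(JSON.stringify({ ...frame, id }))
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject })
      // Fail the send after a timeout so the UI can offer retry
      setTimeout(() => {
        if (this.pending.delete(id)) reject(new Error("ack timeout"))
      }, timeoutMs)
    })
  }
}
```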
Virtual List for Message History
```typescript
interface VirtualListConfig {
  containerHeight: number
  itemHeight: number // Estimated, variable heights supported
  overscan: number // Extra items to render above/below viewport
}

class VirtualMessageList {
  private visibleRange = { start: 0, end: 0 }
  private heightCache = new Map<string, number>()

  calculateVisibleRange(scrollTop: number): { start: number; end: number } {
    const messages = this.getMessages()
    let accumulatedHeight = 0
    let start = 0
    let end = messages.length

    // Find start index
    for (let i = 0; i < messages.length; i++) {
      const height = this.getItemHeight(messages[i])
      if (accumulatedHeight + height > scrollTop - this.config.overscan * 50) {
        start = i
        break
      }
      accumulatedHeight += height
    }

    // Find end index
    accumulatedHeight = 0
    for (let i = start; i < messages.length; i++) {
      accumulatedHeight += this.getItemHeight(messages[i])
      if (accumulatedHeight > this.config.containerHeight + this.config.overscan * 50) {
        end = i + 1
        break
      }
    }

    return { start, end }
  }

  // Render only visible messages
  render(): MessageItem[] {
    const { start, end } = this.visibleRange
    return this.getMessages().slice(start, end)
  }
}
```

Infrastructure
Cloud-Agnostic Components
| Component | Purpose | Options |
|---|---|---|
| WebSocket Gateway | Persistent connections | Nginx (ws), HAProxy, Envoy |
| Message Queue | Async delivery, ordering | Kafka, Pulsar, NATS JetStream |
| KV Store | Sessions, presence | Redis, KeyDB, Dragonfly |
| Message Store | Message persistence | ScyllaDB, Cassandra, DynamoDB |
| Relational DB | User/group metadata | PostgreSQL, CockroachDB |
| Object Store | Media files | MinIO, Ceph, S3-compatible |
| Push Gateway | Mobile notifications | Self-hosted or APNs/FCM proxy |
AWS Reference Architecture
Service configurations:
| Service | Configuration | Rationale |
|---|---|---|
| WebSocket (Fargate) | 4 vCPU, 8GB, 500K conn/pod | Memory for connection state |
| Message Service | 2 vCPU, 4GB | Stateless, CPU-bound |
| Fan-out Workers | 2 vCPU, 4GB, Spot | Cost optimization for async |
| ElastiCache Redis | r6g.2xlarge cluster mode | Sub-ms presence lookups |
| Keyspaces | On-demand | Serverless Cassandra for messages |
| RDS PostgreSQL | db.r6g.xlarge Multi-AZ | Metadata, moderate write load |
| MSK | kafka.m5.large, 3 brokers | 6.5 Gbps throughput capacity |
Scaling Considerations
WebSocket connection limits:
- Linux default: 1024 file descriptors per process
- Tuned: 1M+ with sysctl adjustments
- Practical per pod: 500K (memory constrained)
- 50M concurrent users → 100 gateway pods minimum
Kafka partitioning:
- Partition by conversation_id for ordering
- Minimum partitions: 100 (allows 100 parallel consumers)
- Hot partition handling: re-partition extremely active conversations
Message storage partitioning:
- ScyllaDB partition key: conversation_id
- Max partition size: 100MB (~200K messages)
- Very active conversations: add time bucket to partition key
Presence fan-out:
- Redis pub/sub scales to ~10K subscribers per channel
- For users with 10K+ contacts: use hierarchical pub/sub or dedicated presence servers (see the sketch below)
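One way to stay under that per-channel ceiling is to shard the presence channel space so no single channel accumulates an unbounded subscriber set. A sketch, where the shard count is an illustrative assumption:

```typescript
// Hypothetical sharding of presence pub/sub: each user's updates go to one
// of N channels, bounding the subscriber count per channel.
const PRESENCE_SHARDS = 256 // assumption; sized so subscribers/channel stays ~10K

function presenceChannel(userId: string): string {
  // Simple stable string hash; any uniform hash works here
  let h = 0
  for (const ch of userId) h = (h * 31 + ch.charCodeAt(0)) >>> 0
  return `presence:changes:${h % PRESENCE_SHARDS}`
}

// Publisher side: emit each change to the user's shard, e.g.
//   await redis.publish(presenceChannel(userId), JSON.stringify(change))
// Gateway side: subscribe only to the shards covering the contacts of
// locally connected users, instead of one global presence channel.
```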
Conclusion
This design provides real-time chat and messaging with:
- Sub-500ms message delivery via WebSocket with hybrid push/pull fan-out
- At-least-once delivery with client-side deduplication for reliability
- Strong per-conversation ordering through server-assigned sequence numbers
- Seamless offline support via local storage and sync protocol
- Scalable presence using Redis pub/sub with heartbeat-based status
Key architectural decisions:
- Hybrid fan-out balances latency (direct push) with scalability (Kafka for large groups)
- Server-assigned timestamps eliminate clock skew issues
- Client-generated message IDs enable idempotent retry
- Per-conversation partitioning ensures ordering without global coordination
Known limitations:
- Server dependency for message ordering (no P2P)
- At-least-once delivery requires client deduplication logic
- Presence accuracy limited by heartbeat interval (30s staleness possible)
- Large group delivery latency higher than 1:1 messages
Future enhancements:
- End-to-end encryption with Signal protocol
- Reactions and threaded replies
- Message editing and deletion with tombstones
- Voice and video calling integration (WebRTC)
Appendix
Prerequisites
- Distributed systems fundamentals (consistency models, partitioning)
- Real-time communication patterns (WebSocket, pub/sub)
- Message queue concepts (Kafka partitions, consumer groups)
- Database selection trade-offs (SQL vs. NoSQL)
Terminology
| Term | Definition |
|---|---|
| Fan-out | Distributing a message to multiple recipients |
| Sequence number | Monotonically increasing identifier for ordering messages within a conversation |
| Presence | User’s online/offline status and activity indicators |
| Idempotency | Property ensuring duplicate requests produce the same result |
| Heartbeat | Periodic signal from client to server indicating connection is alive |
| ACK | Acknowledgment message confirming receipt |
| TTL | Time-to-live; automatic expiration of data after specified duration |
Summary
- Real-time chat requires persistent connections (WebSocket), reliable delivery (at-least-once with dedup), and ordering guarantees (server-assigned sequence numbers)
- Hybrid fan-out (direct push for small groups, Kafka for large) balances latency with scalability
- ScyllaDB provides time-series optimized storage with sub-15ms p99 read latency
- Redis handles ephemeral state: sessions, presence, typing indicators with pub/sub distribution
- Client-side sync protocol with sequence tracking enables seamless offline-to-online transitions
- Scale to 350K+ messages/second with horizontal gateway scaling and partitioned message queues
References
Real-World Implementations:
- How Discord Stores Trillions of Messages - ScyllaDB migration and performance gains
- How Discord Serves 15 Million Users on One Server - Elixir and BEAM architecture
- Real-time Messaging at Slack - Channel server architecture
- LinkedIn’s Real-Time Presence Platform - Presence at scale
Protocol Specifications:
- RFC 6455 - The WebSocket Protocol - WebSocket specification
- XMPP Core (RFC 6120) - Extensible Messaging and Presence Protocol
- MQTT Version 5.0 (OASIS Standard) - IoT messaging protocol
Distributed Systems Theory:
- Time, Clocks, and the Ordering of Events - Lamport’s foundational paper
- Exactly-Once Semantics in Apache Kafka - Kafka’s exactly-once implementation
Related Articles:
- Design Collaborative Document Editing - Real-time sync with OT/CRDT
- Design a Notification System - Multi-channel push delivery