Real-Time Sync Client
Client-side architecture for real-time data synchronization: transport protocols, connection management, conflict resolution, and state reconciliation patterns used by Figma, Notion, Discord, and Linear.
Abstract
Real-time sync is bidirectional state convergence between client and server under network uncertainty. The core tension: users expect instant feedback, but the server is the source of truth.
Mental model:
- Optimistic local state — Apply mutations immediately; rollback on server rejection
- Transport selection — WebSocket for bidirectional low-latency; SSE for server-push simplicity; polling as fallback
- Connection resilience — Exponential backoff with jitter; heartbeats detect stale connections; reconnect without data loss
- Conflict resolution — Last-write-wins at property level (Figma), Operational Transform for text (Google Docs), CRDTs for offline-first (Notion, Linear)
- State reconciliation — Server acknowledgments; re-apply pending local mutations on top of authoritative state
- Presence — Ephemeral metadata (cursors, typing indicators) via heartbeat or CRDT broadcast
The approach depends on data shape, offline requirements, and acceptable consistency latency.
The Challenge
Why Real-Time Sync Is Hard
Real-time sync violates the request-response model that HTTP assumes. Three constraints create tension:
- Latency vs. Consistency — Users expect instant feedback, but network round-trips are unavoidable
- Offline vs. Conflicts — Offline edits require local storage, which creates divergent state that must merge
- Scale vs. Complexity — Millions of connections require infrastructure that single-server models cannot provide
Browser Constraints
| Resource | Limit | Impact |
|---|---|---|
| WebSocket connections per domain | 6–30 | Multiple tabs share quota; exhaustion causes failures |
| SSE connections (HTTP/1.1) | 6 per domain | Shared across all tabs; HTTP/2 raises to ~100 |
| Main thread budget | 16ms/frame | Long sync operations cause UI jank |
| IndexedDB quota | 50% of disk (Chrome) | Large local caches can hit quota errors |
WebSocket connections prevent the browser’s back/forward cache (bfcache) from storing the page. Close connections on pagehide to preserve navigation performance.
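A minimal sketch of that pattern, assuming a connection wrapper exposing `disconnect()`/`reconnect()` (both hypothetical names):

```typescript
// Close on pagehide so the page remains bfcache-eligible
window.addEventListener("pagehide", () => {
  connection.disconnect() // hypothetical wrapper around ws.close()
})

// Reopen when the page is restored from the bfcache
window.addEventListener("pageshow", (event) => {
  if (event.persisted) {
    connection.reconnect() // page came back from the bfcache
  }
})
```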
Device and Network Profiles
| Scenario | Latency | Constraints |
|---|---|---|
| Desktop + fiber | 10–50ms | Full WebSocket, generous caching |
| Mobile + LTE | 50–200ms | Battery drain from radio wake; bundle updates |
| Mobile + 3G | 200–500ms | Aggressive local caching; defer non-critical sync |
| Offline | N/A | Queue mutations locally; replay on reconnect |
Cellular radio “tail time” keeps the antenna active for seconds after each transmission. Batch messages to minimize battery impact.
Transport Protocols
WebSocket (RFC 6455)
WebSocket provides full-duplex communication over a single TCP connection. After an HTTP handshake, frames flow bidirectionally with minimal overhead.
Protocol characteristics:
| Property | Value |
|---|---|
| Direction | Bidirectional |
| Frame overhead | 2–14 bytes (after handshake) |
| Data types | Text (UTF-8), binary |
| Auto-reconnect | No (must implement) |
| Compression | Per-message DEFLATE (RFC 7692) |
Handshake:
```
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
```

The server responds with `101 Switching Protocols`, and the connection upgrades to WebSocket framing.
Design reasoning: RFC 6455 chose HTTP-compatible handshake so WebSocket traffic traverses proxies and firewalls that allow HTTP. The Sec-WebSocket-Key prevents cross-protocol attacks by requiring the server to prove it understands WebSocket.
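From the client's perspective the handshake is invisible; the browser API is event-based. A minimal sketch (the URL and message shape are illustrative):

```typescript
const ws = new WebSocket("wss://server.example.com/chat")

ws.onopen = () => {
  ws.send(JSON.stringify({ type: "subscribe", channel: "general" }))
}

ws.onmessage = (event) => {
  console.log("received:", JSON.parse(event.data))
}

ws.onclose = (event) => {
  // No automatic reconnect: see Connection Management below
  console.log("closed:", event.code, event.reason)
}
```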
Limitations:
- No backpressure — If the server sends faster than the client processes, messages buffer in memory until the browser crashes or discards data. Experimental WebSocketStream (Chrome 124+) adds backpressure support.
- No multiplexing — Each logical channel requires a separate connection or application-level multiplexing.
- Connection limits — Browsers cap WebSocket connections per domain (typically 6–30). Multiple tabs compete for this quota.
When to use: Chat, collaborative editing, gaming, financial tickers—any scenario requiring low-latency bidirectional communication.
Server-Sent Events (SSE)
SSE is server-to-client push over HTTP. The server holds an open connection and streams events as text/event-stream.
Event format:
```
event: message
data: {"user":"alice","text":"hello"}
id: 12345
retry: 3000

: keep-alive comment
```

| Field | Purpose |
|---|---|
| `event` | Event type name (triggers named listeners) |
| `data` | Payload (UTF-8 only) |
| `id` | Resume token for reconnection |
| `retry` | Reconnection delay in milliseconds |
Design reasoning: SSE uses HTTP, so it works through any proxy or firewall that allows HTTP. The id field enables resumption: on reconnect, the browser sends Last-Event-ID, and the server can replay missed events.
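A minimal client sketch (the endpoint and event name are illustrative). Note that named events require `addEventListener`; `onmessage` only fires for unnamed (or `message`) events:

```typescript
const source = new EventSource("/events")

// Default handler: events without an explicit `event:` field
source.onmessage = (e) => {
  console.log("id:", e.lastEventId, "data:", e.data)
}

// Named events from `event: user-joined` lines
source.addEventListener("user-joined", (e) => {
  console.log("joined:", JSON.parse((e as MessageEvent).data))
})
```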
Advantages over WebSocket:
- Built-in reconnection with configurable delay
- Works through restrictive corporate proxies
- Simpler server implementation (any HTTP server)
Limitations:
- Server-to-client only (client uses HTTP POST for uplink)
- UTF-8 text only (no binary)
- HTTP/1.1 limits to 6 connections per domain across all tabs
When to use: Live feeds, notifications, dashboards—scenarios where the server pushes and the client occasionally posts.
Long Polling
Long polling simulates push by having the client hold an open HTTP request until the server has data or a timeout expires.
Flow:
1. Client sends `GET /events?since=12345`
2. Server holds the connection until new data exists or a 30s timeout
3. Server responds with data (or empty on timeout)
4. Client immediately sends the next request
Design reasoning: Long polling works everywhere HTTP works. Before WebSocket existed, this was the only reliable cross-browser push mechanism. It remains useful for restrictive networks or legacy browser support.
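A sketch of the client loop under these assumptions (the `/events` endpoint, `since` cursor, and `handleEvent` helper are illustrative):

```typescript
async function longPoll(since: number): Promise<void> {
  while (true) {
    try {
      // The server holds this request open until data exists or ~30s pass
      const res = await fetch(`/events?since=${since}`)
      const events: { seq: number; payload: unknown }[] = await res.json()

      for (const event of events) {
        handleEvent(event.payload)
        since = event.seq // advance the cursor
      }
    } catch {
      // Network error: pause briefly before the next attempt
      await new Promise((r) => setTimeout(r, 1000))
    }
  }
}
```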
Trade-offs:
| Aspect | Long Polling | WebSocket | SSE |
|---|---|---|---|
| Latency | Medium (one RTT per message) | Low | Low |
| Overhead | Full HTTP headers per response | 2 bytes/frame | ~5 bytes/event |
| Firewall compatibility | Excellent | Sometimes blocked | Excellent |
| Implementation complexity | Low | Medium | Low |
When to use: Fallback when WebSocket and SSE fail; infrequent updates where latency is acceptable.
Decision Framework

- Need low-latency traffic in both directions (chat, collaborative editing, gaming)? WebSocket.
- Server pushes, client only occasionally posts (feeds, notifications, dashboards)? SSE: simpler and auto-reconnecting.
- Restrictive proxies, legacy browsers, or both other transports failing? Long polling as the universal fallback.

In practice, start with the simplest transport that meets your latency needs and fall back down this list when the environment blocks it.
Connection Management
Reconnection with Exponential Backoff
Network disconnections are inevitable. Naive immediate retry causes thundering herd problems when servers recover. Exponential backoff with jitter spreads reconnection attempts.
```typescript
// Types and constants
type ReconnectionOptions = {
  initialDelay?: number
  maxDelay?: number
  multiplier?: number
  jitter?: number
  maxAttempts?: number
}

class ReconnectionManager {
  private currentDelay: number
  private attempts = 0

  constructor(private options: Required<ReconnectionOptions>) {
    this.currentDelay = options.initialDelay
  }

  // Core logic: calculate next delay with jitter
  getNextDelay(): number | null {
    if (this.attempts >= this.options.maxAttempts) {
      return null // Give up
    }

    const jitterRange = this.currentDelay * this.options.jitter
    const jitter = Math.random() * jitterRange * 2 - jitterRange
    const delay = Math.min(this.currentDelay + jitter, this.options.maxDelay)

    this.currentDelay = Math.min(this.currentDelay * this.options.multiplier, this.options.maxDelay)
    this.attempts++

    return delay
  }

  reset(): void {
    this.currentDelay = this.options.initialDelay
    this.attempts = 0
  }
}

// Usage
const manager = new ReconnectionManager({
  initialDelay: 1000,
  maxDelay: 30000,
  multiplier: 2,
  jitter: 0.1,
  maxAttempts: 10,
})
```

Design reasoning:
- Exponential growth caps at a maximum (30s typical) to prevent infinite waits
- Jitter (±10%) desynchronizes clients that disconnected at the same time
- Attempt limit prevents infinite loops against a permanently down service
Production considerations:
| Error Type | Retry Strategy |
|---|---|
| Network timeout | Full backoff |
| 5xx server error | Full backoff |
| 401 Unauthorized | Do not retry; re-authenticate |
| 429 Rate Limited | Honor Retry-After header |
| DNS failure | Backoff with longer max delay |
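One way to wire the table into the backoff logic is a small error classifier; a sketch, assuming failures expose an HTTP status and that `Retry-After` has already been parsed to milliseconds:

```typescript
type RetryDecision =
  | { kind: "backoff" } // feed into ReconnectionManager
  | { kind: "stop" } // e.g. 401: re-authenticate instead of retrying
  | { kind: "wait"; ms: number } // e.g. 429: honor Retry-After

function classifyError(status: number | null, retryAfterMs?: number): RetryDecision {
  if (status === 401) return { kind: "stop" }
  if (status === 429) return { kind: "wait", ms: retryAfterMs ?? 30_000 }
  // Network timeouts, DNS failures, and 5xx all fall through to backoff
  return { kind: "backoff" }
}
```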
Heartbeat and Connection Health
TCP keep-alives are insufficient for application-level connection health. A dead connection may not trigger a TCP reset for minutes. Application heartbeats detect stale connections faster.
```typescript
// WebSocket wrapper with heartbeat
type HeartbeatOptions = {
  interval: number // How often to send ping
  timeout: number // How long to wait for pong
}

class HeartbeatConnection {
  private pingTimer: number | null = null
  private pongTimer: number | null = null
  private ws: WebSocket

  constructor(
    url: string,
    private options: HeartbeatOptions,
  ) {
    this.ws = new WebSocket(url)
    this.ws.onopen = () => this.startHeartbeat()
    this.ws.onmessage = (e) => this.handleMessage(e)
    this.ws.onclose = () => this.stopHeartbeat()
  }

  // Core heartbeat logic
  private startHeartbeat(): void {
    this.pingTimer = window.setInterval(() => {
      this.ws.send(JSON.stringify({ type: "ping", ts: Date.now() }))

      this.pongTimer = window.setTimeout(() => {
        // No pong received - connection is dead
        this.ws.close(4000, "Heartbeat timeout")
      }, this.options.timeout)
    }, this.options.interval)
  }

  private handleMessage(event: MessageEvent): void {
    const data = JSON.parse(event.data)
    if (data.type === "pong") {
      if (this.pongTimer) clearTimeout(this.pongTimer)
      return
    }
    // Handle application messages...
  }

  private stopHeartbeat(): void {
    if (this.pingTimer) clearInterval(this.pingTimer)
    if (this.pongTimer) clearTimeout(this.pongTimer)
  }
}
```

Typical values:
| Setting | Value | Reasoning |
|---|---|---|
| Ping interval | 15–30s | Balance between detection speed and bandwidth |
| Pong timeout | 5–10s | Allow for network jitter |
| Missed pings before disconnect | 2–3 | Avoid false positives from single dropped packet |
State Recovery on Reconnect
Reconnection without state recovery loses messages sent during disconnection. Two patterns handle this:
1. Sequence-based recovery:
```typescript
// Track last acknowledged sequence
interface Message {
  seq: number
  payload: unknown
}

class SequenceRecovery {
  private lastAckedSeq = 0
  private pendingMessages: Message[] = []

  constructor(private ws: WebSocket) {}

  send(payload: unknown): void {
    const msg: Message = {
      seq: this.pendingMessages.length + this.lastAckedSeq + 1,
      payload,
    }
    this.pendingMessages.push(msg)
    this.ws.send(JSON.stringify(msg))
  }

  // On reconnect, request replay from last ack
  onReconnect(): void {
    this.ws.send(
      JSON.stringify({
        type: "resume",
        lastSeq: this.lastAckedSeq,
      }),
    )
  }

  onAck(seq: number): void {
    this.lastAckedSeq = seq
    this.pendingMessages = this.pendingMessages.filter((m) => m.seq > seq)
  }

  onServerReplay(messages: Message[]): void {
    // Server sends missed messages since lastSeq
    for (const msg of messages) {
      this.handleMessage(msg)
    }
  }

  private handleMessage(msg: Message): void {
    // Apply to application state
  }
}
```

2. Event sourcing recovery (SSE):
SSE’s Last-Event-ID header automatically requests replay:
```typescript
const eventSource = new EventSource("/events")

eventSource.onmessage = (event) => {
  // event.lastEventId contains the id from the server
  // On reconnect, browser automatically sends Last-Event-ID header
  processEvent(JSON.parse(event.data))
}
```

Design reasoning: Sequence numbers are simpler but require server-side storage of recent messages. Event sourcing naturally fits event logs but requires the server to support replay from arbitrary points.
Message Handling
Ordering Guarantees
Out-of-order delivery happens when:
- Multiple WebSocket connections exist (load balancing)
- Server processes messages in parallel
- Network path changes mid-stream
Strategies:
| Strategy | Complexity | Guarantee |
|---|---|---|
| Single connection, FIFO | Low | Total order |
| Sequence numbers per sender | Medium | Per-sender order |
| Vector clocks | High | Causal order |
| Accept disorder | None | Eventual consistency |
For most applications, per-sender ordering suffices:
```typescript
// Per-sender message ordering
type SenderId = string

class OrderedMessageHandler {
  private lastSeq = new Map<SenderId, number>()
  private pending = new Map<SenderId, Map<number, unknown>>()

  handle(senderId: SenderId, seq: number, payload: unknown): void {
    const expected = (this.lastSeq.get(senderId) ?? 0) + 1

    if (seq === expected) {
      // In order - process and check pending
      this.process(payload)
      this.lastSeq.set(senderId, seq)
      this.processPending(senderId)
    } else if (seq > expected) {
      // Out of order - buffer
      const senderPending = this.pending.get(senderId) ?? new Map()
      senderPending.set(seq, payload)
      this.pending.set(senderId, senderPending)
    }
    // seq < expected means duplicate - ignore
  }

  private processPending(senderId: SenderId): void {
    const senderPending = this.pending.get(senderId)
    if (!senderPending) return

    let next = (this.lastSeq.get(senderId) ?? 0) + 1
    while (senderPending.has(next)) {
      this.process(senderPending.get(next))
      senderPending.delete(next)
      this.lastSeq.set(senderId, next)
      next++
    }
  }

  private process(payload: unknown): void {
    // Application-specific processing
  }
}
```

Deduplication
Retries and reconnection can deliver the same message multiple times. Idempotency keys prevent duplicate processing:
```typescript
// Time-bounded deduplication
const DEDUP_WINDOW_MS = 5 * 60 * 1000 // 5 minutes

class Deduplicator {
  private processed = new Map<string, number>() // id -> timestamp

  isDuplicate(messageId: string): boolean {
    this.cleanup()

    if (this.processed.has(messageId)) {
      return true
    }

    this.processed.set(messageId, Date.now())
    return false
  }

  private cleanup(): void {
    const cutoff = Date.now() - DEDUP_WINDOW_MS
    for (const [id, ts] of this.processed) {
      if (ts < cutoff) {
        this.processed.delete(id)
      }
    }
  }
}

// Usage
const dedup = new Deduplicator()

function handleMessage(msg: { id: string; payload: unknown }): void {
  if (dedup.isDuplicate(msg.id)) {
    return // Already processed
  }
  processPayload(msg.payload)
}
```

Trade-offs:
| Window Size | Memory | Risk |
|---|---|---|
| 1 minute | Low | May miss slow retries |
| 5 minutes | Medium | Covers most retry scenarios |
| 1 hour | High | Handles extended outages |
Optimistic Updates
Optimistic updates show changes immediately while the server processes asynchronously. If the server rejects, rollback to previous state.
Pattern Implementation
```typescript
// Optimistic update with rollback
// renderTodos() and showError() are app-level UI helpers
type Todo = { id: string; text: string; completed: boolean }
type TodoStore = {
  todos: Todo[]
  pendingUpdates: Map<string, { previous: Todo; current: Todo }>
}

const store: TodoStore = { todos: [], pendingUpdates: new Map() }

async function toggleTodo(id: string): Promise<void> {
  const todo = store.todos.find((t) => t.id === id)
  if (!todo) return

  // 1. Save previous state
  const previous = { ...todo }

  // 2. Apply optimistic update
  const updated = { ...todo, completed: !todo.completed }
  store.todos = store.todos.map((t) => (t.id === id ? updated : t))
  store.pendingUpdates.set(id, { previous, current: updated })

  // 3. Notify UI (immediate feedback)
  renderTodos()

  try {
    // 4. Send to server
    await fetch(`/api/todos/${id}`, {
      method: "PATCH",
      body: JSON.stringify({ completed: updated.completed }),
      headers: { "Idempotency-Key": `toggle-${id}-${Date.now()}` },
    })

    // 5. Success - remove from pending
    store.pendingUpdates.delete(id)
  } catch (error) {
    // 6. Failure - roll back to the saved previous state
    store.todos = store.todos.map((t) => (t.id === id ? previous : t))
    store.pendingUpdates.delete(id)
    renderTodos()
    showError("Failed to update todo")
  }
}
```

```typescript
// TanStack Query equivalent
const mutation = useMutation({
  mutationFn: (todo: Todo) => updateTodo(todo),
  onMutate: async (newTodo) => {
    await queryClient.cancelQueries({ queryKey: ["todos"] })
    const previous = queryClient.getQueryData(["todos"])
    queryClient.setQueryData(["todos"], (old: Todo[]) =>
      old.map((t) => (t.id === newTodo.id ? newTodo : t)),
    )
    return { previous }
  },
  onError: (err, newTodo, context) => {
    queryClient.setQueryData(["todos"], context?.previous)
  },
  onSettled: () => {
    queryClient.invalidateQueries({ queryKey: ["todos"] })
  },
})
```

When Optimistic Updates Break
| Scenario | Problem | Mitigation |
|---|---|---|
| Concurrent edits | Two users edit same item | Server conflict resolution; merge or reject |
| Validation failure | Server rejects invalid data | Client-side validation before optimistic apply |
| Network partition | User thinks action succeeded | Queue mutations; replay on reconnect |
| Race conditions | Stale read before write | Version vectors; conditional updates |
React 19 useOptimistic:
```tsx
// React 19 built-in optimistic updates
import { useOptimistic } from 'react';

function TodoItem({ todo }: { todo: Todo }) {
  const [optimisticTodo, setOptimisticTodo] = useOptimistic(
    todo,
    (current, completed: boolean) => ({ ...current, completed })
  );

  async function toggle() {
    setOptimisticTodo(!optimisticTodo.completed);
    await updateTodo({ ...todo, completed: !todo.completed });
  }

  return (
    <li style={{ opacity: optimisticTodo.completed !== todo.completed ? 0.5 : 1 }}>
      <input type="checkbox" checked={optimisticTodo.completed} onChange={toggle} />
      {todo.text}
    </li>
  );
}
```

Conflict Resolution
When multiple clients edit the same data, conflicts arise. The resolution strategy depends on data shape and acceptable complexity.
Last-Write-Wins (LWW)
Each write carries a timestamp; the latest timestamp wins. Simple but loses data from concurrent edits.
Used by: Cassandra, DynamoDB, Figma (at property level)
```typescript
type LWWRegister<T> = {
  value: T
  timestamp: number
  clientId: string
}

function merge<T>(local: LWWRegister<T>, remote: LWWRegister<T>): LWWRegister<T> {
  if (remote.timestamp > local.timestamp) {
    return remote
  }
  if (remote.timestamp === local.timestamp) {
    // Tiebreaker: lexicographic client ID comparison
    return remote.clientId > local.clientId ? remote : local
  }
  return local
}
```

Figma's approach: LWW at property level, not object level. Two users editing different properties of the same shape (e.g., color vs. position) both succeed. Only same-property edits conflict.
Design reasoning: Figma rejected Operational Transform as “overkill”—their data model is a tree of objects with properties, not a linear text document. Property-level LWW is simpler and sufficient for design files.
Clock skew problem: LWW assumes synchronized clocks. NTP skew of 0.5+ seconds can cause “earlier” writes to win. Mitigations:
- Use server timestamps (single source of truth)
- Hybrid logical clocks (HLC) combining physical time with logical counters
- Version vectors for causal ordering
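For illustration, a minimal hybrid logical clock sketch (simplified: no drift bounds or counter-overflow handling):

```typescript
// Hybrid logical clock: physical time plus a logical counter tiebreaker.
// Timestamps stay causally ordered even when wall clocks disagree slightly.
type HLC = { physical: number; logical: number }

class HybridClock {
  private last: HLC = { physical: 0, logical: 0 }

  // Call when generating a local event
  now(): HLC {
    const wall = Date.now()
    if (wall > this.last.physical) {
      this.last = { physical: wall, logical: 0 }
    } else {
      this.last = { ...this.last, logical: this.last.logical + 1 }
    }
    return { ...this.last }
  }

  // Call when receiving a remote timestamp
  receive(remote: HLC): HLC {
    const wall = Date.now()
    const physical = Math.max(wall, this.last.physical, remote.physical)
    let logical = 0
    if (physical === this.last.physical && physical === remote.physical) {
      logical = Math.max(this.last.logical, remote.logical) + 1
    } else if (physical === this.last.physical) {
      logical = this.last.logical + 1
    } else if (physical === remote.physical) {
      logical = remote.logical + 1
    }
    this.last = { physical, logical }
    return { ...this.last }
  }
}
```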
Operational Transformation (OT)
OT transforms operations based on concurrent operations to preserve user intent. Designed for linear sequences (text documents).
Used by: Google Docs, Microsoft Office Online
Example:
Initial: "abc"User A: Insert "X" at position 2 → "abXc"User B: Delete position 1 → "ac" (concurrent with A)
Without transformation: Apply A's insert to B's result: "aXc" (wrong - X at wrong position)
With transformation: Transform A's operation: Insert at 2 becomes Insert at 1 (because B deleted before position 2) Result: "aXc" becomes correct10 collapsed lines
```typescript
// Simplified OT for text
type Insert = { type: "insert"; pos: number; char: string }
type Delete = { type: "delete"; pos: number }
type Op = Insert | Delete

function transform(op1: Op, op2: Op): Op {
  // Transform op1 assuming op2 has been applied
  if (op1.type === "insert" && op2.type === "insert") {
    if (op1.pos <= op2.pos) {
      return op1 // op1 is before op2, no change
    }
    return { ...op1, pos: op1.pos + 1 } // Shift right
  }

  if (op1.type === "insert" && op2.type === "delete") {
    if (op1.pos <= op2.pos) {
      return op1
    }
    return { ...op1, pos: op1.pos - 1 } // Shift left
  }

  if (op1.type === "delete" && op2.type === "insert") {
    if (op1.pos < op2.pos) {
      return op1
    }
    return { ...op1, pos: op1.pos + 1 }
  }

  if (op1.type === "delete" && op2.type === "delete") {
    if (op1.pos < op2.pos) {
      return op1
    }
    if (op1.pos > op2.pos) {
      return { ...op1, pos: op1.pos - 1 }
    }
    // Same position - op1 is a no-op (already deleted)
    return { type: "insert", pos: -1, char: "" } // No-op sentinel
  }

  return op1
}

// OT requires central server for operation ordering
class OTClient {
  private pending: Op[] = []
  private serverVersion = 0

  applyLocal(op: Op): void {
    this.pending.push(op)
    this.sendToServer(op, this.serverVersion)
  }

  onServerAck(ackVersion: number): void {
    this.serverVersion = ackVersion
    this.pending.shift() // Remove acknowledged operation
  }

  onServerOp(op: Op): void {
    // Transform all pending operations against server operation
    for (let i = 0; i < this.pending.length; i++) {
      this.pending[i] = transform(this.pending[i], op)
    }
    this.applyToDocument(op)
    this.serverVersion++
  }

  private sendToServer(op: Op, version: number): void {
    // WebSocket send, tagged with the version it was based on
  }

  private applyToDocument(op: Op): void {
    // Apply to local document state
  }
}
```

Trade-offs:
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Consistency | Strong (immediate) | Requires central server |
| Complexity | Intent-preserving | O(n²) worst case; hard to implement correctly |
| Latency | Low (operations, not state) | Server bottleneck under load |
CRDTs (Conflict-Free Replicated Data Types)
CRDTs are data structures mathematically guaranteed to converge. Operations commute—order doesn’t matter.
Used by: Notion (offline pages), Linear (issue metadata), Figma (for specific features), Automerge, Yjs
Common CRDT types:
| Type | Use Case | Example |
|---|---|---|
| G-Counter | Grow-only counter | Page view counts |
| PN-Counter | Increment/decrement | Inventory stock |
| LWW-Register | Single value | User status |
| OR-Set | Add/remove set | Tags, members |
| RGA/LSEQ | Ordered text | Collaborative documents |
```typescript
// G-Counter CRDT
type NodeId = string

class GCounter {
  private counts = new Map<NodeId, number>()

  constructor(private nodeId: NodeId) {}

  increment(): void {
    const current = this.counts.get(this.nodeId) ?? 0
    this.counts.set(this.nodeId, current + 1)
  }

  value(): number {
    let sum = 0
    for (const count of this.counts.values()) {
      sum += count
    }
    return sum
  }

  // Merge is commutative, associative, idempotent
  merge(other: GCounter): void {
    for (const [nodeId, count] of other.counts) {
      const current = this.counts.get(nodeId) ?? 0
      this.counts.set(nodeId, Math.max(current, count))
    }
  }

  serialize(): Record<NodeId, number> {
    return Object.fromEntries(this.counts)
  }
}
```

Tombstone problem: Deleted items leave tombstones (metadata indicating deletion) that accumulate forever. Mitigation:
- Garbage collection with consensus (complex in peer-to-peer)
- Time-based tombstone expiry (risks resurrection of deleted items)
- Periodic state snapshots that exclude tombstones
Design reasoning: CRDTs trade memory for simplicity. No conflict resolution logic needed—the math guarantees convergence. This makes them ideal for offline-first apps where merge timing is unpredictable.
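The tombstone cost is easiest to see in a set CRDT. A simplified OR-Set sketch (not a production implementation): removed tags are retained forever so that concurrent adds and removes merge deterministically:

```typescript
// OR-Set: each add gets a unique tag; remove tombstones observed tags.
// An element is present if it has at least one non-tombstoned tag.
class ORSet<T> {
  private added = new Map<string, T>() // tag -> element
  private removed = new Set<string>() // tombstoned tags, never freed

  add(element: T): void {
    this.added.set(crypto.randomUUID(), element)
  }

  remove(element: T): void {
    for (const [tag, value] of this.added) {
      if (value === element) this.removed.add(tag) // tombstone
    }
  }

  has(element: T): boolean {
    for (const [tag, value] of this.added) {
      if (value === element && !this.removed.has(tag)) return true
    }
    return false
  }

  // Commutative, associative, idempotent: safe in any order
  merge(other: ORSet<T>): void {
    for (const [tag, value] of other.added) this.added.set(tag, value)
    for (const tag of other.removed) this.removed.add(tag)
  }
}
```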
Decision Matrix
| Factor | LWW | OT | CRDT |
|---|---|---|---|
| Complexity | Low | High | Medium |
| Offline support | Poor | None | Excellent |
| Memory overhead | Low | Low | High (tombstones) |
| Central server required | No | Yes | No |
| Best for | Key-value, properties | Text documents | Offline-first, P2P |
State Reconciliation
When client and server state diverge, reconciliation brings them back in sync without losing pending local changes.
Client-Side Prediction with Server Reconciliation
Originally from game development, this pattern applies local changes immediately but treats server state as authoritative.
```typescript
// State reconciliation with pending input replay
type Input = { id: string; action: string; params: unknown }
type State = { /* application state */ }

class ReconciliationEngine {
  private pendingInputs: Input[] = []
  private localState: State
  private lastAckedInputId: string | null = null

  constructor(initialState: State) {
    this.localState = initialState
  }

  applyInput(input: Input): void {
    // 1. Apply locally for immediate feedback
    this.localState = this.applyToState(this.localState, input)

    // 2. Track as pending
    this.pendingInputs.push(input)

    // 3. Send to server
    this.sendToServer(input)
  }

  onServerUpdate(serverState: State, lastAckedId: string): void {
    // 1. Remove acknowledged inputs
    const ackIndex = this.pendingInputs.findIndex((i) => i.id === lastAckedId)
    if (ackIndex !== -1) {
      this.pendingInputs = this.pendingInputs.slice(ackIndex + 1)
    }

    // 2. Start from authoritative server state
    let reconciledState = serverState

    // 3. Re-apply pending (unacknowledged) inputs
    for (const input of this.pendingInputs) {
      reconciledState = this.applyToState(reconciledState, input)
    }

    // 4. Update local state
    this.localState = reconciledState

    // 5. Re-render
    this.render()
  }

  private applyToState(state: State, input: Input): State {
    // Application-specific state transition
    return state
  }

  private sendToServer(input: Input): void {
    // WebSocket send
  }

  private render(): void {
    // Update UI
  }
}
```

Why this works: The server is always right. Local state is a prediction that will be corrected. By re-applying pending inputs on top of server state, we maintain responsiveness while converging to truth.
Smooth vs. snap reconciliation:
- Snap: Immediately apply server correction (can cause visual jitter)
- Smooth: Interpolate toward server state over multiple frames (better UX for games/animations, more complex)
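A minimal smoothing sketch for positional state (the `Vec2` shape and `alpha` parameter are illustrative):

```typescript
// Exponential smoothing toward the authoritative position.
// Run each frame; alpha controls how aggressively we converge.
type Vec2 = { x: number; y: number }

function smoothCorrect(rendered: Vec2, authoritative: Vec2, alpha = 0.2): Vec2 {
  const dx = authoritative.x - rendered.x
  const dy = authoritative.y - rendered.y

  // Snap once close enough, to avoid drifting forever
  if (Math.abs(dx) < 0.5 && Math.abs(dy) < 0.5) return { ...authoritative }

  return { x: rendered.x + dx * alpha, y: rendered.y + dy * alpha }
}
```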
Full State Sync
For simpler applications, fetch full state on reconnect and discard local state:
```typescript
// Simple full state sync on reconnect
async function reconnect(): Promise<void> {
  const state = await fetch("/api/state").then((r) => r.json())
  store.setState(state)
  render()
}

// With local mutation queue
async function reconnectWithQueue(): Promise<void> {
  // 1. Fetch server state
  const serverState = await fetch("/api/state").then((r) => r.json())

  // 2. Replay queued mutations
  for (const mutation of localQueue) {
    try {
      await fetch("/api/mutate", {
        method: "POST",
        body: JSON.stringify(mutation),
      })
    } catch {
      // Handle permanent failure
    }
  }

  // 3. Fetch final state (includes replayed mutations)
  const finalState = await fetch("/api/state").then((r) => r.json())
  store.setState(finalState)
  localQueue.clear()
  render()
}
```

Presence
Presence tracks ephemeral user state: who’s online, cursor positions, typing indicators.
Design Considerations
| Aspect | Consideration |
|---|---|
| Persistence | Not needed—presence is ephemeral |
| Update frequency | Cursors: 20–60 Hz; typing: on change; online: 15–30s heartbeat |
| Bandwidth | Throttle cursor updates; batch presence changes |
| Cleanup | Automatic on disconnect; timeout for crashed clients |
Implementation Patterns
Heartbeat-based online status:
```typescript
// Heartbeat presence
const HEARTBEAT_INTERVAL = 15000 // 15 seconds
const TIMEOUT_THRESHOLD = 45000 // 3 missed heartbeats

class PresenceManager {
  private users = new Map<string, { lastSeen: number; metadata: unknown }>()
  private heartbeatTimer: number | null = null

  constructor(private ws: WebSocket) {}

  start(userId: string, metadata: unknown): void {
    this.heartbeatTimer = window.setInterval(() => {
      this.ws.send(
        JSON.stringify({
          type: "presence",
          userId,
          metadata,
          timestamp: Date.now(),
        }),
      )
    }, HEARTBEAT_INTERVAL)
  }

  onPresenceUpdate(userId: string, metadata: unknown): void {
    this.users.set(userId, { lastSeen: Date.now(), metadata })
    this.pruneStale()
    this.render()
  }

  private pruneStale(): void {
    const now = Date.now()
    for (const [userId, data] of this.users) {
      if (now - data.lastSeen > TIMEOUT_THRESHOLD) {
        this.users.delete(userId)
      }
    }
  }

  stop(): void {
    if (this.heartbeatTimer) clearInterval(this.heartbeatTimer)
  }

  getOnlineUsers(): string[] {
    return Array.from(this.users.keys())
  }

  private render(): void {
    // Update presence UI
  }
}
```

Cursor sharing:
```typescript
// Throttled cursor sharing
type CursorPosition = { x: number; y: number; userId: string }

class CursorPresence {
  private cursors = new Map<string, CursorPosition>()
  private throttledSend: (pos: CursorPosition) => void

  constructor(
    private ws: WebSocket,
    private userId: string,
  ) {
    // Throttle to 30 updates per second max
    this.throttledSend = throttle((pos: CursorPosition) => {
      ws.send(JSON.stringify({ type: "cursor", ...pos }))
    }, 33)
  }

  onLocalMove(x: number, y: number): void {
    this.throttledSend({ x, y, userId: this.userId })
  }

  onRemoteCursor(cursor: CursorPosition): void {
    this.cursors.set(cursor.userId, cursor)
    this.renderCursors()
  }

  onUserLeave(userId: string): void {
    this.cursors.delete(userId)
    this.renderCursors()
  }

  private renderCursors(): void {
    // Render remote cursors with user labels
  }
}

function throttle<T extends (...args: unknown[]) => void>(fn: T, ms: number): T {
  let lastCall = 0
  return ((...args: Parameters<T>) => {
    const now = Date.now()
    if (now - lastCall >= ms) {
      lastCall = now
      fn(...args)
    }
  }) as T
}
```

Phoenix Presence (CRDT-based)
Phoenix Channels uses a CRDT-based presence system that automatically syncs across cluster nodes:
- Each presence update is a CRDT merge operation
- No single point of failure
- Automatic cleanup when connections close
- Built-in conflict resolution for presence metadata
Design reasoning: Presence is inherently distributed (users connect to different servers). CRDT semantics guarantee all nodes converge to the same view without coordination.
Real-World Implementations
Figma: Multiplayer Design
Scale: Millions of concurrent editors
Architecture:
- Client/server with WebSocket
- Multiplayer service is authoritative
- File state held in-memory for speed
- Checkpointing to DynamoDB every 30–60 seconds
- Write-ahead log prevents data loss between checkpoints
Conflict resolution: Property-level LWW. Two simultaneous changes to different properties of the same object both succeed. Only same-property conflicts use timestamp comparison.
Why not OT? Figma’s data model is a tree of objects with properties, not linear text. OT is optimized for character-level text operations. Property-level LWW is simpler and sufficient.
Fractional indexing for ordered sequences: Instead of integer indices, objects have arbitrary-precision fractional positions. Insert between A (0.3) and B (0.4) by assigning 0.35. No reindexing required.
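A naive numeric sketch of the idea (Figma's production scheme uses arbitrary-precision strings, since floating-point midpoints eventually run out of precision under repeated insertion):

```typescript
// Position an item between its neighbors without reindexing siblings.
function positionBetween(before: number | null, after: number | null): number {
  if (before === null && after === null) return 0.5 // first item
  if (before === null) return after! / 2 // insert at head
  if (after === null) return before + 1 // insert at tail
  return (before + after) / 2 // midpoint between neighbors
}

// Usage: insert between A (0.3) and B (0.4)
const pos = positionBetween(0.3, 0.4) // 0.35
```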
Source: Figma Engineering Blog
Notion: Offline-First
Challenge: Block-based architecture where pages reference blocks that reference other blocks. Opening a page requires all referenced blocks.
Architecture:
- SQLite for local caching (pre-existed for performance)
- CRDTs for conflict resolution on offline-marked pages
- Each client tracks `lastDownloadedTimestamp` per offline page
- On reconnect: compare with the server's `lastUpdatedTime`, fetch only newer pages
Design decision: If a page might be missing data, Notion refuses to show it at all rather than showing partial content. Missing data is worse UX than “unavailable offline.”
Source: Notion Engineering Blog
Linear: Local-First Sync
Architecture:
- Loads all issues into memory/IndexedDB on startup
- Search is instant (0ms)—just filtering a JavaScript array
- Hybrid: OT for issue descriptions (text), CRDTs for metadata (status, assignee)
Why it works: Issue trackers have bounded data per workspace (unlike documents). Full client-side data enables instant interactions.
Competitive advantage: “Snappiness” is Linear’s primary differentiator from Jira. Local-first makes every interaction feel instant.
Source: Linear Engineering Blog
Discord: Message Delivery at Scale
Scale: Trillions of messages, millions of concurrent connections
Architecture:
- Gateway infrastructure in Elixir (BEAM VM for concurrency)
- Single Elixir process per guild (server) as central routing point
- Separate process for each connected user’s client
- Storage evolution: MongoDB → Cassandra (2017) → ScyllaDB (2022)
Message fanout:
- User sends message
- Guild process receives it
- Guild process tracks all member sessions
- Fans out to all connected user client processes
- Client processes forward over WebSocket to devices
Source: Discord Engineering Blog
Slack: Real-Time at Enterprise Scale
Scale: Tens of millions of channels per host, 500ms message delivery worldwide
Architecture:
- Channel Servers (CS): Stateful, in-memory servers holding channel history (~16M channels per host)
- Gateway Servers (GS): Maintain WebSocket connections, deployed across regions
- CHARMs: Consistent hash ring managers ensuring CS replacement within 20 seconds
Reliability guarantees:
- Messages have strong guarantees around arrival
- Ordered and delivered exactly once
- All messages are persisted
- Idempotency keys prevent duplicates
- Kafka for durable queuing + Redis for fast in-flight job data
Source: Slack Engineering Blog
Browser Constraints Deep Dive
Main Thread Budget
The main thread has 16ms per frame for 60fps. Real-time sync operations compete with rendering:
| Operation | Typical Cost | Mitigation |
|---|---|---|
| JSON.parse (1KB) | 0.1–0.5ms | Stream parsing for large payloads |
| JSON.parse (100KB) | 5–50ms | Web Worker |
| IndexedDB write | 1–10ms | Batch writes; requestIdleCallback |
| DOM update (100 items) | 5–20ms | Virtual lists; batched updates |
Offload to Web Workers:
```typescript
// Main thread
const worker = new Worker("sync-worker.js")

worker.postMessage({ type: "parse", data: rawJson })

worker.onmessage = (e) => {
  // Parsed data ready
  updateState(e.data)
}
```

```typescript
// sync-worker.js
self.onmessage = (e) => {
  if (e.data.type === "parse") {
    const parsed = JSON.parse(e.data.data)
    self.postMessage(parsed)
  }
}
```

Memory Management
| Browser | WebSocket buffer limit | IndexedDB quota |
|---|---|---|
| Chrome | ~1GB before crash | 50% of disk |
| Firefox | ~500MB | 50% of disk |
| Safari | ~256MB | 1GB |
WebSocket backpressure (experimental):
```typescript
// Chrome 124+ WebSocketStream with backpressure
const ws = new WebSocketStream("wss://example.com")
const { readable, writable } = await ws.opened

const reader = readable.getReader()
while (true) {
  const { value, done } = await reader.read()
  if (done) break

  // Backpressure: read() naturally pauses when we can't keep up
  await processMessage(value)
}
```

Storage Quotas
```typescript
// Handle quota exceeded gracefully
import { openDB } from "idb"

async function cacheData(key: string, value: unknown): Promise<void> {
  const db = await openDB("cache", 1, {
    upgrade(db) {
      db.createObjectStore("data", { keyPath: "key" })
    },
  })

  try {
    await db.put("data", { key, value, timestamp: Date.now() })
  } catch (e) {
    if (e instanceof DOMException && e.name === "QuotaExceededError") {
      // Evict oldest entries
      const all = await db.getAll("data")
      all.sort((a, b) => a.timestamp - b.timestamp)

      // Delete oldest 20%
      const toDelete = all.slice(0, Math.ceil(all.length * 0.2))
      for (const item of toDelete) {
        await db.delete("data", item.key)
      }

      // Retry
      await db.put("data", { key, value, timestamp: Date.now() })
    } else {
      throw e
    }
  }
}
```

Mobile and Offline Considerations
Battery Optimization
| Strategy | Impact | Implementation |
|---|---|---|
| Batch updates | High | Buffer messages for 1–5s before send |
| Adaptive polling | Medium | Increase interval on cellular |
| Binary protocols | Medium | MessagePack, Protocol Buffers |
| Sync on Wi-Fi only | High | Defer large sync until Wi-Fi detected |
```typescript
// Network-aware sync strategy
type ConnectionType = "wifi" | "cellular" | "none"

function getConnectionType(): ConnectionType {
  const connection = (navigator as unknown as { connection?: { type: string } }).connection
  if (!connection) return "wifi" // Assume best
  return connection.type === "wifi" ? "wifi" : "cellular"
}

class NetworkAwareSync {
  private batchBuffer: unknown[] = []
  private flushTimer: number | null = null

  constructor(private ws: WebSocket) {}

  send(message: unknown): void {
    this.batchBuffer.push(message)

    // Batch longer on cellular to reduce radio wake-ups
    const delay = getConnectionType() === "cellular" ? 2000 : 100

    if (!this.flushTimer) {
      this.flushTimer = window.setTimeout(() => {
        this.flush()
      }, delay)
    }
  }

  private flush(): void {
    if (this.batchBuffer.length === 0) return

    this.ws.send(
      JSON.stringify({
        type: "batch",
        messages: this.batchBuffer,
      }),
    )

    this.batchBuffer = []
    this.flushTimer = null
  }
}

// Detect network type changes
navigator.connection?.addEventListener("change", () => {
  adjustSyncStrategy(getConnectionType())
})
```

Offline Queue
```typescript
// Persistent offline mutation queue
import { openDB, IDBPDatabase } from "idb"

type Mutation = {
  id: string
  action: string
  payload: unknown
  timestamp: number
}

class OfflineQueue {
  private db: IDBPDatabase | null = null

  async init(): Promise<void> {
    this.db = await openDB("offline-queue", 1, {
      upgrade(db) {
        db.createObjectStore("mutations", { keyPath: "id" })
      },
    })
  }

  async enqueue(mutation: Omit<Mutation, "id" | "timestamp">): Promise<void> {
    const item: Mutation = {
      ...mutation,
      id: crypto.randomUUID(),
      timestamp: Date.now(),
    }
    await this.db!.add("mutations", item)
  }

  async flush(): Promise<void> {
    const mutations = await this.db!.getAll("mutations")
    mutations.sort((a, b) => a.timestamp - b.timestamp)

    for (const mutation of mutations) {
      try {
        await this.sendToServer(mutation)
        await this.db!.delete("mutations", mutation.id)
      } catch (e) {
        // Stop on first failure; retry later
        break
      }
    }
  }

  private async sendToServer(mutation: Mutation): Promise<void> {
    const response = await fetch("/api/mutate", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Idempotency-Key": mutation.id,
      },
      body: JSON.stringify(mutation),
    })

    if (!response.ok) {
      throw new Error(`Server error: ${response.status}`)
    }
  }

  async getPendingCount(): Promise<number> {
    return (await this.db!.getAll("mutations")).length
  }
}

// Usage with online/offline detection
const queue = new OfflineQueue()
await queue.init()

window.addEventListener("online", () => queue.flush())
```

Background Sync (Service Worker)
```typescript
// Service Worker background sync
import { openDB } from "idb"

self.addEventListener("sync", (event: SyncEvent) => {
  if (event.tag === "sync-mutations") {
    event.waitUntil(syncMutations())
  }
})

async function syncMutations(): Promise<void> {
  const db = await openDB("offline-queue", 1)
  const mutations = await db.getAll("mutations")

  for (const mutation of mutations) {
    try {
      await fetch("/api/mutate", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Idempotency-Key": mutation.id,
        },
        body: JSON.stringify(mutation),
      })
      await db.delete("mutations", mutation.id)
    } catch {
      // Will retry on next sync event
      return
    }
  }
}

// Register from main thread
navigator.serviceWorker.ready.then((registration) => {
  return registration.sync.register("sync-mutations")
})
```

Failure Modes and Edge Cases
Common Failure Scenarios
| Scenario | Symptom | Detection | Recovery |
|---|---|---|---|
| Network partition | Messages queued, no acks | Heartbeat timeout | Reconnect with sequence recovery |
| Server restart | WebSocket close event | close event handler | Exponential backoff reconnect |
| Message loss | Missing sequence numbers | Gap detection | Request replay from server |
| Duplicate delivery | Same message twice | Idempotency key check | Skip processing |
| Clock skew | LWW picks wrong winner | N/A (hard to detect) | Use server timestamps or HLC |
| Thundering herd | Server overload on recovery | Server-side monitoring | Jittered backoff |
| Split brain | Divergent state | Consensus protocol | CRDT convergence or manual resolve |
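As one concrete example from the table, gap detection falls out of sequence tracking; a sketch (`requestReplay`, `process`, and the message shape are assumptions):

```typescript
let lastSeq = 0

function onMessage(msg: { seq: number; payload: unknown }): void {
  if (msg.seq > lastSeq + 1) {
    // Messages between lastSeq and msg.seq are missing: ask the server to replay
    requestReplay(lastSeq + 1, msg.seq - 1)
  }
  lastSeq = Math.max(lastSeq, msg.seq)
  process(msg.payload)
}
```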
Testing Checklist
- Rapid connect/disconnect cycles (10x in 1 second)
- Slow network simulation (3G, 500ms latency)
- Large payloads (10MB+ messages)
- Many concurrent tabs (hit connection limits)
- Device sleep/wake cycles
- Offline for extended periods (1+ hours)
- Server restart during active session
- Network type change (Wi-Fi to cellular)
- Clock adjustment during session
Conclusion
Real-time sync client architecture balances three tensions: latency vs. consistency, offline support vs. conflict complexity, and simplicity vs. reliability. The right approach depends on your data model and user expectations:
- Property-level LWW (Figma) for structured objects where concurrent edits to different properties should both succeed
- OT (Google Docs) for text documents requiring strong consistency and intent preservation
- CRDTs (Notion, Linear) for offline-first scenarios where eventual convergence is acceptable
- Full state sync for simpler applications where complexity isn’t justified
Connection management is non-negotiable: exponential backoff with jitter, heartbeats for health detection, and sequence-based recovery for message continuity. Optimistic updates with rollback provide the instant feedback users expect while maintaining server authority.
The production systems that get this right—Figma, Linear, Discord, Slack—invest heavily in the sync layer because it defines the core user experience. A 100ms delay feels instant; a 500ms delay feels sluggish; anything over 1s feels broken.
Appendix
Prerequisites
- WebSocket API fundamentals
- Async JavaScript (Promises, async/await)
- State management patterns
- Basic distributed systems concepts (consensus, eventual consistency)
Terminology
| Term | Definition |
|---|---|
| CRDT | Conflict-Free Replicated Data Type—data structures that merge without coordination |
| OT | Operational Transformation—algorithm that transforms concurrent operations to preserve intent |
| LWW | Last-Write-Wins—conflict resolution using timestamps |
| Optimistic update | Applying changes locally before server confirmation |
| Reconciliation | Process of merging divergent client and server state |
| Presence | Ephemeral user state (online status, cursor position) |
| Heartbeat | Periodic signal to detect connection health |
| Tombstone | Marker indicating deleted item (in CRDTs) |
| Vector clock | Logical timestamp tracking causal ordering across nodes |
| Idempotency | Property where repeated operations have same effect as single execution |
Summary
- Transport selection: WebSocket for bidirectional low-latency; SSE for server-push simplicity; polling as fallback
- Connection resilience: Exponential backoff with jitter prevents thundering herd; heartbeats detect stale connections
- Message handling: Sequence numbers for ordering; idempotency keys for deduplication
- Optimistic updates: Apply locally, rollback on server rejection—users expect instant feedback
- Conflict resolution: LWW for simple cases; OT for text; CRDTs for offline-first
- State reconciliation: Server is authoritative; re-apply pending local mutations on top of server state
- Presence: Ephemeral, doesn’t need persistence; throttle high-frequency updates (cursors)
References
Specifications:
- RFC 6455: The WebSocket Protocol - Authoritative WebSocket specification
- RFC 7692: Compression Extensions for WebSocket - Per-message DEFLATE compression
- RFC 8441: Bootstrapping WebSockets with HTTP/2 - WebSocket multiplexing over HTTP/2
- WHATWG HTML Living Standard: Server-sent events - SSE specification
Official Documentation:
- MDN: WebSocket API - Browser WebSocket implementation
- MDN: Server-Sent Events - SSE usage guide
- MDN: IndexedDB - Client-side storage for offline support
Production Engineering Blogs:
- Figma: How multiplayer technology works - Property-level LWW, fractional indexing
- Figma: Making multiplayer more reliable - Checkpointing, write-ahead log
- Notion: How we made Notion available offline - CRDT-based offline sync
- Linear: Scaling the sync engine - Local-first architecture
- Discord: How Discord stores trillions of messages - Gateway architecture, storage evolution
- Slack: Real-time messaging - Channel servers, consistency guarantees
Research and Theory:
- CRDT.tech - Comprehensive CRDT resources
- Martin Kleppmann: CRDTs and the Quest for Distributed Consistency - CRDT fundamentals
- Gabriel Gambetta: Client-Side Prediction and Server Reconciliation - Game networking patterns applicable to real-time apps