Design Collaborative Document Editing (Google Docs)

A system design for real-time collaborative document editing covering synchronization algorithms, presence broadcasting, conflict resolution, storage patterns, and offline support. The target is sub-second convergence for concurrent edits while maintaining a full revision history and supporting up to 100 simultaneous editors per document, the regime Google Docs, Sheets, and Slides operate in today, where each file is capped at 100 open tabs or devices.¹

Figure: System overview. Clients connect over a sticky-routed WebSocket gateway to a per-document sync engine, which writes operations into a durable append-only operation log and a hot Redis cache, with periodic snapshots.

Abstract

Collaborative document editing has to solve three interlocking problems at once: real-time synchronization (every active client sees every edit within hundreds of milliseconds), conflict resolution (concurrent edits never corrupt the document), and durability (no committed edit is ever lost).

Core architectural decisions:

Decision	Choice	Rationale
Sync algorithm	OT with server ordering	Single source of truth eliminates the TP2 obligation
Transport	WebSocket	Full-duplex, 2-14-byte frame headers after the handshake²
Persistence	Event-sourced operation log	Enables revision history, undo, and conflict replay
Presence	Ephemeral broadcast	Cursors don’t need durability; memory-only with TTL
Offline	Operation queue with reconciliation	Local-first editing, transform-and-replay on reconnect

Key trade-offs accepted:

A central server orders operations (no true peer-to-peer) in exchange for correctness guarantees.
The operation log grows without bound and has to be compacted with periodic snapshots.
Per-document affinity drives memory pressure on collaboration servers (one process effectively owns each active document).

What this design optimizes for:

Sub-100 ms operation propagation between connected clients on the same edge.
Convergence guaranteed regardless of network jitter or partial partitions.
A full, addressable revision history without inflating every cold read.

Requirements

Functional Requirements

Requirement	Priority	Notes
Real-time text editing	Core	Character-level granularity
Concurrent multi-user editing	Core	Tens of simultaneous editors
Live cursor / selection	Core	See where others are editing
Revision history	Core	View / restore any previous version
Rich text formatting	Core	Bold, italic, headings, lists
Comments and suggestions	Extended	Anchored to text ranges
Offline editing	Extended	Queue operations, sync on reconnect
Tables, images, embeds	Extended	Block-level elements

Non-Functional Requirements

Requirement	Target	Rationale
Availability	99.9% (3 nines)	User-facing, but brief outages acceptable
Edit propagation latency	p99 < 200 ms	Real-time feel requires sub-second
Document load time	p99 < 2 s	Cold start from the latest snapshot plus the operation tail
Concurrent editors	100 per document	Matches the published Google Workspace limit; Docs / Sheets / Slides cap at 100 open tabs or devices per file¹
Operation durability	99.999%	No edit should ever be lost
Revision retention	Indefinite	Full history for compliance

Scale Estimation

Note

The numbers below are first-cut order-of-magnitude estimates for sizing exercises, not measured production figures. Use them to reason about hot-path arithmetic, not as quotable benchmarks.

Users:

Monthly active users (MAU): 500M (Google Docs scale).
Daily active users (DAU): 100M (≈ 20% of MAU).
Peak concurrent users: 10M.

Documents:

Total documents: 5B.
Active documents (edited in last 30 days): 500M (≈ 10%).
Documents open concurrently at peak: 50M.

Traffic:

Operations per active editor: 1-5 per second while typing.
Average editing session: 15 minutes.
Peak concurrent editing sessions: 50M documents × 3 editors avg ≈ 150M.
Operations per second at peak: 150M × 2 ops/sec ≈ 300M ops/sec globally.

Storage:

Average operation size: ~100 bytes (insert / delete + metadata).
Operations per active document per day: ~10,000.
Daily operation volume: 500M docs × 10K ops × 100B ≈ 500 TB/day before compaction.
With daily snapshots: 500M × 50KB ≈ 25 TB/day of snapshot footprint.
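The estimates above are simple products, so they are easy to re-run when an input changes. The sketch below only restates the figures from this section; the constant names are illustrative and not part of the design.

```typescript
// Inputs copied from the estimates above.
const docsOpenAtPeak = 50e6
const avgEditorsPerDoc = 3
const opsPerEditorPerSec = 2
const activeDocs = 500e6
const opsPerDocPerDay = 10_000
const bytesPerOp = 100
const bytesPerSnapshot = 50e3 // 50 KB

const TB = 1e12

const peakSessions = docsOpenAtPeak * avgEditorsPerDoc
const peakOpsPerSec = peakSessions * opsPerEditorPerSec
const dailyLogTB = (activeDocs * opsPerDocPerDay * bytesPerOp) / TB
const dailySnapshotTB = (activeDocs * bytesPerSnapshot) / TB

console.log({ peakSessions, peakOpsPerSec, dailyLogTB, dailySnapshotTB })
```

Changing `avgEditorsPerDoc` or `opsPerDocPerDay` shows quickly which input dominates the write path and which dominates storage.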

Design Paths

Path A: Operational Transformation (Server-Ordered)

Best when:

Always-online with reliable connectivity.
Central infrastructure already exists.
Correctness is paramount (financial, legal documents).
Team has OT implementation experience or uses an existing library.

Architecture:

Figure: Central-server (Jupiter) OT topology. Each client keeps a pending op and an optional buffered op; the server holds the canonical revision counter, transform engine, append-only log, and broadcaster. A single authoritative server transforms each incoming op against everything since its baseRev, appends it to the log, then fans the transformed op out.

The Jupiter³ paper formalised this shape in 1995: every client speaks two-party OT to one authoritative server, which serialises everything and rebroadcasts. Google Wave⁴ and Google Docs both adopted Jupiter, and Wave added the restriction that a client can have at most one unacknowledged operation in flight — a constraint that keeps the per-client transformation history linear.

Figure: Server-ordered OT sequence. Two clients submit concurrent operations; the server linearizes them by transforming each one against everything that landed first, then rebroadcasts.

Key characteristics:

The server assigns canonical operation order.
Clients transform incoming ops against their own pending local ops.
A single source of truth eliminates the TP2 (transformation property 2) requirement — the property that two transformation paths through three concurrent ops must agree, originally formalised by the OT community after Ellis and Gibbs’s 1989 dOPT algorithm⁵ was shown to mishandle three-way concurrency. Avoiding TP2 is what makes Jupiter-style OT tractable.

Trade-offs:

Proven correct in long-running production systems (Google Docs, CKEditor, the Wave / ShareDB lineage).
Only TP1 needs to hold, which makes the transformation functions tractable.
Wire format is small (operations stay close to “delta” sized).
Every operation batch needs a server round-trip.
Offline capability is limited to a local buffer; reconciliation still depends on the server.
The owning server is a single point of failure per document until ownership transfers.

Real-world example: Google Docs uses Jupiter-style server-ordered OT — every character change is appended as an event in a per-document log, and the document renders by replaying that log from a periodic checkpoint, an architecture Google described publicly in the 2010 “What’s different about the new Google Docs” post.⁶

Path B: CRDT-Based (Decentralized)

Best when:

Offline-first is a hard requirement.
Peer-to-peer scenarios where no authoritative server is available.
Multi-device sync over unreliable networks.
You want a mathematical convergence proof rather than relying on transformation correctness.

Architecture:

Figure: CRDT replication topology. Two devices each maintain a local replica and exchange deltas directly peer-to-peer or through a sync relay; every replica is a peer, and the relay is optional, not authoritative.

Key characteristics:

Operations commute by construction; no server arbitration needed.
Each replica carries the full CRDT state (or enough metadata to reconstruct it).
Convergence is guaranteed by the CRDT’s algebraic properties, independent of delivery order.
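To make "convergence by algebraic properties" concrete, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest state-based CRDTs. It is a toy rather than a text CRDT, and the names `GCounter`, `increment`, `merge`, and `value` are illustrative. The point is that `merge` is commutative, associative, and idempotent, so replicas agree no matter how deltas are ordered or duplicated.

```typescript
// One slot per replica; each replica only ever increases its own slot.
type GCounter = Record<string, number>

function increment(counter: GCounter, replica: string, by = 1): GCounter {
  return { ...counter, [replica]: (counter[replica] ?? 0) + by }
}

// Element-wise max: commutative, associative, and idempotent.
function merge(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a }
  for (const [replica, count] of Object.entries(b)) {
    out[replica] = Math.max(out[replica] ?? 0, count)
  }
  return out
}

function value(counter: GCounter): number {
  return Object.values(counter).reduce((sum, n) => sum + n, 0)
}

const onDeviceA = increment({}, "A", 2)
const onDeviceB = increment({}, "B", 3)

console.log(value(merge(onDeviceA, onDeviceB))) // same total in either merge order
console.log(value(merge(onDeviceB, onDeviceA)))
```

Text CRDTs such as Yjs and Automerge rely on the same kind of merge laws, applied to a much richer structure of uniquely identified characters.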

Trade-offs:

Native offline support, including arbitrarily long disconnects.
P2P synchronization is possible.
No authoritative server bottleneck.
Higher memory and storage overhead for tombstones, vector clocks, and identity metadata.
Initial document load can be slower because the replica may have to replay a lot of history.
Intent preservation for rich text is harder; pure CRDTs need careful work to model formatting boundaries (Peritext, Yjs’s rich-text types, etc.).

Real-world example: Yjs and Automerge are widely used pure-CRDT libraries; on top of them, products like JupyterLab Real-Time Collaboration, Tldraw, and many block-based editors get offline-first behavior without standing up a transformation engine. Notion, often cited here, is in fact closer to last-write-wins per block today, with CRDTs an explicitly stated future direction.⁷

Path C: Hybrid (Server-Ordered with CRDT Properties)

Best when:

You need real offline support but you also have server infrastructure you want to keep using.
You want CRDT-style merge guarantees with OT-style steady-state efficiency.
You’re willing to build on newer algorithms such as Eg-walker or Fugue.

Architecture:

Store an append-only operation DAG (CRDT-style provenance).
Use the server for canonical ordering and persistence (OT-style).
Merge divergent branches with a CRDT-style algorithm only when the DAG actually forks.
Free the merge state when there is no active divergence.

Trade-offs:

Best of both: efficient steady state, robust merging when clients reconverge from offline edits.
Substantially better merge complexity than traditional OT in the worst case.
True offline editing with real branch merging.
Newest approach in production; tooling and library maturity lag the OT/CRDT ecosystems.
Implementation complexity is higher than either pure approach.

Real-world example: Figma uses Eg-walker as the merge backbone of the multiplayer service that powers its Code Layers feature, launched in June 2025.⁸ The underlying algorithm — Gentle and Kleppmann, EuroSys 2025 — merges two divergent branches of k and m local events in O((k + m) log (k + m)) time and uses 1–2 orders of magnitude less steady-state memory than a comparable text CRDT, because the CRDT structure is built on demand for each merge and discarded afterwards.⁹

Path Comparison

Factor	OT (server-ordered)	CRDT	Hybrid (Eg-walker / Fugue)
Correctness proof	Transformation-based	Algebraic	Algebraic
Offline support	Buffer only	Native	Native
Server dependency	Required	Optional	Optional
Memory overhead	Low	High	Medium
Implementation	Moderate	Complex	Complex
Production examples	Google Docs, CKEditor	Yjs / Automerge, Tldraw, JupyterLab RTC	Figma Code Layers

The OT vs CRDT debate

The choice is not as settled as either camp likes to claim, and senior engineers should know why both sides are right about different things.¹⁰

Claim	Status (as of 2026-04)
“CRDTs need no central server.”	Verified for sync. But every production deployment still uses a server for auth, persistence, presence routing, and access control — see Yjs’s `y-websocket` provider and Automerge’s relay services.
“OT is simpler.”	Verified for the centralised, single-server case (Jupiter eliminates TP2). Rejected for peer-to-peer or multi-server: TP2 has decades of known bugs and only a handful of correct algorithms.
“CRDTs are slow and bloated.”	Verified circa 2018, rejected today: Yjs and Automerge 2/3 with columnar binary encoding store a 100 KB editing trace in low-MB on disk and apply ops in the millions/sec range.¹¹¹²
“OT preserves user intent for rich text better than CRDTs.”	Historically true; partially rejected since Peritext (Litt, Lim, Kleppmann, van Hardenberg, 2022) and the Fugue family closed most of the gap for inline formatting.¹³
“CRDTs are inherently superior because they have a convergence proof.”	Inferred only: the 2020 Sun et al. survey shows OT and CRDTs are two presentations of the same underlying transformation framework; the difference is in where the transformation happens, not whether it is provable.

The pragmatic split most teams land on:

Centralised, always-online, rich-text-heavy (Google Docs, Notion-style apps): server-ordered OT or OT-shaped step rebasing (ProseMirror’s prosemirror-collab is the reference). Mature, fast, integrates cleanly with auth and persistence.
Local-first, offline-first, multi-device, federated (Linear, Tldraw, JupyterLab RTC, anything that needs to merge weeks of offline work): pure CRDT with awareness for presence. Yjs and Automerge are the production-grade choices.
Centralised today, offline-first tomorrow: hybrid algorithms (Eg-walker, Fugue) keep the server-ordered steady state but unlock real branch merging when needed.

Important

The interesting axis is no longer “which algorithm” but “what’s your topology, and what failure modes are you willing to wear?” If you have a server, OT is fewer moving parts. If you genuinely don’t, you need a CRDT — and you should pick one whose rich-text story (Peritext, YATA-with-format-attributes, etc.) matches your editor.

This Article’s Focus

This article focuses on Path A (server-ordered OT) because:

It is the most battle-tested approach — Google Docs has run on this shape since the 2010 rewrite.⁶
Most real workloads have a reliable server-side path; the offline fraction is small.
The correctness story is easier to reason about and to test (no TP2 obligation).
There is mature library support — ShareDB, ot.js, and the Quill Delta toolchain — to build on instead of writing transformation functions from scratch.

A deep dive into CRDTs, including TP2 and intent preservation, lives in the companion article CRDTs for Collaborative Systems.

High-Level Design

Component Overview

WebSocket Gateway

Manages persistent connections between clients and the collaboration tier.

Responsibilities:

Connection lifecycle (connect, heartbeat, graceful disconnect).
Routing messages to the document processor that owns a given document.
Broadcasting presence updates.
Handling reconnection and state recovery without forcing a full document reload.

Design decisions:

Decision	Choice	Rationale
Protocol	WebSocket (RFC 6455)	Full-duplex, 2-14-byte frame header vs. per-request HTTP overhead²
Session affinity	Sticky by document	All editors of a document hit the same server, so transformation state is local
Heartbeat	30-second interval	Detect dead connections fast enough to release locks
Reconnection	Exponential backoff	Avoid thundering-herd reconnects on a partial outage
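As an illustration of the reconnection row, here is a minimal sketch of exponential backoff with jitter. The base delay, the cap, and the "full jitter" strategy (pick a uniform delay below the exponential ceiling) are assumptions made for the example; the design above only specifies that backoff is exponential.

```typescript
// Delay before reconnect attempt `attempt` (0-based).
// `random` is injectable so the function can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  // The ceiling doubles per attempt until it reaches the cap.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt)
  // Full jitter: spread clients uniformly over [0, ceiling) so they do not
  // all reconnect at the same instant after a partial outage.
  return Math.floor(random() * ceiling)
}

for (let attempt = 0; attempt < 5; attempt++) {
  console.log(`attempt ${attempt}: wait ${reconnectDelayMs(attempt)} ms`)
}
```

The jitter matters as much as the exponent: without it, every client that lost its connection at the same moment retries on the same schedule.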

Scaling approach:

Horizontal scaling with consistent hashing by document ID.
One server “owns” each active document at a time.
Ownership transfers on server failure via a distributed lock (etcd, ZooKeeper, or a Redis-based primitive).
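The routing rule above can be sketched as a small hash ring. This is an illustrative sketch, not the gateway's actual implementation: the `HashRing` class, the FNV-1a-style hash over UTF-16 code units, and the choice of 64 virtual nodes per server are all assumptions.

```typescript
// 32-bit FNV-1a-style hash, used here only to spread keys around the ring.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193)
  }
  return hash >>> 0
}

class HashRing {
  private points: { point: number; server: string }[] = []

  constructor(servers: string[], private readonly vnodes = 64) {
    for (const server of servers) this.add(server)
  }

  // Each server contributes `vnodes` points so load spreads evenly.
  add(server: string): void {
    for (let v = 0; v < this.vnodes; v++) {
      this.points.push({ point: fnv1a(`${server}#${v}`), server })
    }
    this.points.sort((a, b) => a.point - b.point)
  }

  remove(server: string): void {
    this.points = this.points.filter((p) => p.server !== server)
  }

  // Owner is the first ring point at or after the document's hash, wrapping around.
  ownerOf(documentId: string): string {
    if (this.points.length === 0) throw new Error("no servers in ring")
    const h = fnv1a(documentId)
    const hit = this.points.find((p) => p.point >= h) ?? this.points[0]
    return hit.server
  }
}

const ring = new HashRing(["collab-1", "collab-2", "collab-3"])
console.log(ring.ownerOf("doc_abc123"))
```

When a server leaves the ring, only the documents it owned are reassigned; every other document keeps its owner, which is what keeps transformation state local during scaling events. The distributed lock is still needed to make the handover safe.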

Document Processor (OT Engine)

The core synchronization component that transforms and orders operations.

State per active document:

```typescript
interface DocumentState {
  documentId: string
  revision: number                       // Monotonic operation counter
  content: DocumentContent               // Current document state
  pendingOps: Map<ClientId, Operation[]> // Ops awaiting transform
  clients: Map<ClientId, ClientState>    // Connected clients
}

interface ClientState {
  clientId: string
  lastAckedRevision: number
  cursor: CursorPosition | null
  color: string                          // For presence display
}
```

Operation flow:

Receive — client sends an operation tagged with its base revision.
Validate — check that the base revision is recent enough that we can still replay missing ops.
Transform — transform the incoming op against every operation since the base revision.
Apply — update the in-memory document state.
Persist — append to the durable operation log.
Broadcast — fan the transformed op out to every other connected client.

Memory management:

Keep document state in memory while the document is active.
Evict after 5 minutes of inactivity.
Rehydrate from the latest snapshot plus the tail of the operation log.

Presence Service

Handles ephemeral state: cursors, selections, and “user is here” indicators.

Figure: Presence pipeline. Caret events are throttled to 20 Hz, coalesced over a 50 ms window, sent as their own message type separate from operations, kept in a per-document in-memory TTL map, and fanned out to receivers with no log and no transformation.

Design decisions:

No persistence. Presence rebuilds on reconnect; nothing depends on it being durable. Yjs’s awareness protocol takes the same stance — last-write-wins per clientID, marked offline if no update arrives within ~30 seconds.¹⁴
Throttled broadcast. Cap at ~20 updates / second per client to keep the fan-out cost predictable.
Coalesced updates. Batch cursor movements before broadcast (50 ms collection window is a good default).
Separate channel from operations. Presence and ops share the same WebSocket but ride different message types so a backed-up op queue never delays cursor updates (this is also how ShareDB’s DocPresence keeps cursors aligned to the document version they were captured against).
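The throttle and coalesce decisions above can be sketched as one small class. This is a hypothetical helper, not a library API: `PresenceCoalescer`, `CursorUpdate`, and the explicit `nowMs` argument (used instead of real timers so the behaviour is deterministic) are all assumptions made for the example.

```typescript
interface CursorUpdate {
  clientId: string
  anchor: number
  head: number
}

class PresenceCoalescer {
  // Only the newest cursor per client survives a collection window.
  private latest = new Map<string, CursorUpdate>()
  private lastFlushAt = -Infinity

  constructor(
    private send: (batch: CursorUpdate[]) => void,
    private windowMs = 50, // 50 ms window, which caps output at about 20 batches/second
  ) {}

  push(update: CursorUpdate, nowMs: number): void {
    this.latest.set(update.clientId, update)
    if (nowMs - this.lastFlushAt >= this.windowMs) this.flush(nowMs)
  }

  // Also call this from a timer so the last update in a burst is not stranded.
  flush(nowMs: number): void {
    if (this.latest.size === 0) return
    this.send([...this.latest.values()])
    this.latest.clear()
    this.lastFlushAt = nowMs
  }
}

const sent: CursorUpdate[][] = []
const coalescer = new PresenceCoalescer((batch) => sent.push(batch))
coalescer.push({ clientId: "a", anchor: 1, head: 1 }, 0)  // sent immediately
coalescer.push({ clientId: "a", anchor: 2, head: 2 }, 10) // held
coalescer.push({ clientId: "a", anchor: 3, head: 3 }, 20) // replaces the held update
coalescer.flush(50)                                       // timer tick drains the window
console.log(sent.length)
```

Because intermediate cursor positions are simply overwritten, fan-out cost depends on the number of active clients, not on how fast any one of them moves the caret.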

Data structure:

```typescript
interface PresenceUpdate {
  clientId: string
  documentId: string
  cursor: {
    anchor: number // Selection start
    head: number   // Cursor position
  } | null
  user: {
    id: string
    name: string
    avatar: string
    color: string  // Assigned per-document for stable identity
  }
  timestamp: number
}
```

Document API

Handles document CRUD, access control, and version retrieval.

Endpoints:

Endpoint	Method	Purpose
`/documents`	POST	Create document
`/documents/{id}`	GET	Load document (latest or specific revision)
`/documents/{id}/operations`	GET	Fetch operation range for history / replay
`/documents/{id}/snapshot`	POST	Create manual snapshot
`/documents/{id}/revisions`	GET	List revision metadata
`/documents/{id}/permissions`	PUT	Update access control

API Design

WebSocket Protocol

Client → Server Messages

Send operation:

```json
{
  "type": "operation",
  "documentId": "doc_abc123",
  "clientId": "client_xyz",
  "baseRevision": 142,
  "operation": {
    "ops": [{ "retain": 50 }, { "insert": "Hello, " }, { "retain": 100 }, { "delete": 5 }]
  },
  "timestamp": 1706886400000
}
```

Update presence:

```json
{
  "type": "presence",
  "documentId": "doc_abc123",
  "cursor": { "anchor": 150, "head": 150 },
  "selection": null
}
```

Server → Client Messages

Operation acknowledgement:

```json
{
  "type": "ack",
  "documentId": "doc_abc123",
  "revision": 143,
  "transformedOp": { "ops": [{ "retain": 50 }, { "insert": "Hello, " }] }
}
```

Broadcast operation (to other clients):

```json
{
  "type": "remote_operation",
  "documentId": "doc_abc123",
  "clientId": "client_other",
  "revision": 143,
  "operation": { "ops": [{ "retain": 50 }, { "insert": "Hello, " }] },
  "user": {
    "id": "user_123",
    "name": "Alice"
  }
}
```

Presence broadcast:

```json
{
  "type": "remote_presence",
  "documentId": "doc_abc123",
  "presences": [
    {
      "clientId": "client_other",
      "cursor": { "anchor": 200, "head": 210 },
      "user": { "id": "user_123", "name": "Alice", "color": "#4285f4" }
    }
  ]
}
```

REST API

Create Document

POST /api/v1/documents

Request:

```json
{
  "title": "Untitled Document",
  "content": "",
  "folderId": "folder_abc",
  "templateId": "template_xyz"
}
```

Response (201 Created):

```json
{
  "id": "doc_abc123",
  "title": "Untitled Document",
  "revision": 0,
  "createdAt": "2024-02-03T10:00:00Z",
  "createdBy": {
    "id": "user_123",
    "name": "Alice"
  },
  "permissions": {
    "owner": "user_123",
    "editors": [],
    "viewers": []
  },
  "wsEndpoint": "wss://collab.example.com/ws/doc_abc123"
}
```

Load Document

GET /api/v1/documents/{id}?revision={optional}

Response (200 OK):

```json
{
  "id": "doc_abc123",
  "title": "Project Proposal",
  "revision": 1542,
  "content": {
    "type": "doc",
    "content": [
      {
        "type": "heading",
        "attrs": { "level": 1 },
        "content": [{ "type": "text", "text": "Introduction" }]
      },
      {
        "type": "paragraph",
        "content": [{ "type": "text", "text": "..." }]
      }
    ]
  },
  "snapshot": {
    "revision": 1500,
    "createdAt": "2024-02-03T09:00:00Z"
  },
  "pendingOperations": 42,
  "collaborators": [{ "id": "user_456", "name": "Bob", "online": true }]
}
```

List Revisions

GET /api/v1/documents/{id}/revisions?limit=50&before={revision}

Response (200 OK):

```json
{
  "revisions": [
    {
      "revision": 1542,
      "timestamp": "2024-02-03T10:30:00Z",
      "user": { "id": "user_123", "name": "Alice" },
      "summary": "Edited section 3",
      "operationCount": 15
    },
    {
      "revision": 1500,
      "timestamp": "2024-02-03T09:00:00Z",
      "user": { "id": "user_456", "name": "Bob" },
      "summary": "Added introduction",
      "operationCount": 203,
      "isSnapshot": true
    }
  ],
  "hasMore": true,
  "nextCursor": "rev_1499"
}
```

Error Responses

Code	Error	When
400	`INVALID_OPERATION`	Operation format invalid
409	`REVISION_CONFLICT`	Base revision too old
410	`DOCUMENT_DELETED`	Document was deleted
423	`DOCUMENT_LOCKED`	Document temporarily locked
429	`RATE_LIMITED`	Too many operations

Revision conflict handling:

```json
{
  "error": "REVISION_CONFLICT",
  "message": "Base revision 100 is too old. Current: 150",
  "currentRevision": 150,
  "missingOperations": "/api/v1/documents/doc_abc/operations?from=100&to=150"
}
```

The client fetches the missing operations, transforms its local pending operations against them, and retries.
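The recovery step can be sketched as a fold over the missing operations. This is a minimal sketch under stated assumptions: `rebasePending` and the `Transform` signature are illustrative, and the insert-only `InsertAt` op exists purely so the example runs on its own. A real client would pass its actual OT transform instead.

```typescript
// transform(local, remote) returns [local', remote'], with `remote` treated as
// already committed, so it wins ties at the same position.
type Transform<Op> = (local: Op, remote: Op) => [Op, Op]

// Carry the client's pending op across every op it missed, in revision order.
function rebasePending<Op>(pending: Op, missing: Op[], transform: Transform<Op>): Op {
  let local = pending
  for (const remote of missing) {
    local = transform(local, remote)[0]
  }
  return local
}

// Toy op type for the demo: insert-only edits on plain text.
interface InsertAt {
  pos: number
  text: string
}

const transformInsert: Transform<InsertAt> = (local, remote) => [
  remote.pos <= local.pos ? { ...local, pos: local.pos + remote.text.length } : local,
  local.pos < remote.pos ? { ...remote, pos: remote.pos + local.text.length } : remote,
]

const rebased = rebasePending(
  { pos: 5, text: "x" },
  [{ pos: 0, text: "ab" }, { pos: 10, text: "c" }],
  transformInsert,
)
console.log(rebased)
```

After rebasing, the client resubmits the operation with `baseRevision` set to the `currentRevision` from the error payload.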

Data Modeling

Document Metadata (PostgreSQL)

```sql
CREATE TABLE documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    title TEXT NOT NULL,
    owner_id UUID NOT NULL REFERENCES users(id),
    folder_id UUID REFERENCES folders(id),
    current_revision BIGINT DEFAULT 0,
    latest_snapshot_revision BIGINT,
    content_type VARCHAR(50) DEFAULT 'rich_text',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW(),
    deleted_at TIMESTAMPTZ,

    -- Denormalized for read performance
    collaborator_count INT DEFAULT 0,
    word_count INT DEFAULT 0,
    last_edited_by UUID REFERENCES users(id),
    last_edited_at TIMESTAMPTZ
);

CREATE TABLE document_permissions (
    document_id UUID REFERENCES documents(id) ON DELETE CASCADE,
    user_id UUID REFERENCES users(id),
    role VARCHAR(20) NOT NULL, -- 'owner', 'editor', 'commenter', 'viewer'
    granted_at TIMESTAMPTZ DEFAULT NOW(),
    granted_by UUID REFERENCES users(id),
    PRIMARY KEY (document_id, user_id)
);

CREATE INDEX idx_documents_owner ON documents(owner_id, updated_at DESC);
CREATE INDEX idx_documents_folder ON documents(folder_id, updated_at DESC);
CREATE INDEX idx_permissions_user ON document_permissions(user_id);
```

Operation Log (DynamoDB)

Table design for an append-heavy workload:

Partition Key	Sort Key	Attributes
`document_id`	`revision`	`operation`, `client_id`, `user_id`, `timestamp`, `checksum`

Per-row schema:

```json
{
  "document_id": "doc_abc123",
  "revision": 1542,
  "operation": {
    "ops": [{ "retain": 50 }, { "insert": "Hello" }]
  },
  "client_id": "client_xyz",
  "user_id": "user_123",
  "timestamp": 1706886400000,
  "checksum": "sha256:abc123...",
  "ttl": null
}
```

Why DynamoDB:

Append-only workload (write-optimized).
Predictable single-digit-ms latency at scale.
Built-in TTL for old operations after they roll into a snapshot.
Range queries by sort key (revision) are efficient and cheap.

Capacity planning:

Write capacity: 300M ops/sec globally → naturally partitioned across documents.
Single document: capped at ~200 ops/sec (100 active editors × 2 ops/sec — matching the published 100-tab Google Docs ceiling).
Read capacity: bursts on document load, otherwise minimal.

Snapshots (S3)

Object key:

```text
s3://doc-snapshots/{document_id}/{revision}.json.gz
```

Snapshot payload:

```json
{
  "documentId": "doc_abc123",
  "revision": 1500,
  "createdAt": "2024-02-03T09:00:00Z",
  "content": {
    "type": "doc",
    "content": []
  },
  "metadata": {
    "wordCount": 5420,
    "characterCount": 32150,
    "imageCount": 12
  },
  "checksum": "sha256:..."
}
```

Snapshot policy:

Create a snapshot every 1000 operations.
Or every 1 hour of active editing.
Or on manual request when the user opens revision history.
Keep all snapshots indefinitely for compliance.
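The three triggers above combine into a single predicate. A minimal sketch, assuming the worker already knows how many operations and how much time have passed since the last snapshot; `SnapshotInputs` and `shouldSnapshot` are illustrative names, and treating "time since last snapshot with at least one new op" as a proxy for "1 hour of active editing" is a simplification.

```typescript
interface SnapshotInputs {
  opsSinceSnapshot: number
  msSinceSnapshot: number
  manualRequest: boolean
}

function shouldSnapshot(
  inputs: SnapshotInputs,
  opsThreshold = 1000,     // every 1000 operations
  intervalMs = 3_600_000,  // or every hour
): boolean {
  // A snapshot with no new operations would duplicate the previous one.
  if (inputs.opsSinceSnapshot === 0) return false
  return (
    inputs.manualRequest ||
    inputs.opsSinceSnapshot >= opsThreshold ||
    inputs.msSinceSnapshot >= intervalMs
  )
}

console.log(shouldSnapshot({ opsSinceSnapshot: 1200, msSinceSnapshot: 60_000, manualRequest: false }))
```

Keeping the decision in a pure function makes the thresholds easy to tune and to test without touching storage.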

Active Document Cache (Redis)

```text
# Document state (hash)
HSET doc:{id}:state revision 1542 content "{serialized_content}" last_updated 1706886400000

# Connected clients (sorted set by last activity)
ZADD doc:{id}:clients {timestamp} {client_id}

# Pending operations queue (list)
RPUSH doc:{id}:pending "{operation_json}"

# Presence (hash with TTL per client)
HSET doc:{id}:presence:{client_id} cursor_anchor 150 cursor_head 150 user_name "Alice" user_color "#4285f4"
EXPIRE doc:{id}:presence:{client_id} 60
```

Eviction policy:

Documents evicted after 5 minutes of no activity.
Presence entries auto-expire after 60 seconds without a refresh.
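Redis handles expiry itself through `EXPIRE`. When presence is instead held in process memory on the owning server, the same 60-second rule needs a small TTL map. The sketch below is one way to do that; `TtlMap` is a hypothetical helper, and time is passed in explicitly so the behaviour is easy to verify.

```typescript
class TtlMap<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()

  constructor(private readonly ttlMs: number) {}

  // Writing a key again refreshes its expiry, like a presence heartbeat.
  set(key: string, value: V, nowMs: number): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs })
  }

  get(key: string, nowMs: number): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key) // lazy expiry on read
      return undefined
    }
    return entry.value
  }

  // Periodic cleanup for clients that vanished without a disconnect.
  sweep(nowMs: number): number {
    let removed = 0
    for (const [key, entry] of this.entries) {
      if (entry.expiresAt <= nowMs) {
        this.entries.delete(key)
        removed++
      }
    }
    return removed
  }
}

const presence = new TtlMap<{ anchor: number; head: number }>(60_000)
presence.set("client_xyz", { anchor: 150, head: 150 }, 0)
console.log(presence.get("client_xyz", 30_000)) // still present
console.log(presence.get("client_xyz", 60_000)) // expired
```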

Low-Level Design

OT Transformation Engine

Operation Format

The format below is similar to Quill Delta and Google Wave’s wire shape — a flat list of retain, insert, and delete operations:¹⁵

```typescript
type Operation = {
  ops: (RetainOp | InsertOp | DeleteOp)[]
}

type RetainOp = {
  retain: number
  attributes?: Record<string, unknown> // For formatting changes
}

type InsertOp = {
  insert: string | { image: string } | { embed: unknown }
  attributes?: Record<string, unknown>
}

type DeleteOp = {
  delete: number
}
```

Examples:

```typescript
// Insert "Hello" at position 0
{
  ops: [{ insert: "Hello" }]
}

// Delete 3 characters at position 10
{
  ops: [{ retain: 10 }, { delete: 3 }]
}

// Bold characters 5..10
{
  ops: [{ retain: 5 }, { retain: 5, attributes: { bold: true } }]
}
```
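To show how such an operation is interpreted, here is a minimal sketch that applies the plain-text subset of the format to a string. It ignores attributes and embeds, and `TextOp` and `applyToText` are illustrative names rather than part of Quill Delta or any other library.

```typescript
type TextOp = { retain: number } | { insert: string } | { delete: number }

function applyToText(text: string, ops: TextOp[]): string {
  let cursor = 0 // position in the input text
  let out = ""
  for (const op of ops) {
    if ("insert" in op) {
      out += op.insert // inserts consume no input
    } else if ("retain" in op) {
      out += text.slice(cursor, cursor + op.retain)
      cursor += op.retain
    } else {
      cursor += op.delete // skip the deleted characters
    }
    if (cursor > text.length) throw new Error("operation runs past the end of the document")
  }
  // Anything the operation did not mention is implicitly retained.
  return out + text.slice(cursor)
}

console.log(applyToText("", [{ insert: "Hello" }]))
console.log(applyToText("0123456789abcdef", [{ retain: 10 }, { delete: 3 }]))
```

The implicit trailing retain is why the examples above never need to spell out the rest of the document.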

Transformation Functions

The transformation function is what makes OT non-trivial. The diagram below works through a concrete two-client convergence on a three-character document; the code beneath generalises it.

Figure: OT transform on a concrete three-character document. Client A inserts at 0 and client B deletes at 1; the server orders A first and transforms B to delete at 2, so both clients converge to the same string, XAC.

```typescript
type Op = RetainOp | InsertOp | DeleteOp

// Length an op spans: characters inserted, retained, or deleted (an embed counts as 1).
function opLength(op: Op): number {
  if ("insert" in op) return typeof op.insert === "string" ? op.insert.length : 1
  return "retain" in op ? op.retain : op.delete
}

// Build a retain of `len`, carrying over the source op's formatting attributes if any.
function retainOf(len: number, src?: Op): RetainOp {
  return src && "retain" in src && src.attributes ? { retain: len, attributes: src.attributes } : { retain: len }
}

// Returns [op1', op2'] such that
//   apply(apply(doc, op1), op2') === apply(apply(doc, op2), op1')
// `priority` breaks ties when both ops insert at the same position:
// "left" places op1's insert first, "right" places op2's insert first.
function transform(op1: Operation, op2: Operation, priority: "left" | "right"): [Operation, Operation] {
  // Copy the ops so partially consumed lengths can be decremented in place.
  const ops1: Op[] = op1.ops.map((o) => ({ ...o }))
  const ops2: Op[] = op2.ops.map((o) => ({ ...o }))
  const result1: Op[] = []
  const result2: Op[] = []
  let i1 = 0
  let i2 = 0

  // Consume `len` units from ops[i] and return the index of the next unconsumed op.
  // A missing op is an implicit trailing retain and is never exhausted.
  const consume = (ops: Op[], i: number, len: number): number => {
    const op = ops[i]
    if (!op) return i
    if ("retain" in op) return (op.retain -= len) === 0 ? i + 1 : i
    if ("delete" in op) return (op.delete -= len) === 0 ? i + 1 : i
    return i + 1
  }

  while (i1 < ops1.length || i2 < ops2.length) {
    const o1 = ops1[i1]
    const o2 = ops2[i2]
    const o2Inserts = o2 !== undefined && "insert" in o2

    // Case: insert vs anything. Inserts go first; priority decides insert vs insert.
    if (o1 && "insert" in o1 && (priority === "left" || !o2Inserts)) {
      result1.push(o1)
      result2.push({ retain: opLength(o1) })
      i1++
      continue
    }
    if (o2 && "insert" in o2) {
      result1.push({ retain: opLength(o2) })
      result2.push(o2)
      i2++
      continue
    }

    // Only retain and delete remain; a missing op acts as an unbounded retain.
    const len = Math.min(o1 ? opLength(o1) : Infinity, o2 ? opLength(o2) : Infinity)
    const deletes1 = o1 !== undefined && "delete" in o1
    const deletes2 = o2 !== undefined && "delete" in o2

    if (!deletes1 && !deletes2) {
      // Case: retain vs retain. Conflicting attributes are not reconciled here.
      result1.push(retainOf(len, o1))
      result2.push(retainOf(len, o2))
    } else if (deletes1 && !deletes2) {
      // Case: delete vs retain. op1 still deletes; op2 has nothing left to retain.
      result1.push({ delete: len })
    } else if (!deletes1 && deletes2) {
      // Case: retain vs delete. Mirror image of the case above.
      result2.push({ delete: len })
    }
    // Case: delete vs delete. Both removed the same content, so neither emits output.

    i1 = consume(ops1, i1, len)
    i2 = consume(ops2, i2, len)
  }

  return [{ ops: result1 }, { ops: result2 }]
}
```
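The two-client example from the figure can be checked end to end with a deliberately tiny model: single-character inserts and deletes addressed by index. This is a sketch for intuition, not the production transform; `CharOp`, `applyChar`, and `transformAgainst` are illustrative names, and the starting document `"ABC"` is inferred from the converged result XAC.

```typescript
type CharOp =
  | { kind: "insert"; pos: number; ch: string }
  | { kind: "delete"; pos: number }

function applyChar(doc: string, op: CharOp): string {
  return op.kind === "insert"
    ? doc.slice(0, op.pos) + op.ch + doc.slice(op.pos)
    : doc.slice(0, op.pos) + doc.slice(op.pos + 1)
}

// Rewrite `op` so it can run after `applied` has already been applied.
// Returns null when `op` deletes a character that `applied` already removed.
function transformAgainst(op: CharOp, applied: CharOp, opWinsTies: boolean): CharOp | null {
  if (applied.kind === "insert") {
    const shifts =
      op.kind === "insert"
        ? op.pos > applied.pos || (op.pos === applied.pos && !opWinsTies)
        : op.pos >= applied.pos
    return shifts ? { ...op, pos: op.pos + 1 } : op
  }
  // `applied` is a delete.
  if (op.kind === "delete" && op.pos === applied.pos) return null
  return op.pos > applied.pos ? { ...op, pos: op.pos - 1 } : op
}

const doc = "ABC"
const a: CharOp = { kind: "insert", pos: 0, ch: "X" } // client A
const b: CharOp = { kind: "delete", pos: 1 }          // client B

const bAfterA = transformAgainst(b, a, false) // server path: A landed first
const aAfterB = transformAgainst(a, b, true)  // client B's path: B applied locally first

const viaServer = bAfterA ? applyChar(applyChar(doc, a), bAfterA) : applyChar(doc, a)
const viaClientB = aAfterB ? applyChar(applyChar(doc, b), aAfterB) : applyChar(doc, b)
console.log(bAfterA, viaServer, viaClientB)
```

Both paths ending in the same string is exactly the TP1 guarantee: applying A then the transformed B gives the same document as applying B then the transformed A.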

Server-Side Processing

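```typescript
class DocumentProcessor {
  // Tail of a promise chain. Chaining every call onto it serializes operations,
  // so two requests can never interleave at an `await` and claim the same revision.
  private queue: Promise<unknown> = Promise.resolve()

  constructor(
    private state: DocumentState,
    private opLog: OperationLog,
    private broadcaster: Broadcaster,
  ) {}

  processOperation(clientId: string, baseRevision: number, operation: Operation): Promise<ProcessResult> {
    const result = this.queue.then(() => this.processSerialized(clientId, baseRevision, operation))
    this.queue = result.catch(() => undefined) // A failed op must not block later ones
    return result
  }

  private async processSerialized(
    clientId: string,
    baseRevision: number,
    operation: Operation,
  ): Promise<ProcessResult> {
    // Validate: the base must be recent enough that the missing ops can still be replayed.
    if (baseRevision < this.state.revision - MAX_REVISION_LAG) {
      throw new RevisionConflictError(this.state.revision)
    }

    // Transform against every op committed since the client's base revision.
    // Committed ops win insert ties, hence priority "right".
    let transformedOp = operation
    for (let rev = baseRevision + 1; rev <= this.state.revision; rev++) {
      const serverOp = await this.opLog.getOperation(this.state.documentId, rev)
      ;[transformedOp] = transform(transformedOp, serverOp, "right")
    }

    const newContent = applyOperation(this.state.content, transformedOp)
    const newRevision = this.state.revision + 1

    // Persist before mutating in-memory state, so a failed append leaves state untouched.
    await this.opLog.append({
      documentId: this.state.documentId,
      revision: newRevision,
      operation: transformedOp,
      clientId,
      timestamp: Date.now(),
    })

    this.state.content = newContent
    this.state.revision = newRevision

    this.broadcaster.broadcastOperation(
      this.state.documentId,
      clientId, // Exclude sender
      newRevision,
      transformedOp,
    )

    return {
      revision: newRevision,
      transformedOp,
    }
  }
}
```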

Client-Side State Machine

The client owns a small, three-state machine that lets the user keep typing without waiting for the server. The diagram below shows the transitions; the code right after it implements them.

Figure: Client OT state machine with three states (Synchronized, AwaitingAck, AwaitingWithBuffer) and transitions on local edit, server ack, and remote op. A single in-flight operation plus an optional buffer keeps the local view always editable.

```typescript
// `Operation`, `DocumentContent`, `transform`, `compose`, `applyOperation`, and
// `emptyDocument` are assumed to come from the OT library described earlier.
// `sendToServer` is assumed to serialize the op and write it to the WebSocket.
type ClientOTState =
  | { type: "synchronized"; serverRevision: number }
  | { type: "awaitingAck"; serverRevision: number; pending: Operation }
  | { type: "awaitingWithBuffer"; serverRevision: number; pending: Operation; buffer: Operation }

class ClientOT {
  private state: ClientOTState = { type: "synchronized", serverRevision: 0 }
  private document: DocumentContent = emptyDocument()

  get serverRevision(): number {
    return this.state.serverRevision
  }

  onLocalEdit(operation: Operation): void {
    switch (this.state.type) {
      case "synchronized":
        // Nothing in flight: send immediately and wait for the ack.
        this.sendToServer(operation, this.state.serverRevision)
        this.state = {
          type: "awaitingAck",
          serverRevision: this.state.serverRevision,
          pending: operation,
        }
        break

      case "awaitingAck":
        // One op is already in flight: hold this one back.
        this.state = {
          type: "awaitingWithBuffer",
          serverRevision: this.state.serverRevision,
          pending: this.state.pending,
          buffer: operation,
        }
        break

      case "awaitingWithBuffer":
        // Fold further edits into the single buffered op.
        this.state = {
          ...this.state,
          buffer: compose(this.state.buffer, operation),
        }
        break
    }

    // Local edits always apply optimistically, whatever the sync state.
    this.document = applyOperation(this.document, operation)
  }

  onServerAck(revision: number): void {
    switch (this.state.type) {
      case "awaitingAck":
        this.state = { type: "synchronized", serverRevision: revision }
        break

      case "awaitingWithBuffer":
        // The buffer becomes the new in-flight op.
        this.sendToServer(this.state.buffer, revision)
        this.state = {
          type: "awaitingAck",
          serverRevision: revision,
          pending: this.state.buffer,
        }
        break
    }
  }

  onRemoteOperation(revision: number, operation: Operation): void {
    let transformedRemote = operation

    if (this.state.type === "awaitingAck" || this.state.type === "awaitingWithBuffer") {
      // One transform call yields both sides: pending rebased over the remote op,
      // and the remote op rebased over pending.
      const [newPending, remoteOverPending] = transform(this.state.pending, transformedRemote, "left")
      this.state = { ...this.state, pending: newPending }
      transformedRemote = remoteOverPending
    }

    if (this.state.type === "awaitingWithBuffer") {
      // The buffer sits after pending, so it is transformed against the remote op as
      // already rebased over pending, not against the raw server op.
      const [newBuffer, remoteOverBuffer] = transform(this.state.buffer, transformedRemote, "left")
      this.state = { ...this.state, buffer: newBuffer }
      transformedRemote = remoteOverBuffer
    }

    // Every remote op advances the revision the next send is based on.
    this.state = { ...this.state, serverRevision: revision }
    this.document = applyOperation(this.document, transformedRemote)
  }
}
```
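The `transform` call carries most of the weight in the state machine above, so a minimal sketch of what it has to guarantee may help. This is not the article's `Operation` model: it assumes a deliberately simplified, hypothetical `InsertOp` (one text insert at one position) and a hypothetical `transformInserts` helper, and it uses the priority argument only to break ties when both inserts target the same position.

```typescript
// Simplified model (an assumption for this sketch): an operation is one text insert.
interface InsertOp {
  pos: number
  text: string
}

type Priority = "left" | "right"

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
}

// Returns [a', b']: a' is `a` rebased over `b`, and b' is `b` rebased over `a`.
// On a position tie, `priority` decides whether `a` lands to the left of `b`.
function transformInserts(a: InsertOp, b: InsertOp, priority: Priority): [InsertOp, InsertOp] {
  const aFirst = a.pos < b.pos || (a.pos === b.pos && priority === "left")
  return aFirst
    ? [a, { ...b, pos: b.pos + a.text.length }]
    : [{ ...a, pos: a.pos + b.text.length }, b]
}

// Two concurrent inserts at the same position of "abc".
const opA: InsertOp = { pos: 1, text: "X" }
const opB: InsertOp = { pos: 1, text: "YY" }
const [opA2, opB2] = transformInserts(opA, opB, "left")

// Both application orders converge (the TP1 property).
const viaA = applyInsert(applyInsert("abc", opA), opB2) // "aXYYbc"
const viaB = applyInsert(applyInsert("abc", opB), opA2) // "aXYYbc"
```

A production transform also has to cover deletes and formatting, but the contract is the same: applying `a` then `b'` must give the same document as applying `b` then `a'`.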

Snapshot and Compaction

Operation logs grow without bound. To keep the cost of a fresh document load bounded, the system writes a snapshot periodically and lets the operations that snapshot already covers age out.

Snapshot Worker

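```typescript
const TTL_30_DAYS = 30 * 24 * 60 * 60 // seconds; the unit is an assumption about opLog.setTTL

// `documentStore`, `snapshotStore`, and `opLog` are assumed to be injected storage clients.
class SnapshotWorker {
  private readonly SNAPSHOT_THRESHOLD = 1000 // operations since last snapshot
  private readonly SNAPSHOT_INTERVAL_MS = 3_600_000 // 1 hour
  private readonly RETAINED_OPS = 100 // recent ops kept for fine-grained replay

  async processDocument(documentId: string): Promise<void> {
    const doc = await this.documentStore.getMetadata(documentId)
    const latestSnapshot = await this.snapshotStore.getLatest(documentId)

    const snapshotRevision = latestSnapshot?.revision ?? 0
    const opsSinceSnapshot = doc.currentRevision - snapshotRevision
    const timeSinceSnapshot = Date.now() - (latestSnapshot?.createdAt ?? 0)

    // Nothing new to fold in: an idle document should not get a duplicate snapshot every hour.
    if (opsSinceSnapshot === 0) return

    if (opsSinceSnapshot < this.SNAPSHOT_THRESHOLD && timeSinceSnapshot < this.SNAPSHOT_INTERVAL_MS) {
      return
    }

    // Rebuild the current content: last snapshot plus every op logged since.
    let content = latestSnapshot?.content ?? emptyDocument()
    const operations = await this.opLog.getRange(documentId, snapshotRevision + 1, doc.currentRevision)

    for (const op of operations) {
      content = applyOperation(content, op.operation)
    }

    await this.snapshotStore.create({
      documentId,
      revision: doc.currentRevision,
      content,
      createdAt: Date.now(),
    })

    // Mark old operations for TTL expiry; keep the last RETAINED_OPS for fine-grained replay.
    // The guard avoids a negative upper bound on documents with fewer ops than that.
    const expireThrough = doc.currentRevision - this.RETAINED_OPS
    if (expireThrough > 0) {
      await this.opLog.setTTL(documentId, 0, expireThrough, TTL_30_DAYS)
    }
  }
}
```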
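The read side of the same idea can be sketched as follows: a cold load starts from the latest snapshot and replays only the operations logged after it. The `Snapshot`, `LoggedOp`, and `TextOp` shapes and the `loadDocument` helper are hypothetical, plain-text stand-ins for the real document model, chosen to keep the sketch self-contained.

```typescript
// Hypothetical plain-text model; `text` on a delete records what was removed.
interface TextOp {
  kind: "insert" | "delete"
  pos: number
  text: string
}

interface LoggedOp {
  revision: number
  op: TextOp
}

interface Snapshot {
  revision: number
  content: string
}

function applyTextOp(doc: string, op: TextOp): string {
  return op.kind === "insert"
    ? doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
    : doc.slice(0, op.pos) + doc.slice(op.pos + op.text.length)
}

// Start from the snapshot (or an empty document) and replay the log tail in revision order.
function loadDocument(snapshot: Snapshot | null, log: LoggedOp[]): Snapshot {
  let revision = snapshot?.revision ?? 0
  let content = snapshot?.content ?? ""

  const tail = log.filter((entry) => entry.revision > revision).sort((x, y) => x.revision - y.revision)

  for (const entry of tail) {
    // A gap means part of the log expired before a snapshot covered it.
    if (entry.revision !== revision + 1) {
      throw new Error(`gap in operation log: expected ${revision + 1}, got ${entry.revision}`)
    }
    content = applyTextOp(content, entry.op)
    revision = entry.revision
  }

  return { revision, content }
}

const opLogEntries: LoggedOp[] = [
  { revision: 1, op: { kind: "insert", pos: 0, text: "hello" } },
  { revision: 2, op: { kind: "insert", pos: 5, text: " world" } },
  { revision: 3, op: { kind: "delete", pos: 0, text: "hello " } },
]

const fromScratch = loadDocument(null, opLogEntries)
const fromSnapshot = loadDocument({ revision: 2, content: "hello world" }, opLogEntries)
```

Both calls end at revision 3 with the same content; the snapshot only changes how many operations have to be replayed to get there.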

Undo and Redo with Concurrent Edits

Undo in a collaborative editor is not “pop the last op off the log.” That would erase someone else’s intervening work. The standard pattern, used by Google Docs and most ProseMirror-based editors, is:

Every locally-committed op pushes its inverse onto a per-user undo stack at the revision the original op landed.
On Ctrl-Z, pop the inverse, then transform it against every server op that has arrived since the original was applied.
Send the transformed inverse as a fresh operation. Each undo also pushes the inverse of what it just sent onto a per-user redo stack, and redo pops and rebases that entry the same way.
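The three steps above can be sketched on a plain-text model. Everything here is an assumption made for illustration: the `TextOp` shape (deletes carry the removed text so they can be inverted), the `rebase` helper, and the convention that `undoOps` returns pieces addressed against the current document and applied right to left. A real editor would run its own transform over its own operation type.

```typescript
type TextOp =
  | { kind: "insert"; pos: number; text: string }
  | { kind: "delete"; pos: number; text: string } // `text` is what was removed

function applyOp(doc: string, op: TextOp): string {
  return op.kind === "insert"
    ? doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
    : doc.slice(0, op.pos) + doc.slice(op.pos + op.text.length)
}

function invert(op: TextOp): TextOp {
  return op.kind === "insert"
    ? { kind: "delete", pos: op.pos, text: op.text }
    : { kind: "insert", pos: op.pos, text: op.text }
}

const clamp = (n: number, lo: number, hi: number): number => Math.min(Math.max(n, lo), hi)

// Rebase `op` over `applied`, an op already applied to the document.
// Returns a list: a delete is split in two when text was inserted inside its range.
function rebase(op: TextOp, applied: TextOp): TextOp[] {
  const n = applied.text.length
  if (applied.kind === "insert") {
    if (op.kind === "insert") {
      return [{ ...op, pos: applied.pos <= op.pos ? op.pos + n : op.pos }]
    }
    const end = op.pos + op.text.length
    if (applied.pos <= op.pos) return [{ ...op, pos: op.pos + n }]
    if (applied.pos >= end) return [op]
    const cut = applied.pos - op.pos
    return [
      { kind: "delete", pos: op.pos, text: op.text.slice(0, cut) },
      { kind: "delete", pos: applied.pos + n, text: op.text.slice(cut) },
    ]
  }
  // `applied` deleted the range [applied.pos, applied.pos + n).
  const shift = Math.min(n, Math.max(0, op.pos - applied.pos))
  if (op.kind === "insert") return [{ ...op, pos: op.pos - shift }]
  const m = op.text.length
  const kept =
    op.text.slice(0, clamp(applied.pos - op.pos, 0, m)) + op.text.slice(clamp(applied.pos + n - op.pos, 0, m))
  return kept.length === 0 ? [] : [{ kind: "delete", pos: op.pos - shift, text: kept }]
}

// Invert the original op, then carry the inverse across every op that landed since.
function undoOps(original: TextOp, since: TextOp[]): TextOp[] {
  let pieces: TextOp[] = [invert(original)]
  for (const remote of since) {
    pieces = pieces.flatMap((piece) => rebase(piece, remote))
  }
  return pieces
}

// Pieces are addressed against the same document, so apply them right to left.
function applyAll(doc: string, pieces: TextOp[]): string {
  return pieces.reduceRight((d, piece) => applyOp(d, piece), doc)
}

const aliceOp: TextOp = { kind: "insert", pos: 0, text: "Hello " } // "world" -> "Hello world"
const bobOp: TextOp = { kind: "insert", pos: 3, text: "XY" } // -> "HelXYlo world"
const current = applyOp(applyOp("world", aliceOp), bobOp)
const afterUndo = applyAll(current, undoOps(aliceOp, [bobOp])) // "XYworld"
```

Alice's undo removes only the characters she typed; the text Bob inserted in the middle of them survives.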

Sequence diagram: user A submits opA, two remote ops arrive, then A undoes — the inverse of opA is transformed against the intervening remote ops before being sent. — Undo with concurrent edits: invert the original op, transform the inverse against everything since, then submit it as a fresh op so remote work survives.

Caution

Per-user undo stacks are not shared. If Alice undoes after Bob has typed inside Alice’s earlier paragraph, Alice’s undo only removes Alice’s text — Bob’s characters stay. Surface this in the UX (e.g. show an “undo affects only your changes” hint) instead of trying to make undo globally LIFO, which is what destroys other users’ work in naïve implementations.

The same pattern is what makes “selective undo” tractable in OT: any op in the stack can be inverted and rebased forward, not just the most recent one. Production systems still cap the stack depth, because the further the document has moved past an op's original revision, the more intervening operations its inverse has to be rebased across and the less reliably the result matches what the user meant.

Frontend Considerations

Editor Integration

Most teams build on an existing rich-text editor instead of writing one. The shape of the OT integration depends on the editor’s own model:

Editor	OT / CRDT story	Notes
ProseMirror	“Steps” (OT-like)	Used by Atlassian; first-party `prosemirror-collab` package handles step rebasing¹⁶
Slate	Plugin-based	Flexible, needs an OT or CRDT library bolted on
Quill	Delta format (OT-shaped)	Delta ships `compose` and `transform` primitives; the sync layer around them is still yours to build
TipTap	ProseMirror-based	Modern API; inherits ProseMirror’s collab story

Integration sketch (ProseMirror-style):

```typescript
import { EditorState, Transaction } from "prosemirror-state"
import { EditorView } from "prosemirror-view"
import { schema } from "prosemirror-schema-basic"
import { collab, sendableSteps } from "prosemirror-collab"

// `ClientOT` is the state machine shown earlier. `stepsToOperation`, `handleServerMessage`,
// `cursorPlugin`, and `presencePlugin` are assumed to be defined elsewhere.
class CollaborativeEditor {
  private view: EditorView
  private otClient: ClientOT
  private ws: WebSocket

  constructor(container: HTMLElement, documentId: string) {
    this.otClient = new ClientOT()

    this.ws = new WebSocket(`wss://collab.example.com/ws/${documentId}`)
    this.ws.onmessage = this.handleServerMessage.bind(this)

    this.view = new EditorView(container, {
      state: EditorState.create({
        schema, // EditorState.create needs a schema or a doc; the basic schema is a placeholder
        plugins: [collab({ version: 0 }), this.cursorPlugin(), this.presencePlugin()],
      }),
      dispatchTransaction: this.handleLocalChange.bind(this),
    })
  }

  private handleLocalChange(tr: Transaction): void {
    const newState = this.view.state.apply(tr)
    this.view.updateState(newState)

    if (tr.docChanged) {
      const steps = sendableSteps(newState)
      if (steps) {
        // Hand the op to the OT client and let it decide when to send. ClientOT keeps one
        // op in flight and buffers the rest, so also writing to the socket here would send
        // the op twice and bypass that rule.
        this.otClient.onLocalEdit(stepsToOperation(steps.steps))
      }
    }
  }
}
```
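The constructor above binds `ws.onmessage` to `handleServerMessage`, but the sketch never shows the incoming path. One possible shape is below. The wire format is an assumption (three hypothetical JSON message types: `ack`, `operation`, `presence`), as are the `MessageHandlers` interface and the `dispatchServerMessage` helper; the point is only that untrusted frames are validated before they reach the OT client.

```typescript
// Hypothetical wire format for server-to-client messages.
type ServerMessage =
  | { type: "ack"; revision: number }
  | { type: "operation"; revision: number; operation: unknown }
  | { type: "presence"; clientId: string; anchor: number; head: number }

interface MessageHandlers {
  onServerAck(revision: number): void
  onRemoteOperation(revision: number, operation: unknown): void
  onPresence(clientId: string, anchor: number, head: number): void
}

// Returns null for anything that is not a well-formed message.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown
  try {
    data = JSON.parse(raw)
  } catch {
    return null
  }
  if (typeof data !== "object" || data === null) return null

  const { type, revision, operation, clientId, anchor, head } = data as Record<string, unknown>

  if (type === "ack" && typeof revision === "number") {
    return { type, revision }
  }
  if (type === "operation" && typeof revision === "number" && operation !== undefined) {
    return { type, revision, operation }
  }
  if (
    type === "presence" &&
    typeof clientId === "string" &&
    typeof anchor === "number" &&
    typeof head === "number"
  ) {
    return { type, clientId, anchor, head }
  }
  return null
}

// Routes one raw frame to the matching handler; returns false if the frame was dropped.
function dispatchServerMessage(raw: string, handlers: MessageHandlers): boolean {
  const message = parseServerMessage(raw)
  if (message === null) return false

  switch (message.type) {
    case "ack":
      handlers.onServerAck(message.revision)
      break
    case "operation":
      handlers.onRemoteOperation(message.revision, message.operation)
      break
    case "presence":
      handlers.onPresence(message.clientId, message.anchor, message.head)
      break
  }
  return true
}
```

In the editor class, `handleServerMessage` would pass `event.data` to a function like this, with `ClientOT` supplying the first two handlers and the cursor overlay the third.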

Presence Rendering

Cursor overlay sketch:

```typescript
interface RemoteCursor {
  clientId: string
  user: { name: string; color: string }
  anchor: number
  head: number
}

// `positionToCoords`, the `render*` helpers, and `clearOverlay` are assumed to be
// implemented against the editor view and an absolutely positioned overlay element.
class CursorOverlay {
  private cursors: Map<string, RemoteCursor> = new Map()

  updateCursor(cursor: RemoteCursor): void {
    this.cursors.set(cursor.clientId, cursor)
    this.render()
  }

  removeCursor(clientId: string): void {
    this.cursors.delete(clientId)
    this.render()
  }

  private render(): void {
    // Redraw from scratch so carets of removed cursors do not linger on screen.
    this.clearOverlay()

    for (const [clientId, cursor] of this.cursors) {
      const coords = this.positionToCoords(cursor.head)

      this.renderCaret(clientId, coords, cursor.user.color)

      // A selection exists only when anchor and head differ.
      if (cursor.anchor !== cursor.head) {
        this.renderSelection(clientId, cursor.anchor, cursor.head, cursor.user.color)
      }

      this.renderNameLabel(clientId, coords, cursor.user)
    }
  }
}
```

Performance optimizations worth enabling by default:

Technique	Purpose	Implementation
Throttle cursor updates	Reduce network traffic	Max 20 updates/sec
Batch presence broadcasts	Reduce message count	Collect 50 ms, send batch
Use CSS transforms	Avoid layout thrashing	`transform: translate()`
Virtual cursor layer	Don’t mutate editor DOM	Absolute-positioned overlay
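The first two rows of the table can be sketched as two small classes. Both are hypothetical helpers, not part of any library: `LatestValueThrottle` caps outgoing cursor updates by keeping only the newest value between sends (a 50 ms interval gives the 20 updates/sec limit), and `PresenceBatcher` collects updates on the server so that one batch per window carries the latest cursor per client. Time is passed in explicitly so the logic can be driven by any timer.

```typescript
interface CursorUpdate {
  clientId: string
  anchor: number
  head: number
}

class LatestValueThrottle<T> {
  private readonly minIntervalMs: number
  private readonly send: (value: T) => void
  private lastSentAt = -Infinity
  private pending: T | undefined = undefined

  constructor(minIntervalMs: number, send: (value: T) => void) {
    this.minIntervalMs = minIntervalMs
    this.send = send
  }

  // Call on every cursor move: sends at once if the interval has elapsed,
  // otherwise remembers only the newest value.
  push(value: T, now: number): void {
    if (now - this.lastSentAt >= this.minIntervalMs) {
      this.lastSentAt = now
      this.pending = undefined
      this.send(value)
    } else {
      this.pending = value
    }
  }

  // Call from a periodic timer: flushes the held value once the interval has elapsed.
  tick(now: number): void {
    if (this.pending !== undefined && now - this.lastSentAt >= this.minIntervalMs) {
      const value = this.pending
      this.pending = undefined
      this.lastSentAt = now
      this.send(value)
    }
  }
}

class PresenceBatcher {
  private latest = new Map<string, CursorUpdate>()

  add(update: CursorUpdate): void {
    this.latest.set(update.clientId, update) // newer update replaces the older one
  }

  // Call once per batching window (for example every 50 ms) and broadcast the result.
  flush(): CursorUpdate[] {
    const batch = Array.from(this.latest.values())
    this.latest.clear()
    return batch
  }
}
```

Because both classes keep only the newest value, a burst of cursor movement costs at most one message per interval per client, and intermediate positions nobody would have seen are never sent.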

Offline Support

Offline editing leans on IndexedDB for the queue and the same transformation functions for reconciliation:

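```typescript
// `getPending`, `sendOperation`, `remove`, and `handleConflict` are assumed to be
// implemented elsewhere on this class; `getPending` must return items in enqueue order.
class OfflineQueue {
  private db: IDBDatabase
  private queueName = "pendingOperations"

  async enqueue(documentId: string, operation: Operation): Promise<void> {
    const tx = this.db.transaction(this.queueName, "readwrite")
    tx.objectStore(this.queueName).add({
      id: crypto.randomUUID(),
      documentId,
      operation,
      timestamp: Date.now(),
    })

    // An IDBRequest is not a Promise, so `await store.add(...)` would resolve immediately.
    // Wait for the transaction to commit instead.
    await new Promise<void>((resolve, reject) => {
      tx.oncomplete = () => resolve()
      tx.onerror = () => reject(tx.error)
      tx.onabort = () => reject(tx.error)
    })
  }

  async syncPending(documentId: string): Promise<void> {
    const pending = await this.getPending(documentId)

    for (const item of pending) {
      try {
        await this.sendOperation(item)
        await this.remove(item.id) // drop from the queue only after the server accepted it
      } catch (e) {
        if (e instanceof RevisionConflictError) {
          // The base revision is stale: rebase against the missed server ops.
          await this.handleConflict(documentId, item)
        } else {
          throw e
        }
      }
    }
  }
}
```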

Caution

Offline-then-online is the most common source of subtle convergence bugs. Treat the offline → reconnect path as a separate test suite: long disconnects (hours), formatting-only edits, edits to deleted ranges, and edits that cross a snapshot boundary.
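The reconciliation step that `handleConflict` stands for can be sketched as a grid of pairwise transforms: every server operation missed while offline is carried across the whole local queue, which rebases the queue and, as a by-product, yields the form of the server operation that applies to the local document. The sketch assumes a hypothetical insert-only `InsertOp` model where the server wins position ties; `transformPair` and `reconcile` are illustrative names, not an existing API.

```typescript
interface InsertOp {
  pos: number
  text: string
}

function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
}

// Returns [local', remote']. On a position tie the remote (server) insert goes first.
function transformPair(local: InsertOp, remote: InsertOp): [InsertOp, InsertOp] {
  return remote.pos <= local.pos
    ? [{ ...local, pos: local.pos + remote.text.length }, remote]
    : [local, { ...remote, pos: remote.pos + local.text.length }]
}

// `queued`: local ops made offline, in order, starting from the last synced revision.
// `missed`: server ops committed since that revision, in order.
function reconcile(
  queued: InsertOp[],
  missed: InsertOp[],
): { toSend: InsertOp[]; toApplyLocally: InsertOp[] } {
  let queue = queued
  const toApplyLocally: InsertOp[] = []

  for (const serverOp of missed) {
    let remote = serverOp
    const rebased: InsertOp[] = []
    for (const local of queue) {
      const [localNext, remoteNext] = transformPair(local, remote)
      rebased.push(localNext)
      remote = remoteNext // carry the server op past this local op
    }
    queue = rebased
    toApplyLocally.push(remote)
  }

  return { toSend: queue, toApplyLocally }
}

// Last synced document: "abc".
const offlineEdits: InsertOp[] = [{ pos: 3, text: "!" }, { pos: 0, text: ">" }] // local: ">abc!"
const missedOps: InsertOp[] = [{ pos: 0, text: "X" }, { pos: 4, text: "Y" }] // server: "XabcY"
const plan = reconcile(offlineEdits, missedOps)

const serverResult = plan.toSend.reduce(applyInsert, "XabcY")
const localResult = plan.toApplyLocally.reduce(applyInsert, ">abc!")
```

Both sides end at the same document, which is the property the offline test suite should assert for every scenario in the caution above.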

Infrastructure

Cloud-Agnostic Components

Component	Purpose	Options
WebSocket gateway	Real-time connections	Nginx, HAProxy, Envoy
Message queue	Operation streaming	Kafka, RabbitMQ, NATS
KV store	Active document state	Redis, Memcached, KeyDB
Document store	Operation log	Cassandra, ScyllaDB, DynamoDB
Object store	Snapshots, media	MinIO, Ceph, S3-compatible
Relational DB	Metadata, ACL	PostgreSQL, CockroachDB

AWS Reference Architecture

Service configurations:

Service	Configuration	Rationale
WebSocket (Fargate)	4 vCPU, 8 GB RAM	Memory for active documents
API (Fargate)	2 vCPU, 4 GB RAM	Stateless, scales on traffic
Workers (Fargate Spot)	2 vCPU, 4 GB RAM	Cost optimization for async work
ElastiCache	r6g.xlarge cluster	Sub-ms latency for hot documents
RDS PostgreSQL	db.r6g.2xlarge Multi-AZ	Metadata queries, ACL
DynamoDB	On-demand	Predictable per-op pricing
S3	Standard + Intelligent-Tiering	Hot snapshots, cold history

Scaling Considerations

WebSocket connection limits:

A single Linux server is commonly provisioned for around 65k connections; going higher takes deliberate `ulimit` (file-descriptor) and port-range tuning.¹⁷
Solution: consistent hashing by document ID across a server pool.
Active documents per server: ~10k (memory-constrained, not socket-constrained).
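Consistent hashing by document ID can be sketched with a small hash ring. The `HashRing` class, the FNV-1a-style hash, the server names, and the 64 virtual nodes per server are all assumptions for illustration; any stable hash and any reasonable virtual-node count would serve.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units; any stable hash would do.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash
}

class HashRing {
  private points: { hash: number; server: string }[] = []
  private readonly virtualNodes: number

  constructor(servers: string[], virtualNodes = 64) {
    this.virtualNodes = virtualNodes
    for (const server of servers) this.add(server)
  }

  // Each server owns several points on the ring to even out the load.
  add(server: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      this.points.push({ hash: fnv1a(`${server}#${i}`), server })
    }
    this.points.sort((a, b) => a.hash - b.hash)
  }

  remove(server: string): void {
    this.points = this.points.filter((point) => point.server !== server)
  }

  // The first ring point at or after the document's hash owns the document.
  lookup(documentId: string): string {
    if (this.points.length === 0) throw new Error("hash ring is empty")
    const target = fnv1a(documentId)
    let lo = 0
    let hi = this.points.length
    while (lo < hi) {
      const mid = (lo + hi) >>> 1
      if (this.points[mid].hash < target) lo = mid + 1
      else hi = mid
    }
    return this.points[lo % this.points.length].server // wrap around past the last point
  }
}

const ring = new HashRing(["ws-1", "ws-2", "ws-3"])
const owner = ring.lookup("doc-42")
```

The property that matters for per-document affinity is that removing a server only reassigns the documents that server owned; every other document keeps its process, and its in-memory state, where it was.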

Document processor memory:

Average document state: ~100 KB.
Active document with history buffer: ~500 KB.
8 GB ÷ 500 KB ≈ 16k active documents as a theoretical ceiling; the ~10k planning figure above leaves headroom for everything else the process holds in memory.
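The arithmetic behind these figures fits in a few lines. The `maxActiveDocuments` helper and the 35% reserve in the second call are hypothetical; the reserve is chosen only to show how a planning number near ~10k can fall out of the same inputs, and decimal units (1 GB = 10⁹ bytes) are assumed.

```typescript
// An integer percent keeps the arithmetic exact for round inputs.
function maxActiveDocuments(memoryBytes: number, perDocumentBytes: number, reservedPercent = 0): number {
  if (perDocumentBytes <= 0) throw new Error("perDocumentBytes must be positive")
  if (reservedPercent < 0 || reservedPercent >= 100) throw new Error("reservedPercent must be in [0, 100)")
  const usableBytes = (memoryBytes * (100 - reservedPercent)) / 100
  return Math.floor(usableBytes / perDocumentBytes)
}

const GB = 1_000_000_000
const KB = 1_000

const theoreticalCeiling = maxActiveDocuments(8 * GB, 500 * KB) // 16000
const withReserve = maxActiveDocuments(8 * GB, 500 * KB, 35) // 10400, with a hypothetical 35% reserve
```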

Operation log partitioning:

DynamoDB partition key is document_id.
Per-partition write ceiling is 1,000 WCU (the 3,000 figure is the per-partition read limit, in RCU).
Solution: split a single document across logical sub-streams only when one document genuinely exceeds that ceiling, which is very rare.
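One way to picture logical sub-streams is a suffixed partition key. The scheme below is an assumption, not something the design specifies: revisions are spread round-robin across a hypothetical `shardCount` sub-streams, and a reader queries every sub-stream and merges by revision, which is safe because revisions are already totally ordered by the server.

```typescript
interface LoggedOp {
  revision: number
  payload: string
}

// With one shard the key is the plain document ID, so ordinary documents are unaffected.
function partitionKey(documentId: string, revision: number, shardCount: number): string {
  return shardCount <= 1 ? documentId : `${documentId}#${revision % shardCount}`
}

// Every key a reader has to query for this document.
function allPartitionKeys(documentId: string, shardCount: number): string[] {
  if (shardCount <= 1) return [documentId]
  const keys: string[] = []
  for (let shard = 0; shard < shardCount; shard++) keys.push(`${documentId}#${shard}`)
  return keys
}

// Sub-streams interleave by revision, so a merge is a concatenation plus a sort.
function mergeSubStreams(streams: LoggedOp[][]): LoggedOp[] {
  return ([] as LoggedOp[]).concat(...streams).sort((a, b) => a.revision - b.revision)
}
```

The cost is on the read path: a cold load fans out to `shardCount` queries instead of one, which is why the split is reserved for the rare document that actually hits the per-partition ceiling.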

Conclusion

This design delivers real-time collaborative document editing with:

Sub-200 ms operation propagation via WebSocket and server-ordered OT.
Strong convergence guarantees without the TP2 obligation pure peer-to-peer OT carries.
Full revision history through an event-sourced operation log with periodic snapshots.
Offline resilience via an IndexedDB queue plus transform-and-replay reconciliation.

Key architectural decisions:

Server-ordered OT eliminates the TP2 correctness burden.
Periodic snapshots bound operation replay cost on cold reads.
Ephemeral presence avoids persistence overhead for cursors.
Per-document process affinity simplifies scaling at the cost of memory pressure on hot servers.

Known limitations:

Server dependency for real-time sync (no true peer-to-peer).
Memory pressure climbs steeply at very high concurrent-editor counts.
Snapshot creation adds latency on very active documents while it runs.

Future enhancements:

Hybrid OT / CRDT (Eg-walker, Fugue) for stronger offline merging without giving up steady-state efficiency.
Incremental snapshot deltas to reduce storage churn for very long-lived documents.
Smarter presence coalescing for large collaborator counts.

Appendix

Prerequisites

Distributed-systems fundamentals (eventual consistency, vector clocks, causal order).
Real-time communication patterns (WebSocket, SSE).
Event-sourcing concepts.
A working understanding of OT or CRDTs (see related articles).

Terminology

Term	Definition
OT	Operational Transformation — algorithm for transforming concurrent operations
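TP1 / TP2	Transformation properties: TP1 guarantees that two concurrent operations converge whichever is applied first; TP2 extends that to three or more operations transformed along different paths, and is only needed when no central server fixes the order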
Revision	Monotonic counter representing document state version
Operation	Atomic change to a document (insert, delete, format)
Snapshot	Full document state at a specific revision
Presence	Ephemeral state like cursors and selections
Tombstone	Marker for deleted content in CRDT systems

Summary

Real-time collaborative editing requires synchronization algorithms (OT or CRDT), presence broadcasting, and event-sourced persistence.
Server-ordered OT dominates production text editors (Google Docs, CKEditor) because it sidesteps TP2.
WebSocket provides full-duplex communication with 2-14-byte frame headers after the handshake — far cheaper than HTTP per message.
Operation log + periodic snapshots enables full revision history while bounding cold-read replay cost.
Presence is ephemeral — cursors and selections live in memory only, reconstructed on reconnect.
The design targets up to ~100 concurrent editing sessions per document, matching Google Docs’ published ceiling, with sub-200 ms operation propagation.

References

Architecture and implementation:

How Figma’s multiplayer technology works — Figma engineering blog
Making multiplayer more reliable — Figma transaction-journal design
Realtime editing of ordered sequences — fractional indexing at Figma
Canvas, meet code: building Figma’s code layers — Eg-walker in production
The data model behind Notion — block-based architecture
Sharding Postgres at Notion — database scaling patterns
Scaling the Linear sync engine — local-first sync architecture

Operational Transformation:

Apache Wave OT whitepaper — protocol spec
What’s different about the new Google Docs — Google’s 2010 architecture overview
Lessons learned from CKEditor 5 — production OT for rich text
Jupiter collaboration system (Nichols et al., UIST ‘95) — the central-server OT design Google Docs and Wave inherit
Concurrency control in groupware systems (Ellis & Gibbs, SIGMOD ‘89) — original OT and dOPT
Architectures for Central Server Collaboration — Matthew Weidner — modern survey of OT, OT-ish, and CRDT server shapes

Algorithms and research:

Eg-walker: collaborative text editing — Gentle & Kleppmann, EuroSys 2025
Real differences between OT and CRDT — ACM 2020 comparison
Peritext: a CRDT for collaborative rich text editing — Litt, Lim, Kleppmann, van Hardenberg
YATA: near real-time peer-to-peer shared editing — the algorithm behind Yjs
I was wrong. CRDTs are the future — Joseph Gentle (ShareJS) on the OT/CRDT pivot
Performance of real-time collaborative editors at large scale — scaling analysis

Libraries and protocols:

Yjs y-protocols/PROTOCOL.md — sync + awareness wire format
ShareDB presence docs — typed DocPresence for cursor alignment
ProseMirror collaborative editing guide — step rebasing as OT-ish

Related articles:

Operational Transformation — deep dive into OT algorithms
CRDTs for Collaborative Systems — alternative approach for offline-first

¹ Share files from Google Drive, Google Docs Editors Help. As of 2026-04, Google’s published limit is “Google Docs, Sheets, Slides, or Vids files can be edited on up to 100 open tabs or devices at the same time. After 100 tabs or devices, only the owner and some users with editing permissions can edit the file.” The cap is per open tab or device, not per distinct user identity.
² RFC 6455 §5.2, Base framing protocol. The frame header is 2 bytes minimum, plus 0/2/8 bytes of extended payload length, plus 0 or 4 bytes of mask key: 2-14 bytes total.
³ David A. Nichols, Pavel Curtis, Michael Dixon, John Lamping, High-latency, low-bandwidth windowing in the Jupiter collaboration system, UIST ’95. The original two-party server-mediated OT design that Google Wave and Google Docs both adopted; PDF mirror at the Lively Kernel repository.
⁴ Google Wave Operational Transformation whitepaper, archived under Apache Wave. Documents Wave’s “one unacknowledged op per client” extension to Jupiter and the server’s role as the canonical orderer.
⁵ C. A. Ellis and S. J. Gibbs, Concurrency control in groupware systems, SIGMOD ’89. Introduces dOPT and the convergence / precedence properties; later shown by Ressel et al. (1995) to mishandle three-way concurrency, motivating the TP1 / TP2 formalisation.
⁶ What’s different about the new Google Docs: Making collaboration fast, Google Drive Blog, 2010-09-23.
⁷ Discussion thread with Notion engineers: You don’t need a CRDT to build a collaborative experience, Hacker News. At the time, Notion used server-mediated last-write-wins per block and described pure CRDTs as a future direction.
⁸ Canvas, meet code: building Figma’s code layers, Figma engineering blog. The launch announcement Make your site interactive with code layers is dated 2025-06-17.
⁹ Joseph Gentle and Martin Kleppmann, Collaborative text editing with Eg-walker: better, faster, smaller, EuroSys ’25 (ACM DOI).
¹⁰ Chengzheng Sun et al., Real Differences between OT and CRDT under a General Transformation Framework for Consistency Maintenance in Co-Editors, PACMHCI 2020. The most thorough side-by-side, framing both as instances of the same transformation framework rather than rival paradigms.
¹¹ Joseph Gentle, I was wrong. CRDTs are the future, 2020. The author of ShareJS recanting his long-standing CRDT skepticism after Yjs/Automerge benchmarks closed the speed and size gaps.
¹² Automerge binary document format specification and Introducing Automerge 2.0. Columnar encoding plus DEFLATE on change chunks gets the on-disk overhead close to the raw text size.
¹³ Geoffrey Litt, Sarah Lim, Martin Kleppmann, Peter van Hardenberg, Peritext: A CRDT for Collaborative Rich Text Editing, PACMHCI 2022. Demonstrates that CRDTs can preserve rich-text formatting intent across concurrent edits without falling back to OT.
¹⁴ Yjs y-protocols PROTOCOL.md and Awareness API docs. Awareness is a separate state-based CRDT layered on top of the document sync protocol; clients are dropped after ~30 seconds without an update.
¹⁵ Quill Delta, quilljs.com. Delta is the OT-friendly operation format used by Quill and a number of downstream editors.
¹⁶ ProseMirror collaborative editing guide. The first-party prosemirror-collab plugin handles step-rebasing in the same shape as classic OT.
¹⁷ getrlimit(2) / setrlimit(2), Linux manual page. The default soft RLIMIT_NOFILE is conservative; raising it to ~65k is standard practice for WebSocket gateways, and beyond that you need port-range tuning and additional tweaks.