Design a Rich Text Editor

Building a rich text editor for web applications requires choosing between fundamentally different document models, input handling strategies, and collaboration architectures. This article covers contentEditable vs custom rendering trade-offs, the document models behind ProseMirror, Slate, Lexical, and Quill, transaction-based state management, browser input handling (Input Events Level 2, IME), collaboration patterns (OT vs CRDT), virtualization for large documents, and accessibility requirements.

Rich text editor architecture: input events flow through the transaction engine, updating immutable state that the DOM reconciler renders. Collaboration layers intercept transactions for sync.

Abstract

Rich text editors reduce to three core design decisions:

Document model: Linear operations (Quill Delta) vs hierarchical node trees (ProseMirror, Slate). Linear models are simpler but limit nesting; hierarchical models enable complex structures but require careful normalization.
Rendering strategy: contentEditable delegates to browser but inherits inconsistencies; custom rendering (Lexical, modern Slate) gives precise control at the cost of reimplementing selection, IME, and accessibility.
Collaboration architecture: Operational Transform (OT) requires a central server for ordering but is proven at Google Docs scale; CRDTs (Conflict-free Replicated Data Types) enable peer-to-peer sync and offline-first but carry metadata overhead.

The decision matrix:

Factor	Linear + contentEditable	Hierarchical + Custom
Implementation complexity	Low	High
Browser consistency	Poor	Excellent
Complex nesting (tables, nested lists)	Limited	Full support
Collaboration integration	OT-friendly	OT or CRDT
Bundle size	~15KB (Quill)	~22KB (Lexical), ~40KB (ProseMirror)

The Challenge

Why Rich Text Is Hard

Browser text editing was designed for simple forms, not document authoring. The challenges:

contentEditable inconsistency: Different browsers handle identical operations differently. Pressing Backspace at a link boundary deletes the link in IE but only the character in Firefox.
IME complexity: Input Method Editors for CJK (Chinese, Japanese, Korean) languages compose characters before committing—editors must handle composition events without corrupting state.
Selection edge cases: Cursor positioning at block boundaries, triple-click behavior, and drag-selection across nested structures vary across browsers.
Undo/redo semantics: What constitutes a single “action” for undo? A keystroke? A word? A formatting change?

Browser Constraints

Constraint	Impact	Mitigation
Main thread budget (16ms for 60fps)	Heavy DOM operations block typing	Transaction batching, incremental rendering
DOM mutation overhead	Frequent updates cause layout thrashing	Virtual DOM or custom reconciler
Selection API limitations	Can’t programmatically set selection in some states	Shadow selection tracking
`execCommand` deprecation	No standard API for formatting	Custom command implementations

Scale Factors

Factor	Small Scale	Large Scale
Document size	< 1,000 words	> 100,000 words
Concurrent editors	1-2	50+
Nesting depth	2-3 levels	Unlimited (outliners, nested tables)
Update frequency	< 1/sec	> 10/sec per user

Large-scale documents require virtualization (rendering only visible blocks) and efficient position mapping. High concurrency demands robust conflict resolution.

Design Paths

Path 1: contentEditable with Thin Abstraction

Architecture:

[Browser contentEditable] → [Mutation Observer] → [Document Sync] → [State]

How it works:

Leverage browser’s native editing. Observe mutations, extract operations, update internal state. Quill exemplifies this approach.


// Quill Delta: linear sequence of operations
import Quill from "quill"

const delta = {
  ops: [{ insert: "Hello " }, { insert: "World", attributes: { bold: true } }, { insert: "\n" }],
}

// Apply formatting: retain 6, apply bold to next 5 chars
const formatBold = {
  ops: [{ retain: 6 }, { retain: 5, attributes: { bold: true } }],
}

Best for:

Blog post editors, comment systems
Teams without deep frontend expertise
Projects prioritizing time-to-market

Device/network profile:

Works well on: All devices (native browser behavior)
Struggles on: Complex documents with nested structures

Implementation complexity:

Aspect	Effort
Initial setup	Low
Basic formatting	Low
Complex nesting	High (fighting contentEditable)
Collaboration	Medium (Delta is OT-friendly)

Real-world example:

Quill powers Slack’s message composer. Linear Delta format suits chat where messages are primarily inline text with occasional formatting.

Trade-offs:

✅ Native typing feel, spell check, IME support
✅ Small bundle (~15KB)
✅ Fast initial render
❌ Browser inconsistencies leak through
❌ Complex structures (tables, nested lists) require workarounds
❌ Limited control over selection behavior

Path 2: Hierarchical Model with Controlled contentEditable

Architecture:

[Schema] → [Immutable State] → [Transaction] → [New State] → [DOM Sync]
                                    ↑
                              [contentEditable events]

How it works:

Define a schema specifying valid document structures. All changes flow through transactions that produce new immutable state. A reconciler syncs state to DOM. ProseMirror and Tiptap use this approach.


import { Schema, NodeSpec, MarkSpec } from "prosemirror-model"

// Schema defines valid document structure
const nodes: Record<string, NodeSpec> = {
  doc: { content: "block+" },
  paragraph: {
    content: "inline*",
    group: "block",
    parseDOM: [{ tag: "p" }],
    toDOM: () => ["p", 0],
  },
  heading: {
    attrs: { level: { default: 1 } },
    content: "inline*",
    group: "block",
    parseDOM: [1, 2, 3, 4, 5, 6].map((level) => ({
      tag: `h${level}`,
      attrs: { level },
    })),
    toDOM: (node) => [`h${node.attrs.level}`, 0],
  },
  text: { group: "inline" },
}

const marks: Record<string, MarkSpec> = {
  bold: {
    parseDOM: [{ tag: "strong" }, { tag: "b" }],
    toDOM: () => ["strong", 0],
  },
  link: {
    attrs: { href: {} },
    parseDOM: [{ tag: "a[href]", getAttrs: (dom) => ({ href: (dom as HTMLElement).getAttribute("href") }) }],
    toDOM: (node) => ["a", { href: node.attrs.href }, 0],
  },
}

const schema = new Schema({ nodes, marks })

Transaction-based state management:


import { EditorState, Transaction } from "prosemirror-state"

// Every change creates a transaction.
// `schema` is the Schema instance defined in the previous block.
// (For the common toggle case, prosemirror-commands ships toggleMark.)
function applyBold(state: EditorState): Transaction {
  const { from, to } = state.selection
  const tr = state.tr.addMark(from, to, schema.marks.bold.create())
  return tr
}

// State is immutable: applying a transaction creates a new state
// and leaves the previous one untouched
const newState = state.apply(applyBold(state))
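
The transaction flow can be sketched without any library. This is not ProseMirror's implementation: `MiniState`, `MiniStep`, `MiniTransaction`, and `applyTransaction` are hypothetical names, and the document is simplified to a flat string rather than a node tree.

```typescript
// Hypothetical mini model: the document is a plain string, not a node tree.
interface MiniState {
  readonly doc: string
  readonly version: number
}

type MiniStep =
  | { kind: "insert"; pos: number; text: string }
  | { kind: "delete"; from: number; to: number }

interface MiniTransaction {
  readonly steps: readonly MiniStep[]
}

function applyStep(doc: string, step: MiniStep): string {
  if (step.kind === "insert") {
    if (step.pos < 0 || step.pos > doc.length) throw new RangeError("insert out of range")
    return doc.slice(0, step.pos) + step.text + doc.slice(step.pos)
  }
  if (step.from < 0 || step.to > doc.length || step.from > step.to) {
    throw new RangeError("delete out of range")
  }
  return doc.slice(0, step.from) + doc.slice(step.to)
}

// Applying a transaction never mutates the old state; it returns a new one.
function applyTransaction(state: MiniState, tr: MiniTransaction): MiniState {
  let doc = state.doc
  for (const step of tr.steps) doc = applyStep(doc, step)
  return Object.freeze({ doc, version: state.version + 1 })
}

const before: MiniState = Object.freeze({ doc: "Hello World", version: 0 })
const after = applyTransaction(before, {
  steps: [
    { kind: "insert", pos: 6, text: "Beautiful " },
    { kind: "delete", from: 0, to: 6 },
  ],
})
console.log(before.doc) // the old state is still intact
console.log(after.doc)
```

Because every state is a separate frozen value, undo history and debugging tools can simply hold on to earlier states instead of reversing mutations.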

Best for:

CMS content editors, documentation tools
Applications requiring custom block types
Collaborative editing with complex structures

Device/network profile:

Works well on: Desktop, modern mobile browsers
Struggles on: Low-memory devices with very large documents (>50K words)

Implementation complexity:

Aspect	Effort
Initial setup	Medium
Schema definition	Medium
Custom node types	Medium
Collaboration (via prosemirror-collab)	Medium
Undo/redo	Low (built-in)

Real-world example:

The New York Times built its article authoring editor on ProseMirror. The schema enforces editorial structure while allowing rich embedded content.

Trade-offs:

✅ Consistent behavior across browsers
✅ Schema enforces valid structures
✅ Transaction history enables time-travel debugging
✅ Mature collaboration support (prosemirror-collab)
❌ Larger bundle (~40KB core)
❌ Steeper learning curve
❌ Still relies on contentEditable for text input

Path 3: Fully Custom Rendering

Architecture:

[Hidden Input] → [Editor State] → [Custom DOM Reconciler] → [Rendered View]
                      ↑
              [Virtual Selection]

How it works:

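Take rendering and selection out of the browser’s hands. In the pure form of this path, contentEditable is abandoned entirely: input is captured via a hidden textarea or synthetic event handling, selection state is virtual, and the document is rendered with complete control. Lexical (Meta) and modern Slate versions sit close to this end of the spectrum: they own editor state, selection mapping, and DOM reconciliation, although both still mount on a contentEditable element to receive native text input.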


import { $getRoot, $createParagraphNode, $createTextNode, LexicalEditor } from "lexical"

// `editor` is a LexicalEditor created elsewhere (for example with createEditor())
declare const editor: LexicalEditor

// Lexical: all mutations happen in update callbacks
editor.update(() => {
  const root = $getRoot()
  const paragraph = $createParagraphNode()
  const text = $createTextNode("Hello World")

  paragraph.append(text)
  root.append(paragraph)
})

// Read state in read callbacks.
// Updates are batched and committed asynchronously, so read once the update
// has been committed (for example from an update listener).
editor.getEditorState().read(() => {
  const root = $getRoot()
  const textContent = root.getTextContent()
  console.log(textContent) // "Hello World"
})

The $ convention:

Lexical uses $-prefixed functions (like $getRoot(), $getSelection()) that only work inside editor.update() or editor.read() callbacks. This is similar to React Hooks—calling them outside the proper context throws an error.
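
One way such context-bound functions could work is a module-level "active state" that is only set while a callback runs. The sketch below is an assumption for illustration, not Lexical's source code; `SketchEditor`, `SketchState`, and `$getSketchRoot` are hypothetical names.

```typescript
// Hypothetical sketch of a context-bound accessor; not Lexical's source code.
interface SketchState {
  root: string[]
}

// Only non-null while an update() or read() callback is running.
let activeState: SketchState | null = null

function $getSketchRoot(): string[] {
  if (activeState === null) {
    throw new Error("$getSketchRoot() can only be called inside update() or read()")
  }
  return activeState.root
}

class SketchEditor {
  private state: SketchState = { root: [] }

  update(fn: () => void): void {
    // Work on a copy so the previous state stays untouched, then commit it.
    const draft: SketchState = { root: this.state.root.slice() }
    const previous = activeState
    activeState = draft
    try {
      fn()
      this.state = draft
    } finally {
      activeState = previous
    }
  }

  read<T>(fn: () => T): T {
    const previous = activeState
    activeState = this.state
    try {
      return fn()
    } finally {
      activeState = previous
    }
  }
}

const sketchEditor = new SketchEditor()
sketchEditor.update(() => {
  $getSketchRoot().push("Hello World")
})
const paragraphCount = sketchEditor.read(() => $getSketchRoot().length)
console.log(paragraphCount)
```

The `finally` blocks matter: if a callback throws, the active state is reset and the draft is never committed, so a failed update cannot leak a half-applied document.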

Best for:

Applications requiring pixel-perfect rendering
Complex interactive features (mentions, embeds)
Performance-critical large documents

Device/network profile:

Works well on: All devices with proper optimization
Requires: Careful IME handling, custom accessibility implementation

Implementation complexity:

Aspect	Effort
Initial setup	High
IME support	High
Accessibility	High (must implement ARIA manually)
Performance optimization	Medium (framework handles reconciliation)
Collaboration	Medium (integrates with Yjs)

Real-world example:

Meta uses Lexical for Facebook and Instagram post composers. Custom rendering enables consistent behavior across their massive user base with varying browser versions.

Trade-offs:

✅ Complete control over rendering
✅ Consistent cross-browser behavior
✅ Smaller bundle than ProseMirror (~22KB)
✅ Excellent performance characteristics
❌ Must handle IME composition manually
❌ Accessibility requires explicit implementation
❌ Native spell check integration is complex

Decision Matrix

Factor	Quill (contentEditable)	ProseMirror (Controlled)	Lexical (Custom)
Bundle size	15KB	40KB	22KB
Browser consistency	Poor	Good	Excellent
Complex structures	Limited	Full	Full
IME handling	Native	Native	Manual
Accessibility	Native	Native + ARIA	Manual
Learning curve	Low	Medium	Medium-High
Collaboration	OT (Delta)	OT (prosemirror-collab)	CRDT (Yjs)
Extensibility	Plugins	Schema + Plugins	Nodes + Commands

Document Models Deep Dive

ProseMirror: Schema-Driven Node Trees

ProseMirror’s document model is a tree of nodes, where the schema strictly defines what structures are valid.

Key concepts:

Nodes: Block-level (paragraph, heading) or inline (text). Block nodes contain other nodes; inline nodes carry marks.
Marks: Annotations on inline content (bold, italic, link). Unlike nested DOM elements, marks are flat—text can have multiple marks without creating nested spans.
Positions: Dual indexing system. Tree navigation for structure, flat token sequences for efficient position mapping.

// Position mapping: critical for collaboration
// Document: <p>Hello</p><p>World</p>
// Positions: 0=start of doc (before the first <p>), 1=before "Hello", 6=after "Hello", 7=between paragraphs...

// When inserting at position 3, all positions ≥3 shift
import { StepMap } from "prosemirror-transform"

const map = new StepMap([3, 0, 5]) // At pos 3, delete 0, insert 5
const newPos = map.map(10) // Position 10 becomes 15
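
The arithmetic behind a single mapped range is small enough to write out. The function below is a simplified sketch of the idea and not the library's code: `ReplacedRange` and `mapPosition` are hypothetical names, and it handles one range at a time.

```typescript
// One replaced range: at `start`, `oldSize` positions were removed and `newSize` inserted.
type ReplacedRange = { start: number; oldSize: number; newSize: number }

// `assoc` picks the side a position sticks to when content is inserted exactly at it:
// 1 = end up after the inserted content, -1 = stay in front of it.
function mapPosition(pos: number, range: ReplacedRange, assoc: 1 | -1 = 1): number {
  const { start, oldSize, newSize } = range
  const end = start + oldSize
  if (pos < start) return pos // before the change: unaffected
  if (pos > end) return pos + newSize - oldSize // after the change: shifted by the size delta
  // pos lies inside, or on the edge of, the replaced range
  const side = oldSize === 0 ? assoc : pos === start ? -1 : pos === end ? 1 : assoc
  return start + (side < 0 ? 0 : newSize)
}

const insertFive: ReplacedRange = { start: 3, oldSize: 0, newSize: 5 }
const deleteFour: ReplacedRange = { start: 3, oldSize: 4, newSize: 0 }

console.log(mapPosition(10, insertFive)) // shifted right by the 5 inserted positions
console.log(mapPosition(2, insertFive)) // untouched, it sits before the change
console.log(mapPosition(5, deleteFour)) // collapses to the start of the deleted range

// A remote cursor is carried through several changes by mapping it through each in order.
const cursorAfterBoth = [insertFive, deleteFour].reduce((pos, range) => mapPosition(pos, range), 10)
console.log(cursorAfterBoth)
```

The `assoc` parameter is what lets a collaborator's cursor stay in front of text that someone else inserts exactly at the cursor position.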

Why flat marks instead of nested elements?

Consider bold and italic text: **_text_**. The DOM would nest elements: <strong><em>text</em></strong>. ProseMirror stores the text once, with marks [bold, italic]. This simplifies:

Toggling formats (no tree restructuring)
Overlapping ranges (partial bold + partial italic)
Serialization (marks are metadata, not structure)
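
A small sketch shows why overlapping ranges stay simple when marks are flat. It is an illustration under assumptions, not ProseMirror's data structure: `Segment` and `addMark` are hypothetical names, and the document is a single run of text.

```typescript
interface Segment {
  text: string
  marks: string[] // kept sorted so equal sets compare equal
}

// Add `mark` to the character range [from, to) by splitting segments at the
// range boundaries. No tree restructuring is needed.
function addMark(segments: Segment[], from: number, to: number, mark: string): Segment[] {
  const split: Segment[] = []
  let offset = 0
  for (const seg of segments) {
    const segStart = offset
    offset += seg.text.length
    // Clamp the marked range to this segment (segment-local offsets)
    const a = Math.max(from, segStart) - segStart
    const b = Math.min(to, offset) - segStart
    if (a >= b) {
      split.push(seg) // no overlap with the range
      continue
    }
    const marked = seg.marks.indexOf(mark) >= 0 ? seg.marks : seg.marks.concat(mark).sort()
    if (a > 0) split.push({ text: seg.text.slice(0, a), marks: seg.marks })
    split.push({ text: seg.text.slice(a, b), marks: marked })
    if (b < seg.text.length) split.push({ text: seg.text.slice(b), marks: seg.marks })
  }
  // Merge neighbours that ended up with identical mark sets
  const merged: Segment[] = []
  for (const seg of split) {
    const last = merged[merged.length - 1]
    if (last && last.marks.join(",") === seg.marks.join(",")) {
      merged[merged.length - 1] = { text: last.text + seg.text, marks: last.marks }
    } else {
      merged.push(seg)
    }
  }
  return merged
}

// "Hello World": bold on "Hello Wo" (0..8), then italic on "lo World" (3..11)
let segments: Segment[] = [{ text: "Hello World", marks: [] }]
segments = addMark(segments, 0, 8, "bold")
segments = addMark(segments, 3, 11, "italic")
console.log(segments)
```

The overlap ("lo Wo") simply becomes one segment carrying both marks, which nested elements could only express by splitting and re-nesting tags.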

Slate.js: Schema-Less Flexibility

Slate takes the opposite approach—no schema by default. The document is a recursive tree, and you enforce structure through normalization.

Core principles:


import { Editor, Transforms, Element, Node } from "slate"

// Custom normalizer: ensure paragraphs contain only text/inline elements
// (assumes custom element types with a `type` field)
const withParagraphsNormalized = (editor: Editor): Editor => {
  const { normalizeNode } = editor

  editor.normalizeNode = ([node, path]) => {
    if (Element.isElement(node) && node.type === "paragraph") {
      for (const [child, childPath] of Node.children(editor, path)) {
        // If paragraph contains a block element, unwrap it
        if (Element.isElement(child) && !editor.isInline(child)) {
          Transforms.unwrapNodes(editor, { at: childPath })
          return // Return after one fix: the changed nodes are normalized again
        }
      }
    }
    // Fall back to the default normalization rules
    normalizeNode([node, path])
  }

  return editor
}

Why return after one fix?

Slate’s normalization is iterative. After fixing one issue, the node is “dirty” again and will be re-normalized. This prevents infinite loops and ensures each fix is atomic.
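
The "one fix, then run again" pattern can be shown as a fixed-point loop. This is a library-independent sketch, not Slate's implementation; `SketchNode`, `normalizeOnce`, and `normalize` are hypothetical names, and the pass limit is an assumed safety net.

```typescript
type SketchNode = { type: "doc" | "paragraph" | "text"; text?: string; children?: SketchNode[] }

// Applies at most ONE fix and reports whether it changed anything.
function normalizeOnce(node: SketchNode): boolean {
  const children = node.children ?? []
  for (let i = 0; i < children.length; i++) {
    const child = children[i]
    if (node.type === "paragraph" && child.type === "paragraph") {
      // Unwrap: replace the nested paragraph with its own children
      children.splice(i, 1, ...(child.children ?? []))
      return true
    }
    if (normalizeOnce(child)) return true
  }
  return false
}

// Re-run until a pass finds nothing to fix; the cap guards against rules that fight each other.
function normalize(root: SketchNode, maxPasses = 1000): number {
  let passes = 0
  while (normalizeOnce(root)) {
    passes++
    if (passes >= maxPasses) throw new Error("normalization did not converge")
  }
  return passes
}

const tree: SketchNode = {
  type: "doc",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", text: "a" },
        { type: "paragraph", children: [{ type: "paragraph", children: [{ type: "text", text: "b" }] }] },
        { type: "text", text: "c" },
      ],
    },
  ],
}

const passes = normalize(tree)
console.log(passes, JSON.stringify(tree))
```

Each pass sees a consistent tree because it stops right after mutating it; the doubly nested paragraph above therefore takes two passes to flatten.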

Quill Delta: Linear Simplicity

Delta represents documents as a sequence of operations applied to an empty document.

// Document: "Hello World" with "World" bold
const doc = {
  ops: [{ insert: "Hello " }, { insert: "World", attributes: { bold: true } }, { insert: "\n" }],
}

// Edit: Insert "Beautiful " before "World"
const edit = {
  ops: [
    { retain: 6 }, // Keep "Hello "
    { insert: "Beautiful " }, // Insert new text
  ],
}

// Composed result: "Hello Beautiful World\n"
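
How a change is applied to a document can be sketched in a few lines. This is a simplified illustration and not the quill-delta library: `applyDelta` is a hypothetical helper, it handles only string inserts, and it expands the document one character at a time for clarity rather than speed.

```typescript
type Attrs = Record<string, unknown>
type InsertOp = { insert: string; attributes?: Attrs }
type ChangeOp = InsertOp | { retain: number; attributes?: Attrs } | { delete: number }
type Char = { ch: string; attrs: Attrs }

function applyDelta(doc: InsertOp[], change: ChangeOp[]): InsertOp[] {
  // Expand the document to one entry per UTF-16 code unit (simple, not efficient)
  const chars: Char[] = []
  for (const op of doc) {
    for (const ch of op.insert.split("")) chars.push({ ch, attrs: op.attributes ?? {} })
  }

  const out: Char[] = []
  let cursor = 0
  for (const op of change) {
    if ("insert" in op) {
      for (const ch of op.insert.split("")) out.push({ ch, attrs: op.attributes ?? {} })
    } else if ("retain" in op) {
      // Keep the characters, merging in any formatting the retain carries
      for (const c of chars.slice(cursor, cursor + op.retain)) {
        out.push({ ch: c.ch, attrs: { ...c.attrs, ...op.attributes } })
      }
      cursor += op.retain
    } else {
      cursor += op.delete // deleted characters are skipped
    }
  }
  out.push(...chars.slice(cursor)) // anything the change does not mention is kept as-is

  // Merge runs with identical attributes back into insert ops
  const result: InsertOp[] = []
  for (const c of out) {
    const last = result[result.length - 1]
    if (last && JSON.stringify(last.attributes ?? {}) === JSON.stringify(c.attrs)) {
      last.insert += c.ch
    } else if (Object.keys(c.attrs).length > 0) {
      result.push({ insert: c.ch, attributes: c.attrs })
    } else {
      result.push({ insert: c.ch })
    }
  }
  return result
}

const docOps: InsertOp[] = [{ insert: "Hello " }, { insert: "World", attributes: { bold: true } }, { insert: "\n" }]
const edited = applyDelta(docOps, [{ retain: 6 }, { insert: "Beautiful " }])
console.log(edited.map((op) => op.insert).join(""))
```

A change only needs to describe what differs: everything after the last operation is implicitly retained, which keeps edits to large documents small.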

Operational Transform compatibility:

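Delta’s linear structure makes OT transformation straightforward. When two users edit concurrently, each side transforms the remote operation against its own before applying it:

Base document: "Hello World"
User A: retain 6, insert "X"       → "Hello XWorld"
User B: retain 6, insert "Y"       → "Hello YWorld"

With a tie-break (here: B's insert goes first), both sides converge:
A transformed against B: retain 7, insert "X"  → applied to "Hello YWorld" gives "Hello YXWorld"
B transformed against A: retain 6, insert "Y"  → applied to "Hello XWorld" gives "Hello YXWorld"

Without a tie-break, each side shifts the remote insert past its own and the replicas diverge:
A ends with "Hello XYWorld", B ends with "Hello YXWorld" (wrong!)

Because both inserts target the same index, the two sides must agree on which one goes first. Delta’s transform takes a priority flag for this purpose; a typical policy is that the operation ordered first keeps its position and the later one follows.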
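
A tie-break can be sketched for plain-text inserts. The names `InsertAt`, `transformInsert`, and `applyInsert` are hypothetical, and the rule that B wins ties is an assumption chosen for this example; the only requirement is that both sides apply the same rule.

```typescript
interface InsertAt {
  pos: number
  text: string
}

// Rewrites `op` so it can be applied after `other` has already been applied.
// `opWinsTies` must be true on exactly one side of each concurrent pair.
function transformInsert(op: InsertAt, other: InsertAt, opWinsTies: boolean): InsertAt {
  const otherGoesFirst = other.pos < op.pos || (other.pos === op.pos && !opWinsTies)
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op
}

function applyInsert(doc: string, op: InsertAt): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos)
}

const base = "Hello World"
const opA: InsertAt = { pos: 6, text: "X" }
const opB: InsertAt = { pos: 6, text: "Y" }

// Assumed rule for this sketch: B wins ties, so "Y" lands in front of "X".
// Site A applied its own op first, then the transformed remote op B.
const siteA = applyInsert(applyInsert(base, opA), transformInsert(opB, opA, true))
// Site B applied its own op first, then the transformed remote op A.
const siteB = applyInsert(applyInsert(base, opB), transformInsert(opA, opB, false))

console.log(siteA, siteB) // both sites end with the same document
```

If both sites passed the same flag value, each would place the remote insert on the same side of its own and the two documents would differ, which is exactly the divergence a tie-break prevents.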

Lexical: Commands and Nodes

Lexical separates concerns into:

Nodes: Define structure (ParagraphNode, TextNode, custom nodes)
Commands: Define operations (FORMAT_TEXT_COMMAND, INSERT_PARAGRAPH_COMMAND)
Listeners: React to state changes


import {
  DecoratorNode,
  LexicalNode,
  NodeKey,
  SerializedLexicalNode,
  Spread
} from 'lexical';

// MentionComponent is an application-defined React component (not shown)

type SerializedMentionNode = Spread<
  { userId: string; displayName: string },
  SerializedLexicalNode
>;

export class MentionNode extends DecoratorNode<JSX.Element> {
  __userId: string;
  __displayName: string;

  static getType(): string {
    return 'mention';
  }

  static clone(node: MentionNode): MentionNode {
    return new MentionNode(node.__userId, node.__displayName, node.__key);
  }

  constructor(userId: string, displayName: string, key?: NodeKey) {
    super(key);
    this.__userId = userId;
    this.__displayName = displayName;
  }

  createDOM(): HTMLElement {
    const span = document.createElement('span');
    span.className = 'mention';
    return span;
  }

  // The wrapper span never needs to be recreated; the content comes from decorate()
  updateDOM(): false {
    return false;
  }

  decorate(): JSX.Element {
    return <MentionComponent userId={this.__userId} name={this.__displayName} />;
  }

  static importJSON(serialized: SerializedMentionNode): MentionNode {
    return new MentionNode(serialized.userId, serialized.displayName);
  }

  exportJSON(): SerializedMentionNode {
    return {
      type: 'mention',
      version: 1,
      userId: this.__userId,
      displayName: this.__displayName
    };
  }
}

Input Handling

Input Events Level 2

The beforeinput event (W3C Input Events Level 2) fires before the browser modifies the DOM. This is the interception point for custom editors.

element.addEventListener("beforeinput", (e: InputEvent) => {
  // inputType describes the editing action
  switch (e.inputType) {
    case "insertText":
      e.preventDefault()
      insertText(e.data!) // e.data contains the text
      break

    case "insertParagraph":
      e.preventDefault()
      insertParagraph()
      break

    case "deleteContentBackward": // Backspace
      e.preventDefault()
      deleteBackward()
      break

    case "insertFromPaste": {
      // Braces scope the const declarations to this case
      e.preventDefault()
      const html = e.dataTransfer?.getData("text/html")
      const text = e.dataTransfer?.getData("text/plain")
      handlePaste(html || text || "")
      break
    }

    case "insertCompositionText":
      // IME composition: cannot preventDefault during composition
      // Handle in compositionend instead
      break
  }
})

Key inputType values:

inputType	Trigger	Cancelable
`insertText`	Typing	Yes
`insertParagraph`	Enter	Yes
`insertLineBreak`	Shift+Enter	Yes
`deleteContentBackward`	Backspace	Yes
`deleteContentForward`	Delete	Yes
`insertFromPaste`	Ctrl+V	Yes
`insertCompositionText`	IME	No (during composition)
`historyUndo`	Ctrl+Z	Yes
`historyRedo`	Ctrl+Y	Yes

IME Composition Handling

Input Method Editors (IME) for CJK languages compose characters before committing. During composition, the text is provisional.

let isComposing = false

element.addEventListener("compositionstart", () => {
  isComposing = true
  // Store current selection for restoration if composition is cancelled
})

element.addEventListener("compositionupdate", (e: CompositionEvent) => {
  // e.data contains the current composition string
  // Update preview but don't commit to document state
  showCompositionPreview(e.data)
})

element.addEventListener("compositionend", (e: CompositionEvent) => {
  isComposing = false
  // e.data contains the final committed text
  commitText(e.data)
  clearCompositionPreview()
})

// In beforeinput handler:
element.addEventListener("beforeinput", (e: InputEvent) => {
  if (e.inputType === "insertCompositionText") {
    // Cannot preventDefault during IME composition
    // Let the browser handle it, sync state in compositionend
    return
  }
  // Handle other input types...
})

Why can’t we preventDefault IME input?

The IME is a system component outside browser control. Preventing default would break the composition UI. Instead, editors must:

Allow composition to proceed
Track composition state
Sync internal state only when composition ends
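
The three rules above amount to a small state machine. The sketch below models it with plain objects instead of DOM events so it can run anywhere; `ImeEvent` and `CompositionTracker` are hypothetical names, and a real editor would feed it from the `composition*` and `beforeinput` listeners.

```typescript
type ImeEvent =
  | { type: "compositionstart" }
  | { type: "compositionupdate"; data: string }
  | { type: "compositionend"; data: string }
  | { type: "insertText"; data: string }

class CompositionTracker {
  committed = "" // text that is part of the document model
  preview = "" // provisional text shown while composing
  composing = false

  handle(event: ImeEvent): void {
    switch (event.type) {
      case "compositionstart":
        this.composing = true
        this.preview = ""
        break
      case "compositionupdate":
        // Provisional text changes the preview only, never the document
        if (this.composing) this.preview = event.data
        break
      case "compositionend":
        // Sync the model once, with the final text
        this.composing = false
        this.preview = ""
        this.committed += event.data
        break
      case "insertText":
        // Ordinary typing is applied directly, but never in the middle of a composition
        if (!this.composing) this.committed += event.data
        break
    }
  }
}

const tracker = new CompositionTracker()
const snapshots: string[] = []
const events: ImeEvent[] = [
  { type: "compositionstart" },
  { type: "compositionupdate", data: "n" },
  { type: "compositionupdate", data: "に" },
  { type: "compositionend", data: "に" },
  { type: "insertText", data: "!" },
]
for (const event of events) {
  tracker.handle(event)
  snapshots.push(tracker.committed + "|" + tracker.preview)
}
console.log(snapshots)
```

Keeping the preview separate from the committed text is what prevents doubled characters: the intermediate "n" never reaches the document model.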

Selection and Range APIs

// Position a floating toolbar above the current selection.
// `positionToolbar` is provided by the application.
function updateToolbarPosition(): void {
  // Get current selection
  const selection = window.getSelection()
  if (!selection || selection.rangeCount === 0) return

  const range = selection.getRangeAt(0)

  // Range boundaries
  const { startContainer, startOffset, endContainer, endOffset } = range

  // Check if selection is collapsed (cursor, no selection)
  const isCollapsed = range.collapsed

  // Get bounding rect for positioning UI (e.g., formatting toolbar)
  const rect = range.getBoundingClientRect()
  positionToolbar(rect.left, rect.top - 40)
}

// Programmatically set selection, e.g. selectText(textNode, 5, 10)
function selectText(textNode: Text, start: number, end: number): void {
  const selection = window.getSelection()
  if (!selection) return

  const newRange = document.createRange()
  newRange.setStart(textNode, start)
  newRange.setEnd(textNode, end)
  selection.removeAllRanges()
  selection.addRange(newRange)
}

Edge case: Selection across block boundaries

When selection spans multiple blocks (e.g., from paragraph 1 to paragraph 3), range.commonAncestorContainer is their common parent. Iterating selected content requires walking the tree.

Collaboration Architectures

Operational Transform (OT)

OT transforms concurrent operations to maintain consistency. A central server determines operation order.

Timeline:
  Server: [v0] -----> [v1] -----> [v2]
                ↑           ↑
  Client A: op_a ----→      transform(op_a, op_b)
  Client B:      op_b ----→ transform(op_b, op_a)

Transform function example:

type Op = { type: "insert"; pos: number; text: string } | { type: "delete"; pos: number; len: number }

function transform(op1: Op, op2: Op): Op {
  // Transform op1 assuming op2 has already been applied
  if (op1.type === "insert" && op2.type === "insert") {
    // NOTE: when both positions are equal, this keeps op1 in place on BOTH sides,
    // which diverges. A real implementation needs a deterministic tie-break
    // (for example server order or client ID).
    if (op1.pos <= op2.pos) {
      return op1 // op1 is at or before op2, no change
    }
    return { ...op1, pos: op1.pos + op2.text.length }
  }

  if (op1.type === "insert" && op2.type === "delete") {
    if (op1.pos <= op2.pos) {
      return op1
    }
    if (op1.pos >= op2.pos + op2.len) {
      return { ...op1, pos: op1.pos - op2.len }
    }
    // Insert is within deleted range: move it to the start of the deletion
    return { ...op1, pos: op2.pos }
  }

  // ... handle all operation type combinations
  return op1
}

ProseMirror’s simplification:

Instead of transforming operations against operations, ProseMirror transforms positions through position maps:


import { collab, sendableSteps, receiveTransaction } from "prosemirror-collab"
import { Step } from "prosemirror-transform"

// The editor state must include the collab plugin, e.g. collab({ version })
// Server sends: { version: 5, steps: [...], clientIDs: [...] }

// Apply remote steps (deserialize them from JSON first)
const remoteSteps = steps.map((json) => Step.fromJSON(schema, json))
const tr = receiveTransaction(state, remoteSteps, clientIDs)
const newState = state.apply(tr)

// Send local steps
const sendable = sendableSteps(state)
if (sendable) {
  socket.send({
    version: sendable.version,
    steps: sendable.steps.map((s) => s.toJSON()),
    clientID: sendable.clientID,
  })
}

OT trade-offs:

✅ Proven at massive scale (Google Docs: hundreds of millions of users)
✅ Predictable ordering (server is authority)
✅ Efficient for real-time collaboration
❌ Central server is required and is a bottleneck
❌ Transform functions are complex and error-prone
❌ Offline editing requires careful conflict resolution on reconnect

CRDTs (Conflict-free Replicated Data Types)

CRDTs embed metadata in the data structure itself, enabling automatic conflict-free merging without coordination.


import * as Y from "yjs"
import { WebsocketProvider } from "y-websocket"
import { ySyncPlugin, yCursorPlugin, yUndoPlugin } from "y-prosemirror"

// Create shared document
const ydoc = new Y.Doc()

// Create WebSocket provider for sync
const provider = new WebsocketProvider("wss://your-server.com", "document-room", ydoc)

// Get shared XML fragment for ProseMirror
const yXmlFragment = ydoc.getXmlFragment("prosemirror")

// Use in ProseMirror
const plugins = [ySyncPlugin(yXmlFragment), yCursorPlugin(provider.awareness), yUndoPlugin()]

How Yjs handles text:

Yjs uses YATA (Yet Another Transformation Approach), which assigns unique IDs to each character:

"Hello" → [(id1, 'H'), (id2, 'e'), (id3, 'l'), (id4, 'l'), (id5, 'o')]

User A inserts 'X' after the first 'l' (id3):
  → New item: (id6, 'X', origin: id3)

User B deletes the second 'l' (id4):
  → Mark id4 as deleted (tombstone)

Merge: Both operations apply. 'X' appears after the first 'l', the second 'l' is deleted.
Result: "HelXo"
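
The merge above can be reproduced with a deliberately simplified sequence CRDT. This sketch is closer to the RGA family than to Yjs's YATA and is not how Yjs is implemented: `Replica`, `Item`, and the message types are hypothetical, IDs are (client, Lamport clock) pairs, and messages are assumed to arrive in causal order.

```typescript
type ItemId = { client: number; clock: number }
type Item = { id: ItemId; origin: ItemId | null; ch: string; deleted: boolean }
type InsertMsg = { kind: "insert"; item: Item }
type DeleteMsg = { kind: "delete"; target: ItemId }

const sameId = (a: ItemId, b: ItemId): boolean => a.client === b.client && a.clock === b.clock
// Total order that breaks ties between concurrent inserts after the same origin
const wins = (a: ItemId, b: ItemId): boolean =>
  a.clock > b.clock || (a.clock === b.clock && a.client > b.client)

class Replica {
  items: Item[] = []
  private clock = 0
  private client: number

  constructor(client: number) {
    this.client = client
  }

  insertAfter(origin: ItemId | null, ch: string): InsertMsg {
    const id = { client: this.client, clock: this.clock + 1 }
    const msg: InsertMsg = { kind: "insert", item: { id, origin, ch, deleted: false } }
    this.apply(msg)
    return msg
  }

  remove(target: ItemId): DeleteMsg {
    const msg: DeleteMsg = { kind: "delete", target }
    this.apply(msg)
    return msg
  }

  // Assumes causal delivery: an item's origin is always applied before the item.
  apply(msg: InsertMsg | DeleteMsg): void {
    if (msg.kind === "delete") {
      const target = msg.target
      const hit = this.items.find((i) => sameId(i.id, target))
      if (hit) hit.deleted = true // tombstone: the item stays, only its character is hidden
      return
    }
    const incoming = msg.item
    if (this.items.some((i) => sameId(i.id, incoming.id))) return // duplicate delivery
    this.clock = Math.max(this.clock, incoming.id.clock) // Lamport clock
    const origin = incoming.origin
    let index = origin === null ? 0 : this.items.findIndex((i) => sameId(i.id, origin)) + 1
    // Skip concurrent items that win the tie-break so every replica picks the same slot
    while (index < this.items.length && wins(this.items[index].id, incoming.id)) index++
    this.items.splice(index, 0, { ...incoming })
  }

  text(): string {
    return this.items.filter((i) => !i.deleted).map((i) => i.ch).join("")
  }
}

// Seed both replicas with "Hello", written by a third client (0)
const seed = new Replica(0)
const replicaA = new Replica(1)
const replicaB = new Replica(2)
const ids: ItemId[] = []
let previous: ItemId | null = null
for (const ch of "Hello".split("")) {
  const msg = seed.insertAfter(previous, ch)
  previous = msg.item.id
  ids.push(msg.item.id)
  replicaA.apply(msg)
  replicaB.apply(msg)
}

// Concurrent edits, made before the replicas hear from each other
const insertX = replicaA.insertAfter(ids[2], "X") // after the first 'l'
const deleteL = replicaB.remove(ids[3]) // the second 'l'

// Exchange the messages: both replicas end up with the same text
replicaA.apply(deleteL)
replicaB.apply(insertX)
console.log(replicaA.text(), replicaB.text())
```

Note that the deleted 'l' is still present in `items` on both replicas. That retained tombstone is what lets a late-arriving insert that names it as origin still find its place, and it is also the source of the memory growth discussed below.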

CRDT trade-offs:

✅ No central server required (peer-to-peer possible)
✅ Native offline support—operations merge automatically
✅ Simpler mental model (no transform functions)
✅ No coordination needed, so scaling is not bound to a central sequencer
❌ Metadata overhead (unique IDs for every character)
❌ Tombstones accumulate (deleted items leave markers)
❌ Some conflict resolutions are arbitrary (concurrent inserts at same position)

Real-World Collaboration Implementations

Product	Approach	Reason
Google Docs	OT	Proven at scale, team expertise
Figma	CRDT-inspired (server-authoritative)	Full OT judged more complex than design objects need
Notion	OT-like (proprietary)	Block-based operations are simpler than text OT
Linear	Yjs (CRDT)	Offline-first architecture

Figma’s engineering blog describes its multiplayer system as CRDT-inspired rather than a true CRDT: a central server orders changes, and concurrent edits to the same property of the same object resolve as last-writer-wins. The post explains that OT was considered and found unnecessarily complex for design objects.

Performance Optimization

Virtualization for Large Documents

Documents with 100K+ words need virtualization—render only visible blocks.


import { useEffect, useState, type RefObject } from "react"

interface Block {
  id: string
  type: string
  content: string
}

function useVirtualBlocks(blocks: Block[], containerRef: RefObject<HTMLElement>) {
  const [visibleRange, setVisibleRange] = useState({ start: 0, end: 20 })

  useEffect(() => {
    const container = containerRef.current
    if (!container) return

    // Entries only report elements whose intersection changed,
    // so keep a running set of the indexes that are currently visible
    const visible = new Set<number>()

    const observer = new IntersectionObserver(
      (entries) => {
        for (const entry of entries) {
          const index = parseInt((entry.target as HTMLElement).dataset.index!, 10)
          if (entry.isIntersecting) visible.add(index)
          else visible.delete(index)
        }

        if (visible.size > 0) {
          const indexes = Array.from(visible)
          setVisibleRange({
            start: Math.max(0, Math.min(...indexes) - 5),
            // slice() excludes its end index, so +6 keeps 5 buffer blocks
            end: Math.min(blocks.length, Math.max(...indexes) + 6),
          })
        }
      },
      { root: container, threshold: 0 },
    )

    // Observe sentinel elements at block positions (one per block, tagged with data-index)
    container.querySelectorAll<HTMLElement>("[data-index]").forEach((el) => observer.observe(el))

    return () => observer.disconnect()
  }, [blocks.length, containerRef])

  return blocks.slice(visibleRange.start, visibleRange.end)
}

Notion’s approach:

Notion renders blocks in visible viewport plus buffer. Each block type has an estimated height. On render, actual height is measured and cached. Trade-off accepted: small scroll position jumps when estimates are wrong.
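
The estimate-then-measure idea can be sketched as follows. Nothing here is Notion's code: `VBlock`, `ESTIMATED_HEIGHT`, `measured`, and `visibleRange` are hypothetical names, and the pixel values are assumed for illustration.

```typescript
interface VBlock {
  id: string
  type: string
}

// Assumed per-type estimates in pixels; real values depend on the design.
const ESTIMATED_HEIGHT: Record<string, number> = { text: 24, heading: 40, image: 300 }
// Actual heights, filled in after a block has rendered and been measured.
const measured = new Map<string, number>()

function heightOf(block: VBlock): number {
  return measured.get(block.id) ?? ESTIMATED_HEIGHT[block.type] ?? 24
}

// Returns [start, end) indexes of the blocks to render: those intersecting
// the viewport plus `buffer` blocks on each side.
function visibleRange(
  blocks: VBlock[],
  scrollTop: number,
  viewportHeight: number,
  buffer = 2,
): [number, number] {
  let top = 0
  let start = -1
  let end = blocks.length
  for (let i = 0; i < blocks.length; i++) {
    const bottom = top + heightOf(blocks[i])
    if (start === -1 && bottom > scrollTop) start = i // first block reaching into the viewport
    if (top >= scrollTop + viewportHeight) {
      end = i // first block entirely below the viewport
      break
    }
    top = bottom
  }
  if (start === -1) return [blocks.length, blocks.length] // scrolled past the end
  return [Math.max(0, start - buffer), Math.min(blocks.length, end + buffer)]
}

const pageBlocks: VBlock[] = []
for (let i = 0; i < 100; i++) pageBlocks.push({ id: "b" + i, type: "text" })

const estimatedRange = visibleRange(pageBlocks, 240, 120)

// After rendering, block 10 turns out to be much taller than estimated.
measured.set("b10", 100)
const measuredRange = visibleRange(pageBlocks, 240, 120)
console.log(estimatedRange, measuredRange)
```

The second call returns a narrower range because one tall block now fills most of the viewport. The same correction is what shifts offsets of later blocks and causes the small scroll jumps mentioned above. A production version would likely cache cumulative offsets instead of summing heights on every scroll event.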

Slate.js Chunking

Slate can split the children of large nodes into memoized “chunks” so that an edit re-renders only the affected chunk, which can speed up rendering by as much as 10x on large documents:

import { Editor, Element, Node } from "slate"

// Enable chunking for large lists
const withChunking = (editor: Editor): Editor => {
  editor.getChunkSize = (node: Node) => {
    if (Element.isElement(node) && node.children.length > 100) {
      return 50 // Chunk into groups of 50
    }
    return null // No chunking for small nodes
  }
  return editor
}

Combine with CSS content-visibility: auto to skip painting off-screen chunks.

Memory Management

For very large documents:

Technique	Use Case	Trade-off
Piece tables	Text-heavy documents	Complexity for queries
Lazy node loading	Outline views	Latency on expand
WeakMap caches	Computed values	GC unpredictability
LRU eviction	Undo history	Lost undo steps

Accessibility

ARIA Requirements

Rich text editors must announce their role and state to assistive technologies:

<div role="textbox" aria-multiline="true" aria-label="Document editor" contenteditable="true">
  <!-- Content -->
</div>

For custom rendering (non-contentEditable):

<!-- Hidden textarea for screen reader announcements -->
<textarea aria-label="Document editor" class="sr-only" readonly></textarea>

<!-- Visual rendering -->
<div role="application" aria-label="Rich text editor">
  <div role="document">
    <!-- Rendered blocks with aria-* attributes -->
  </div>
</div>

Required keyboard support:

Key	Action
Arrow keys	Move cursor
Shift + Arrow	Extend selection
Ctrl/Cmd + A	Select all
Ctrl/Cmd + Z/Y	Undo/redo
Ctrl/Cmd + B/I/U	Bold/italic/underline
Tab	Indent (in lists) or focus next element
Escape	Exit formatting mode or nested block

Test with actual screen readers:

Screen Reader	Browser	Platform
NVDA	Firefox, Chrome	Windows
JAWS	Chrome, Edge	Windows
VoiceOver	Safari	macOS, iOS
TalkBack	Chrome	Android

Known issue: JAWS 17 with Firefox doesn’t recognize some contentEditable structures as editable. Always test combinations.

Common Pitfalls

1. Fighting contentEditable

Problem: Trying to enforce structure in contentEditable that browsers don’t support.

Symptom: Pressing Enter in a heading creates <div> instead of <p>. Backspace at the start of a list item behaves inconsistently.

Solution: Use a framework that abstracts contentEditable (ProseMirror, Tiptap) or abandon it entirely (Lexical).

2. Ignoring IME

Problem: Editor works perfectly with English but corrupts Chinese/Japanese/Korean input.

Symptom: Characters appear doubled, composition UI flickers, or text disappears.

Solution: Handle compositionstart/compositionend events. Don’t preventDefault on insertCompositionText.

3. Undo Granularity

Problem: Each character is a separate undo step, making undo tedious.

Solution: Debounce undo checkpoints. Group consecutive typing into single undo entries. Separate formatting changes from content changes.

let undoTimeout: number | null = null

function onInput() {
  if (undoTimeout) clearTimeout(undoTimeout)

  // Don't create undo entry yet
  applyChangeWithoutUndoEntry()

  // Create undo entry after 500ms of no input
  undoTimeout = window.setTimeout(() => {
    createUndoEntry()
    undoTimeout = null
  }, 500)
}
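
The same grouping can also be done without a timer by comparing timestamps, which is easier to test and combines naturally with a depth limit. This is a sketch under assumptions: `GroupedHistory` and `UndoEntry` are hypothetical names, the document is a plain string, and the 500 ms window and depth of 100 are example values.

```typescript
interface UndoEntry {
  before: string // document content before the group started
  after: string // document content after the latest change in the group
}

class GroupedHistory {
  private entries: UndoEntry[] = []
  private lastChangeAt = -Infinity
  private groupWindowMs: number
  private maxDepth: number

  constructor(groupWindowMs = 500, maxDepth = 100) {
    this.groupWindowMs = groupWindowMs
    this.maxDepth = maxDepth
  }

  // `now` is passed in so the logic can be tested without real timers.
  record(before: string, after: string, now: number): void {
    const last = this.entries[this.entries.length - 1]
    if (last && now - this.lastChangeAt < this.groupWindowMs) {
      last.after = after // still typing: extend the current group
    } else {
      this.entries.push({ before, after })
      if (this.entries.length > this.maxDepth) this.entries.shift() // drop the oldest step
    }
    this.lastChangeAt = now
  }

  // Returns the content to restore, or undefined when there is nothing to undo.
  undo(): string | undefined {
    return this.entries.pop()?.before
  }
}

const history = new GroupedHistory()
history.record("", "H", 0)
history.record("H", "He", 100)
history.record("He", "Hel", 200)
history.record("Hel", "Help", 1500) // typed after a pause: starts a new group

const firstUndo = history.undo()
const secondUndo = history.undo()
console.log(firstUndo, secondUndo)
```

The first undo removes only the character typed after the pause; the second removes the whole burst. A fuller version would also start a new group when the kind of change switches, for example from typing to formatting.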

4. Memory Leaks in Collaboration

Problem: Real-time sync accumulates state without cleanup.

Symptom: Memory usage grows over long editing sessions.

Solution:

Garbage collect CRDT tombstones periodically
Limit undo history depth
Disconnect inactive presence cursors

5. Selection Restoration After Async Operations

Problem: Async operations (paste processing, mention lookup) lose cursor position.

Solution: Capture selection before async operation, restore after:

import { Editor, Transforms } from "slate"

// `editor` is the Slate editor instance; `processHtml` is an application-defined
// async function that turns pasted HTML into a Slate fragment
async function handlePaste(html: string) {
  // Capture current selection
  const savedSelection = editor.selection

  // Async processing
  const processed = await processHtml(html)

  // Restore selection before applying
  Editor.withoutNormalizing(editor, () => {
    if (savedSelection) {
      Transforms.select(editor, savedSelection)
    }
    Transforms.insertFragment(editor, processed)
  })
}

Real-World Examples

Notion: Block-Based Editor

Challenge: Pages with 10K+ blocks, each potentially different height.

Architecture:

Everything is a block (text, images, databases, embeds)
UUID v4 for unique IDs
Parent pointers enable permission inheritance
Content arrays enable nesting
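
The interplay of parent pointers and content arrays can be sketched with a small in-memory store. This is an illustration, not Notion's schema: `SketchBlock`, `effectivePermission`, and `subtreeIds` are hypothetical names, the permission values are assumed, and short readable IDs stand in for UUIDs.

```typescript
type Permission = "private" | "shared"

interface SketchBlock {
  id: string
  type: string
  parentId: string | null
  content: string[] // ordered IDs of child blocks
  permission?: Permission // set only where explicitly overridden
}

type BlockStore = Map<string, SketchBlock>

// Walk parent pointers upward until some ancestor defines a permission.
function effectivePermission(store: BlockStore, id: string): Permission {
  let current = store.get(id)
  while (current !== undefined) {
    if (current.permission !== undefined) return current.permission
    current = current.parentId === null ? undefined : store.get(current.parentId)
  }
  return "private" // assumed default for this sketch
}

// Walk content arrays downward to list a subtree in document order.
function subtreeIds(store: BlockStore, rootId: string): string[] {
  const block = store.get(rootId)
  if (block === undefined) return []
  const ids = [rootId]
  for (const childId of block.content) ids.push(...subtreeIds(store, childId))
  return ids
}

const store: BlockStore = new Map()
const sampleBlocks: SketchBlock[] = [
  { id: "page", type: "page", parentId: null, content: ["list", "note"], permission: "shared" },
  { id: "list", type: "bulleted_list", parentId: "page", content: ["item"] },
  { id: "item", type: "text", parentId: "list", content: [] },
  { id: "note", type: "text", parentId: "page", content: [], permission: "private" },
]
for (const block of sampleBlocks) store.set(block.id, block)

console.log(effectivePermission(store, "item")) // inherited from the page
console.log(effectivePermission(store, "note")) // overridden on the block itself
console.log(subtreeIds(store, "page"))
```

The two directions serve different queries: rendering walks down through `content`, while access checks walk up through `parentId` without loading siblings. The sketch assumes the pointers form a tree; a real store would need to guard against cycles.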

Performance strategy:

Render visible viewport + buffer
Estimate heights by block type
Measure and cache on render

Outcome: < 100ms Time to Interactive (TTI) for most pages.

Linear: Offline-First Issue Tracker

Challenge: Real-time sync with offline support.

Architecture:

Yjs CRDT for all content
Local-first: changes apply immediately to local state
Sync happens opportunistically when online

Stack: Yjs + IndexedDB (local) + WebSocket (sync)

Trade-off accepted: Occasional merge conflicts in descriptions—acceptable for issue tracking where conflicts are rare.

Google Docs: OT at Scale

Challenge: Hundreds of millions of users, with many editors active in the same document at once.

Architecture:

Operational Transform with central server
Optimistic local application (user sees changes instantly)
Server transforms and broadcasts to other clients

Why not CRDT?: OT was mature when Docs was built. The team had deep OT expertise. At their scale, metadata overhead of CRDTs would be significant.

Conclusion

Rich text editor design comes down to matching constraints to trade-offs:

For simple content with fast time-to-market: Use Quill. Accept browser inconsistencies.
For complex structures with collaboration: Use ProseMirror/Tiptap. The schema enforces validity; prosemirror-collab handles sync.
For pixel-perfect control and performance: Use Lexical. Accept the cost of implementing IME and accessibility manually.
For offline-first applications: Use Yjs with any editor. CRDT handles merge automatically.

No editor framework solves all problems. The best choice depends on your document complexity, collaboration requirements, and team expertise.

Appendix

Prerequisites

DOM APIs (Selection, Range, MutationObserver)
Event handling (beforeinput, composition events)
React or framework fundamentals (for ProseMirror/Lexical/Slate)
Basic understanding of distributed systems (for collaboration)

Terminology

Term	Definition
contentEditable	Browser attribute enabling native text editing in any element
IME	Input Method Editor—system component for composing characters in CJK and other languages
OT	Operational Transform—algorithm for transforming concurrent operations to maintain consistency
CRDT	Conflict-free Replicated Data Type—data structure enabling automatic merge without coordination
Transaction	Atomic unit of change in ProseMirror/Lexical; contains operations and metadata
Schema	Definition of valid document structure (nodes, marks, constraints)
Mark	Inline annotation (bold, italic, link) applied to text without changing structure
Node	Structural element in document tree (paragraph, heading, list item)
Position mapping	Translating document positions across changes (critical for collaboration)
Tombstone	Marker for deleted items in CRDTs (enables merge but accumulates memory)

Summary

Document models: Linear (Quill Delta) is OT-friendly but limits nesting; hierarchical (ProseMirror, Slate, Lexical) enables complex structures.
Rendering: contentEditable provides native behavior but inconsistently; custom rendering gives control but requires reimplementing selection and IME.
Collaboration: OT is proven at scale with central server; CRDT enables peer-to-peer and offline-first but has metadata overhead.
Performance: Virtualization is essential for large documents; chunking and memoization reduce React/DOM overhead.
Accessibility: ARIA attributes are required; screen reader testing must cover NVDA, JAWS, and VoiceOver.

References

W3C Input Events Level 2 Specification - Authoritative reference for beforeinput event
W3C contentEditable Specification - Current state and known issues
ProseMirror Guide - Marijn Haverbeke’s comprehensive documentation
ProseMirror Collaborative Editing - Position mapping approach to collaboration
Collaborative Editing in ProseMirror - Marijn Haverbeke’s design rationale
Lexical Documentation - Meta’s editor framework
Slate.js Documentation - Plugin-first architecture
Quill Delta Format - Linear operation format
Yjs Documentation - High-performance CRDT library
Notion’s Data Model - Block-based architecture
CKEditor: Lessons from Real-Time Collaboration - Production implementation insights
Why contentEditable Is Terrible - Medium engineering’s experience
MDN: Selection API - Browser selection handling
MDN: beforeinput Event - Input event reference
