
Critical Rendering Path: Draw

The Draw stage is the final phase of the browser's rendering pipeline. In Chromium, Viz (short for "Visuals") takes abstract compositor frames, made up of render passes and draw quads, and translates them into low-level GPU commands that produce the actual pixels on the display.

Figure: The Viz architecture. Compositor frames from renderer processes (SubmitFrame) → FrameSink Manager → Surface Aggregator → Skia / Graphite engine → GPU driver interface (GL/Vulkan/Metal commands) → swap chain / overlays → display controller → physical screen. Viz aggregates frames from multiple renderer processes and translates them into hardware-accelerated drawing commands.

The Draw stage exists to solve a fundamental problem: multiple isolated processes (the browser UI, the main page's renderer, renderers for cross-origin iframes) each produce independent compositor frames, but the display requires a single, coherent image.

Figure: Frame sources (browser UI, the main page renderer, and renderers for iframe A and iframe B) converge in the Viz process, where the Surface Aggregator produces GPU commands for a single display output.

Core mental model:

  • Aggregation: Surface Aggregator recursively walks surface references, merging frames from all sources into one compositor frame
  • Translation: Abstract draw quads become Skia API calls, which translate to Vulkan/Metal/D3D commands
  • Optimization: Overdraw removal, quad batching, and hardware overlays minimize GPU work
  • Timing: BeginFrame signals from VSync drive the entire pipeline; missed deadlines cause dropped frames

The key insight: Viz operates in a separate GPU process. This isolation means GPU driver crashes don’t take down the browser, and the aggregation logic handles missing frames gracefully—using stale frames or solid colors rather than blocking.


Viz runs in the GPU process and serves as Chromium’s central display compositor. It receives compositor frames from multiple sources and produces the final screen output.

Two-Thread Design:

| Thread | Responsibility | Why Separate |
|---|---|---|
| GPU Main Thread | Rasterizes display lists into GPU texture tiles; draws compositor frames | Rasterization and drawing both compete for GPU resources |
| Display Compositor Thread | Aggregates frames from all processes; optimizes compositing | Prevents rasterization from blocking frame presentation |

This separation prevents a bottleneck where slow rasterization delays frame presentation. The display compositor thread can present frames from the active tree while the GPU main thread rasterizes pending tiles.

Process Isolation Rationale:

Prior to Viz: Display compositing ran in the browser process, tightly coupled to browser UI code, while the GPU process mainly executed GPU commands on behalf of other processes. Viz moved the display compositor into the GPU process, so that compositing and all GPU access sit behind a single process boundary.

Modern Viz provides:

  1. Crash isolation: GPU driver crashes terminate only the Viz process; renderers survive and can reconnect
  2. Security boundary: Renderers never access GPU APIs directly—all commands go through Viz
  3. Resource management: Viz controls GPU memory allocation across all tabs, preventing runaway consumption

A single web page often comprises elements from multiple renderer processes. A page with two cross-origin iframes involves three renderer processes, each producing independent compositor frames.

Viz uses Surfaces to manage the frame hierarchy. Each surface represents a compositable unit that can receive frames.

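  • SurfaceId: Unique identifier for a surface, combining a FrameSinkId (identifying the client that submits frames) with a LocalSurfaceId; used to submit frames or to reference other surfaces for embedding
  • FrameSink: Interface through which renderers submit compositor frames to Viz
  • LocalSurfaceId: Sequence-numbered part of the SurfaceId; a new, higher value is allocated when surface properties such as size or device scale factor change, so Viz can order newer surfaces after older ones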

The Surface Aggregator implements a recursive, nearly stateless algorithm:

1. Start with the most recent eligible frame from the display's root surface
2. Iterate through quads in draw order, tracking the current clip and transform
3. For each quad:
   - If NOT a surface reference → output directly to the aggregated frame
   - If IS a surface reference:
     a. Find the most recent eligible frame for that surface
     b. If no frame exists OR a cycle is detected → skip (use fallback)
     c. Otherwise → recursively apply this algorithm
4. Output the aggregated compositor frame
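The aggregation steps can be sketched as a small recursive function. This is a toy model under stated assumptions: surfaces are keyed by plain strings rather than SurfaceIds, quads carry no transform or clip, each surface maps to its most recent eligible frame, and the names Quad, SurfaceQuad, Frame, and aggregate are hypothetical, not Viz's C++ API. A missing frame becomes a solid-color fallback quad, and a reference back to a surface already on the recursion path is skipped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Union


# Hypothetical, simplified types: real Viz quads also carry transforms, clips,
# and shared quad state, and surfaces are keyed by SurfaceId, not strings.
@dataclass(frozen=True)
class Quad:
    name: str


@dataclass(frozen=True)
class SurfaceQuad:
    surface_id: str  # reference to an embedded surface (e.g. an iframe)


@dataclass
class Frame:
    quads: List[Union[Quad, SurfaceQuad]] = field(default_factory=list)


FALLBACK = Quad("solid-color-fallback")


def aggregate(surface_id: str,
              latest_frames: Dict[str, Frame],
              path: Optional[Set[str]] = None) -> List[Quad]:
    """Flatten a tree of surfaces into one quad list, preserving draw order."""
    path = set() if path is None else path
    if surface_id in path:
        return []              # cycle (A embeds B embeds A): skip this reference
    frame = latest_frames.get(surface_id)
    if frame is None:
        return [FALLBACK]      # no frame has arrived yet: draw a placeholder
    path.add(surface_id)
    out: List[Quad] = []
    for quad in frame.quads:
        if isinstance(quad, SurfaceQuad):
            out.extend(aggregate(quad.surface_id, latest_frames, path))
        else:
            out.append(quad)
    path.remove(surface_id)    # leaving this surface: it may be embedded again elsewhere
    return out
```

Tracking only the current recursion path, rather than every surface ever visited, is a design choice of this sketch: it still breaks A → B → A cycles but allows the same surface to be embedded twice side by side.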

Resilience Pattern: If an iframe’s frame hasn’t arrived, Viz uses a previous frame or solid color. This prevents the entire page from stuttering due to one slow process—critical for maintaining 60fps with site-isolated iframes.

Edge Case—Cycle Detection: Surface references can theoretically form cycles (A embeds B embeds A). The aggregator tracks visited surfaces and breaks cycles by skipping already-visited references.

Memory Pressure: Under memory constraints, Viz may evict old frames from surfaces. When this happens, the surface falls back to a solid color until a new frame arrives.


A compositor frame is not a bitmap—it’s a structured description of what to draw.

CompositorFrame
├── RenderPass (root - drawn last)
│ ├── DrawQuad (TileDrawQuad: rasterized tile)
│ ├── DrawQuad (SolidColorDrawQuad: background)
│ └── DrawQuad (SurfaceDrawQuad: embedded iframe)
├── RenderPass (effect pass - intermediate texture)
│ └── DrawQuad (content for blur effect)
└── metadata (device scale, damage rect, etc.)

A Render Pass is a set of quads drawn into a target (the screen or an intermediate texture). Multiple passes enable layered effects:

| Use Case | Why Intermediate Pass Required |
|---|---|
| filter: blur() | Must render content to texture, then apply blur kernel |
| opacity on group | Children blend with each other, then the group blends with the background |
| mix-blend-mode | Requires reading pixels from underlying content |
| Clip on rotated content | Non-axis-aligned clips require stencil/mask operations |

Performance Implication: Each render pass adds GPU overhead (texture allocation, state changes, draw calls). Complex CSS effects that require intermediate passes are more expensive than compositor-only transforms.

| Quad Type | Purpose | Typical Source |
|---|---|---|
| TextureDrawQuad | Texture produced outside tile rasterization, drawn with a position transform | Canvas, WebGL, other texture-backed layers |
| SolidColorDrawQuad | Color fill without texture backing | Backgrounds, fallbacks |
| SurfaceDrawQuad | Reference to another surface by SurfaceId | Embedded iframes, browser UI |
| VideoDrawQuad | Video frame (often promoted to hardware overlay) | <video> elements |
| TileDrawQuad | Single rasterized tile of a tiled layer | Tiled layer content, including large scrolling content |
| RenderPassDrawQuad | Output of another render pass | Effect layers |

Each quad carries:

  • Geometry: Transform matrix, destination rect, clip rect
  • Material properties: Texture ID, color, blend mode
  • Layer information: Sorting order for overlap resolution
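To make the frame structure concrete, here is a hypothetical Python model of a compositor frame. The class and field names only loosely mirror the concepts in the tree and list above; Chromium's real types are C++ classes in Viz with different fields. Following the tree, the root pass is stored last, and every other pass renders into its own intermediate texture.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class DrawQuad:
    kind: str                                   # e.g. "TileDrawQuad", "SolidColorDrawQuad"
    rect: Rect                                  # geometry: destination rect
    transform: Tuple[float, ...] = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)  # geometry: 2D affine (a, b, c, d, tx, ty)
    clip: Optional[Rect] = None                 # geometry: clip rect, if any
    texture_id: Optional[int] = None            # material: texture backing, if any
    color: Optional[Tuple[int, int, int, int]] = None  # material: RGBA fill
    blend_mode: str = "src-over"                # material: blend mode
    sorting_order: int = 0                      # layer info: overlap resolution
    source_pass_id: Optional[int] = None        # set only for RenderPassDrawQuad


@dataclass
class RenderPass:
    pass_id: int
    output_rect: Rect
    quads: List[DrawQuad] = field(default_factory=list)


@dataclass
class CompositorFrame:
    render_passes: List[RenderPass]             # dependencies first; root pass last
    device_scale_factor: float = 1.0
    damage_rect: Optional[Rect] = None

    @property
    def root_pass(self) -> RenderPass:
        return self.render_passes[-1]

    def intermediate_pass_count(self) -> int:
        # Every non-root pass renders into its own intermediate texture
        return len(self.render_passes) - 1
```

A blurred element, for example, would appear as one effect pass whose output the root pass draws through a RenderPassDrawQuad, so intermediate_pass_count() returns 1.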

Viz translates draw quads into GPU-native commands through Skia.

DrawQuad (abstract)
  ↓
Skia API calls (SkCanvas::drawRect, drawImage, etc.)
  ↓
Skia backend (Ganesh or Graphite)
  ↓
GPU command buffer (Vulkan, Metal, D3D, OpenGL)
  ↓
GPU driver
  ↓
Hardware execution

Skia is the cross-platform 2D graphics library used by Chrome and Android. It abstracts GPU API differences:

  • Shader management: Compiles and caches GPU shaders
  • State optimization: Batches state changes to minimize GPU commands
  • Resource handling: Manages textures, buffers, and GPU memory
  • Fallback paths: Provides software rasterization when GPU unavailable

As of 2025, Chrome is transitioning from Ganesh (Skia’s aging OpenGL-centric backend) to Graphite.

Why Graphite exists:

Ganesh accumulated technical debt:

  • Originally designed for OpenGL ES with GL-centric assumptions
  • Too many specialized code paths, making it hard to leverage modern graphics APIs
  • Single-threaded command recording caused bottlenecks
  • Shader compilation during browsing caused “shader jank”

Graphite’s design improvements:

| Aspect | Ganesh | Graphite |
|---|---|---|
| Threading | Single-threaded recording | Independent recorders across multiple threads |
| Overdraw | Software occlusion culling | Depth buffer for 2D (hardware-accelerated) |
| Shaders | Dynamic compilation | Pre-compiled at startup; unified pipelines |
| APIs | OpenGL-first, others bolted on | Metal/Vulkan/D3D12 native from the start |

Performance results (2025):

  • ~15% improvement on MotionMark 1.3 (MacBook Pro M3)
  • Reduced frame drops due to shader compilation
  • Lower power consumption from efficient GPU utilization

Rollout status:

  • macOS (including Apple Silicon): Enabled by default
  • Windows: Testing with Dawn’s D3D11 backend
  • Linux/Android: In development

If a quad is completely obscured by another opaque quad in front of it, Viz skips drawing it entirely.

Why this matters: Mobile devices are memory-bandwidth constrained. Drawing pixels that will be overwritten wastes precious GPU bandwidth. A common case: a full-screen opaque background covers all content below it—drawing that underlying content is pure waste.

Graphite enhancement: Uses GPU depth testing for 2D rendering. Each quad gets a depth value; the GPU’s depth buffer automatically rejects overdraw at the hardware level, eliminating software occlusion calculations.
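The software overdraw check described above can be sketched as follows. This assumes axis-aligned, screen-space rectangles and a simplified rule: a quad is skipped only if a single opaque quad drawn after it fully contains it (real occlusion tracking may also combine several occluders into a region). Rect, Quad, and cull_occluded are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, other: "Rect") -> bool:
        return (self.x <= other.x and self.y <= other.y
                and other.x + other.w <= self.x + self.w
                and other.y + other.h <= self.y + self.h)


@dataclass(frozen=True)
class Quad:
    name: str
    rect: Rect      # screen-space, axis-aligned (assumption)
    opaque: bool


def cull_occluded(quads_back_to_front: List[Quad]) -> List[Quad]:
    """Drop quads fully covered by a single opaque quad drawn after them."""
    kept: List[Quad] = []
    occluders: List[Rect] = []
    # Walk front-to-back so every occluder is known before the quads behind it.
    for quad in reversed(quads_back_to_front):
        if any(o.contains(quad.rect) for o in occluders):
            continue  # completely hidden: skip drawing it
        kept.append(quad)
        if quad.opaque:
            occluders.append(quad.rect)
    kept.reverse()    # restore back-to-front draw order
    return kept
```

With a full-screen opaque background drawn over older content, the older content is dropped and only the background plus whatever is drawn on top of it survive. A depth buffer moves the same per-pixel decision into hardware.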

Drawing 100 small quads separately is expensive due to GPU state change overhead (bind texture, set shader, draw, repeat). Batching combines similar quads into fewer draw calls.

Batching requirements:

  • Same texture (or atlas)
  • Same blend mode
  • Same shader
  • Compatible transforms (can be batched via instancing)

Real-world impact: A page with many small icons benefits dramatically from texture atlasing and batching. Without batching: 100+ draw calls. With batching: potentially 1 draw call.
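The batching rule can be sketched as grouping quads by their GPU state. This sketch assumes only consecutive quads are merged (merging non-adjacent quads would require proving they do not overlap, to preserve draw order); Quad and batch are hypothetical names, and each output group stands for one draw call.

```python
from dataclasses import dataclass
from typing import List, Tuple

BatchKey = Tuple[str, str, str]  # (texture or atlas, blend mode, shader)


@dataclass(frozen=True)
class Quad:
    texture: str
    blend_mode: str
    shader: str


def batch(quads: List[Quad]) -> List[Tuple[BatchKey, int]]:
    """Group consecutive compatible quads; each group stands for one draw call."""
    batches: List[Tuple[BatchKey, int]] = []
    for quad in quads:
        key = (quad.texture, quad.blend_mode, quad.shader)
        if batches and batches[-1][0] == key:
            batches[-1] = (key, batches[-1][1] + 1)  # extend the current batch
        else:
            batches.append((key, 1))                 # state change: new draw call
    return batches
```

One hundred icons sampled from the same atlas collapse into a single batch, while one hundred icons with separate textures produce one hundred batches, which is the difference atlasing makes.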

Viz tracks which screen regions changed since the last frame (the “damage rect”). Unchanged regions may skip redrawing entirely.

Partial swap: Some platforms support SwapBuffersWithDamage, presenting only the damaged region to the compositor. This reduces memory bandwidth for small updates (e.g., blinking cursor).
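Damage tracking can be sketched as a union of changed rectangles. This assumes rects given as (left, top, right, bottom) tuples already clipped to the screen, and merges them into one bounding box, which overestimates when changes are far apart; the helper names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def union(a: Optional[Rect], b: Rect) -> Rect:
    """Smallest rect containing both inputs (a may be None for 'empty')."""
    if a is None:
        return b
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def frame_damage(changed: Iterable[Rect]) -> Optional[Rect]:
    """Bounding box of everything that changed since the last frame."""
    damage: Optional[Rect] = None
    for rect in changed:
        damage = union(damage, rect)
    return damage


def damaged_fraction(damage: Optional[Rect], width: int, height: int) -> float:
    """Share of screen pixels inside the damage rect (what a partial swap would present)."""
    if damage is None:
        return 0.0
    left, top, right, bottom = damage
    return (right - left) * (bottom - top) / (width * height)
```

A 2×20 px blinking cursor on a 1920×1080 screen damages about 0.002% of the pixels, which is why presenting only the damaged region pays off for such updates.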


The most efficient drawing is no drawing at all—at least not in the traditional sense.

Normally, the browser renders to a buffer, and the system compositor (DWM on Windows, CoreAnimation on macOS) composites it with other windows. Direct Scanout bypasses this intermediate step.

How it works:

  1. Browser produces a buffer meeting scanout requirements
  2. Viz passes the buffer handle directly to the display controller
  3. Display controller reads pixels straight from browser’s buffer
  4. No copy through system compositor

Requirements for Direct Scanout:

  • Content pixel-aligned with screen
  • No complex CSS effects (blend modes, partial opacity)
  • Buffer format compatible with display hardware
  • Full-screen or in a compatible overlay plane

For specific content types, the display hardware can composite without GPU involvement.

Video overlay path:

Platform decoder (VideoToolbox, MediaFoundation, VA-API)
  ↓
Platform-specific buffer (IOSurface, DXGI, AHardwareBuffer)
  ↓
Hardware overlay plane
  ↓
Display controller composites at scanout time

Benefits:

| Metric | Without Overlay | With Overlay |
|---|---|---|
| Power (fullscreen video) | 100% | ~50% (macOS measurements) |
| GPU copies | Multiple (decode → texture → composite → present) | Zero (decode → overlay) |
| Latency | +1 frame (GPU composite delay) | Minimal (direct to display) |

Real-world constraint: Overlays require the content to have no CSS effects applied. A <video> with filter: blur(1px) falls back to GPU compositing.

Format requirements: Overlay planes accept specific pixel formats (NV12, P010 for video). Content must match, or the browser falls back to GPU conversion.
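The two constraints above (no CSS effects, a supported pixel format) can be read as a simple decision. This sketch is hypothetical: the accepted formats vary by display hardware, and Chromium's actual overlay selection considers more conditions than these two.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical: the formats a given display's overlay planes accept vary by hardware.
OVERLAY_FORMATS: FrozenSet[str] = frozenset({"NV12", "P010"})


@dataclass(frozen=True)
class VideoLayer:
    pixel_format: str       # format produced by the platform decoder
    has_css_effects: bool   # e.g. filter: blur(1px), mix-blend-mode


def presentation_path(layer: VideoLayer,
                      overlay_formats: FrozenSet[str] = OVERLAY_FORMATS) -> str:
    if layer.has_css_effects:
        return "gpu-composite"  # effects must be applied by the GPU compositor
    if layer.pixel_format not in overlay_formats:
        return "gpu-convert"    # convert to a supported format on the GPU first
    return "overlay"            # display controller composites at scanout
```

Under these assumptions, an NV12 video with no effects goes to an overlay, and adding filter: blur(1px) pushes the same video back onto the GPU compositing path.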


The draw stage is strictly bound by the display’s refresh rate through VSync (Vertical Synchronization).

Chromium uses a BeginFrame message to coordinate the entire pipeline:

OS VSync signal (every 16.67ms at 60Hz)
  ↓
Viz's BeginFrame source receives the VSync timing
  ↓
Viz sends BeginFrame to the compositor threads of its clients (renderers, browser UI)
  ↓
Compositor may trigger BeginMainFrame on the main thread
  ↓
Pipeline work must complete before the next VSync deadline

Deadline enforcement: If Viz doesn’t receive a compositor frame by the VSync deadline, it presents the previous frame—a “dropped frame” visible as jank.

At 60Hz, each frame has 16.67ms. At 120Hz, only 8.33ms.

| Stage | Typical Budget (60Hz) |
|---|---|
| Input handling | 0-2ms |
| JavaScript | 0-6ms |
| Style/Layout | 0-4ms |
| Paint/Composite | 2-4ms |
| Draw | 2-4ms |
| Buffer margin | 2-4ms |
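The per-frame budget is simply 1000 ms divided by the refresh rate. The short sketch below computes it for a few common rates and checks the upper ends of the ranges in the budget table against one 60Hz frame; the function and dictionary names are illustrative only.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame at a given refresh rate."""
    return 1000.0 / refresh_hz


# Upper ends of the "typical budget" ranges from the table (milliseconds, 60Hz)
UPPER_BOUNDS_60HZ = {
    "input": 2, "javascript": 6, "style_layout": 4,
    "paint_composite": 4, "draw": 4, "margin": 4,
}


def fits_in_frame(stage_ms: dict, refresh_hz: float) -> bool:
    """True if the stages, run back to back, finish within one frame."""
    return sum(stage_ms.values()) <= frame_budget_ms(refresh_hz)


for hz in (60, 90, 120, 144):
    print(f"{hz:>3} Hz: {frame_budget_ms(hz):.2f} ms per frame")
print(sum(UPPER_BOUNDS_60HZ.values()), fits_in_frame(UPPER_BOUNDS_60HZ, 60))
```

This prints budgets of 16.67, 11.11, 8.33, and 6.94 ms, then `24 False`: the upper ends of the table's ranges add up to 24 ms, more than one 60Hz frame, so the ranges describe where time typically goes rather than limits every stage can reach in the same frame.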

Edge case—variable refresh rate (VRR): Displays supporting FreeSync/G-Sync allow variable frame presentation timing. Viz can present frames as they’re ready rather than waiting for fixed VSync intervals, reducing latency for interactive content.

| Platform | VSync Source | Notes |
|---|---|---|
| Windows | DWM (Desktop Window Manager) | Queries timebase/interval on each SwapBuffers |
| macOS | CVDisplayLink | Callback-driven; integrates with CoreAnimation |
| Linux | DRM/KMS | Direct kernel modesetting |
| Android | Choreographer | VSync callbacks via NDK; used for BeginFrame coordination |

To maintain smoothness despite variable processing times, browsers use triple buffering.

| Buffer | State | Description |
|---|---|---|
| N | Front | Currently displayed on screen |
| N+1 | Queued | Waiting for next VSync to become front buffer |
| N+2 | Back | Currently being rendered by GPU |

| Aspect | Double Buffering | Triple Buffering |
|---|---|---|
| Latency | Lower (1 frame) | Higher (2 frames) |
| Throughput | Limited by slowest stage | Decoupled; render-ahead allowed |
| Jank | VSync miss = full frame drop | VSync miss = still present a frame |
| Memory | 2 × framebuffer | 3 × framebuffer |

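Design rationale: Triple buffering lets the GPU keep working while a finished frame waits for VSync. With double buffering, once the GPU finishes frame N+1 in the back buffer, it has no free buffer until the next VSync swap, so it idles; when a later frame runs long, that lost time turns into a missed VSync. With a third buffer, frame N+2 can start immediately, keeping the GPU busy.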
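The effect can be reproduced with a toy event model. Its assumptions are simplifications, not how Chromium's swap chain is implemented: the GPU renders frames strictly in order, one at a time; a frame can start only when a buffer is free; at each VSync the oldest finished frame becomes the front buffer and the previous front buffer is released. The function names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate(durations: List[float], buffers: int, refresh_hz: float = 60.0) -> List[int]:
    """Frame index on screen after each VSync (-1 = initial blank buffer)."""
    assert buffers >= 2
    period = 1000.0 / refresh_hz
    free_at: List[float] = [0.0] * (buffers - 1)  # times at which back buffers became free
    queue: Deque[Tuple[int, float]] = deque()     # (frame index, render-finish time), FIFO
    gpu_idle_at = 0.0
    next_frame = 0
    front = -1
    shown: List[int] = []
    tick = 0
    while front < len(durations) - 1:
        tick += 1
        vsync = tick * period
        # The GPU starts the next frame as soon as it is idle AND a buffer is free.
        while next_frame < len(durations) and free_at:
            start = max(gpu_idle_at, free_at[0])
            if start >= vsync:
                break
            free_at.pop(0)
            gpu_idle_at = start + durations[next_frame]
            queue.append((next_frame, gpu_idle_at))
            next_frame += 1
        # At VSync, the oldest finished frame (if any) becomes the front buffer.
        if queue and queue[0][1] <= vsync:
            front, _ = queue.popleft()
            free_at.append(vsync)  # the previous front buffer is released
        shown.append(front)
    return shown


def repeated_vsyncs(shown: List[int]) -> int:
    """VSyncs that re-presented the same frame (visible as jank)."""
    return sum(1 for prev, cur in zip(shown, shown[1:]) if prev == cur)
```

With render times of [10, 20, 10, 10] ms at 60Hz, the double-buffered run shows frames [0, 0, 1, 2, 3]: the 20 ms frame cannot start until the first swap and misses a VSync. The triple-buffered run shows [0, 1, 2, 3] because frame 1 starts early in the spare buffer. The same run also shows the latency cost: frame 3 finishes at 50 ms but waits in the queue until the VSync at about 66.7 ms.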

Input latency concern: A frame rendered now can appear up to 2 VSync intervals later. At 60Hz, that is about 33ms of inherent latency. High-refresh-rate displays reduce this: at 120Hz, the same two intervals take about 16.7ms.


Causes of dropped frames:

  1. Main thread blocked: Long JavaScript task delays BeginMainFrame response
  2. Rasterization backlog: Too many tiles pending; pending tree can’t activate
  3. GPU saturation: Complex shaders or excessive draw calls exceed frame budget
  4. Memory pressure: Tile eviction forces re-rasterization

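Detection: Gaps between successive frame presentations that exceed the expected VSync interval indicate drops. Chrome DevTools' Performance panel marks dropped frames, and the Long Animation Frames API (PerformanceLongAnimationFrameTiming entries) reports frames delayed by long main-thread work.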
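Given a list of frame timestamps (collected in the page or from a trace), the gap rule can be sketched as below. It assumes the timestamps are aligned to VSync, so a gap of roughly n intervals means n - 1 frames were missed; count_dropped is a hypothetical helper.

```python
from typing import List


def count_dropped(timestamps_ms: List[float], refresh_hz: float = 60.0) -> int:
    """Estimate dropped frames from successive, VSync-aligned frame timestamps."""
    interval = 1000.0 / refresh_hz
    dropped = 0
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        intervals = round((cur - prev) / interval)  # gap measured in VSync intervals
        if intervals > 1:
            dropped += intervals - 1                # each extra interval is one missed frame
    return dropped
```

For timestamps 0, 16.7, 33.3, 66.7, 83.3 ms at 60Hz, the 33.4 ms gap spans two intervals, so one frame was dropped.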

Tearing occurs when the display reads from a buffer while the GPU is still writing to it.

Prevention: VSync restricts buffer swaps to the vertical blanking interval. Direct Scanout requires stricter synchronization: the GPU must not write to a buffer until the display controller's fence signals that scanout from that buffer has finished.

When scrolling reveals un-rasterized tiles, the browser shows a checkerboard pattern (or solid color) as a placeholder.

Mitigation strategies:

  1. Tile priority: Visible tiles rasterize first; off-screen tiles at lower priority
  2. Overscroll buffering: Rasterize beyond visible viewport
  3. Async scroll: Compositor scrolls immediately; main thread catches up
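The first two mitigations can be combined in a toy prioritization for vertical scrolling only: visible tiles first, then off-screen tiles by distance from the viewport, with tiles beyond a prepaint margin left unscheduled. The function and its margin parameter are hypothetical; a real tile manager may weigh additional factors.

```python
from typing import List, Tuple


def raster_order(tile_tops: List[int], tile_size: int, viewport_top: int,
                 viewport_height: int, margin: int) -> List[int]:
    """Tile tops in raster priority order; vertical scrolling only (hypothetical policy)."""
    viewport_bottom = viewport_top + viewport_height

    def priority(top: int) -> Tuple[int, int]:
        # (0, 0) for visible tiles; (1, distance) for off-screen tiles
        bottom = top + tile_size
        if bottom <= viewport_top:
            return (1, viewport_top - bottom)
        if top >= viewport_bottom:
            return (1, top - viewport_bottom)
        return (0, 0)

    # Tiles farther than the margin are not rasterized yet
    candidates = [t for t in tile_tops if priority(t)[1] <= margin]
    return sorted(candidates, key=lambda t: (priority(t), t))
```

With 256 px tiles, a viewport covering 512 to 1024 px, and a 256 px margin, the two visible tiles come first, followed by the adjacent tiles above and below, then the next ring; tiles farther away wait until scrolling brings them within the margin.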

The Draw stage represents the culmination of the RenderingNG pipeline. By decoupling frame production (renderers) from frame presentation (Viz), modern browsers achieve resilience and performance that enables smooth 120Hz experiences even on complex, multi-process web applications.

The key architectural decisions—process isolation for Viz, surface aggregation for multi-process composition, hardware overlays for power efficiency, triple buffering for throughput—each represent deliberate trade-offs optimized for the web’s unique constraints: untrusted content, cross-origin isolation, and the expectation of 60fps on diverse hardware.


| Term | Definition |
|---|---|
| Viz (Visuals) | Chromium service responsible for frame aggregation and GPU display |
| Compositor Frame | Unit of data submitted by a renderer to Viz, containing render passes and quads |
| Surface | Compositable unit that can receive compositor frames, identified by SurfaceId |
| FrameSink | Interface through which renderers submit frames to Viz |
| Draw Quad | Rectangular drawing primitive (e.g., texture tile, video frame, solid color) |
| Render Pass | Set of quads drawn to a target (screen or intermediate texture) |
| VSync | Synchronization of frame presentation with the monitor's refresh rate |
| Direct Scanout | Optimization where the display reads directly from the browser's buffer |
| Hardware Overlay | Display plane that composites at scanout without GPU involvement |
| Skia | Cross-platform 2D graphics library used by Chrome and Android |
| Graphite | Next-generation Skia backend optimized for modern GPU APIs |

  • Viz process aggregates compositor frames from all renderer processes into a single display output
  • Surface Aggregator recursively walks surface references, handling missing frames gracefully
  • Draw quads are abstract primitives translated through Skia to GPU commands
  • Graphite is replacing Ganesh (enabled by default on macOS as of 2025), bringing multithreaded recording and hardware depth testing
  • Hardware overlays bypass GPU compositing for video, roughly halving power consumption in macOS fullscreen-video measurements
  • Triple buffering trades latency for throughput, preventing jank from variable processing times
  • VSync coordination via BeginFrame ensures frame presentation aligns with display refresh
