Critical Rendering Path: Rasterization

Rasterization is the process where the browser converts recorded display lists into actual pixels—bitmaps for software raster or GPU textures for hardware-accelerated paths. This stage marks the transition from abstract paint commands to concrete visual data. In Chromium, rasterization is managed by the compositor thread and executed by worker threads, ensuring smooth interactions even when the main thread is saturated with JavaScript or layout work.

[Figure: the rasterization pipeline. Main Thread: Paint Stage (Record Display Lists). Compositor Thread: Commit (Sync Layer Tree to Pending), Tiling (Split Layers into Tiles), Schedule (Prioritize by Viewport), Activate (Pending → Active Tree). Raster Worker Pool: Software Raster (CPU Bitmaps), GPU Raster (Skia → GPU Textures). Viz Process: GPU Texture Storage, Display Compositor (Draw Quads).]

The Rasterization Pipeline: Display lists flow from the main thread through tiling and scheduling on the compositor thread, then to worker threads for rasterization, with final textures consumed by the Viz process's display compositor.

Rasterization converts display lists into pixels through a tiled, prioritized, multi-threaded architecture designed around three constraints: GPU memory is finite, users scroll faster than pixels can be drawn, and the main thread must remain free for JavaScript.

The Core Model:

  • Tiling: Layers are divided into tiles (256×256 px for software raster, viewport width × ¼ viewport height for GPU raster) to bound memory usage and enable incremental rasterization
  • Dual-Tree Architecture: Pending tree rasterizes new content while active tree continues drawing—activation only occurs when tiles are ready, preventing checkerboard artifacts
  • Prioritization: Tiles are binned by urgency (NOW, SOON, EVENTUALLY, NEVER) based on viewport distance and scroll velocity, with GPU memory budget distributed in priority order
  • Out-of-Process Execution: The Viz process owns GPU resources; renderer processes serialize paint commands over IPC (Inter-Process Communication) for security isolation

The Evolution: Chromium is transitioning from Ganesh (OpenGL-centric, single-threaded) to Graphite (Vulkan/Metal/D3D12, multi-threaded with depth testing). Graphite achieves ~15% performance gains through reduced overdraw and pipeline pre-compilation.


Rasterization does not happen all at once for the entire page. The browser employs a tiling system, dual-tree architecture, and priority scheduling to handle documents that may be orders of magnitude larger than available GPU memory.

Once the Paint stage records drawing commands into cc::PaintRecord (a serializable sequence of Skia operations), the compositor thread takes ownership. Rather than rasterizing entire layers, it decomposes them into tiles.

Why Tiling? A long-scrolling page might be 50,000 pixels tall. Rasterizing this into a single texture would:

  • Exceed GPU memory limits (at 4 bytes per pixel, a 1920×50,000 px layer would need ~384 MB of VRAM (Video Random Access Memory); even a single 4K-sized layer consumes ~33 MB)
  • Create massive latency before any pixels appear
  • Waste resources on off-screen content the user may never see

Tile Sizes vary by rasterization mode:

| Mode | Tile Dimensions | Rationale |
| --- | --- | --- |
| Software Raster | 256×256 px | Small tiles allow fine-grained priority; worker threads can complete tiles quickly |
| GPU Raster | Viewport width × ¼ viewport height | Larger tiles reduce draw call overhead; GPU handles large textures efficiently |

Tiles are managed by PictureLayerImpl, which maintains multiple PictureLayerTiling objects at different scale factors. A 512×512 layer at 1:1 scale produces four 256×256 tiles, but the same layer might have a single 256×256 tile at 1:2 scale for lower-resolution fallbacks during fast scrolling.
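
The tile counts in that example can be reproduced with a small sketch. This is not how PictureLayerTiling is implemented; `Tile` and `tile_layer` are hypothetical names, and clipping edge tiles to the layer bounds is a simplifying assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    x: int        # left edge in scaled layer space (px)
    y: int        # top edge in scaled layer space (px)
    width: int
    height: int


def tile_layer(layer_w, layer_h, scale, tile_w=256, tile_h=256):
    """Cover a layer, rasterized at `scale`, with a grid of tiles."""
    scaled_w = math.ceil(layer_w * scale)
    scaled_h = math.ceil(layer_h * scale)
    tiles = []
    for y in range(0, scaled_h, tile_h):
        for x in range(0, scaled_w, tile_w):
            # Edge tiles are clipped to the layer bounds.
            tiles.append(Tile(x, y, min(tile_w, scaled_w - x), min(tile_h, scaled_h - y)))
    return tiles


full_res = tile_layer(512, 512, scale=1.0)
half_res = tile_layer(512, 512, scale=0.5)
print(len(full_res), len(half_res))  # 4 1
```

The half-resolution tiling covers the same content with a quarter of the pixels, which is why it is cheap to keep around as a fallback.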

A critical design decision in Chromium’s compositor is the separation between pending and active layer trees:

  • Pending Tree: Receives new content from commits; tiles are rasterized here
  • Active Tree: Currently being displayed; animations and scrolling read from this tree
  • Recycle Tree: Cached pending tree for allocation reuse

Why Two Trees? Without this separation, activating a new commit would immediately expose partially-rasterized content as “checkerboard” artifacts (gray or white rectangles where tiles haven’t finished). The dual-tree design ensures activation only occurs when the pending tree’s tiles are sufficiently rasterized.

The TileManager controls activation timing. It tracks which tiles are required for the current viewport, which are desirable for smooth scrolling, and which can be deferred. Activation proceeds only when “NOW” priority tiles are ready.
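
The activation rule can be sketched as a simple gate. The classes below are hypothetical stand-ins rather than Chromium's types, and the real TileManager weighs more conditions than this; the sketch only shows the "NOW tiles must be ready" check described above.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    priority: str             # "NOW", "SOON", "EVENTUALLY" or "NEVER"
    rasterized: bool = False


@dataclass
class LayerTree:
    tiles: list = field(default_factory=list)


def ready_to_activate(pending):
    """Activation gate: every NOW tile must have finished rasterizing."""
    return all(t.rasterized for t in pending.tiles if t.priority == "NOW")


def tree_to_draw(active, pending):
    """Keep drawing the active tree until the pending tree passes the gate."""
    return pending if ready_to_activate(pending) else active


active = LayerTree([Tile("NOW", rasterized=True)])
pending = LayerTree([Tile("NOW"), Tile("SOON")])

before = tree_to_draw(active, pending)   # NOW tile still missing: stay on active
pending.tiles[0].rasterized = True
after = tree_to_draw(active, pending)    # SOON tile may lag behind: activate
```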

Tiles are assigned to priority bins based on heuristics:

| Priority Bin | Criteria | Treatment |
| --- | --- | --- |
| NOW | Visible in viewport | Must complete before activation |
| SOON | Within ~1 viewport distance | High priority; prevents checkerboard on scroll |
| EVENTUALLY | Further from viewport | Rasterized when workers are idle |
| NEVER | Off-screen, no scroll path | Skipped entirely; memory reclaimed |

The TileManager distributes the GPU memory budget in priority order. If memory is constrained, lower-priority tiles are evicted or never rasterized. Scroll velocity affects binning—fast scrolls increase the “SOON” radius.
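
A minimal sketch of binning and budget distribution follows. The names are hypothetical, and the half-second lookahead used to widen the SOON radius is an assumed heuristic chosen for illustration, not Chromium's actual formula.

```python
from dataclasses import dataclass

TILE_BYTES = 256 * 256 * 4


@dataclass
class Tile:
    name: str
    distance_px: float          # gap between tile and viewport; 0 means visible
    reachable: bool = True      # False when no scroll can bring it on screen
    size_bytes: int = TILE_BYTES


def priority_bin(tile, viewport_h, scroll_px_per_s=0.0):
    if not tile.reachable:
        return "NEVER"
    if tile.distance_px == 0:
        return "NOW"
    # Assumed heuristic: one viewport of lookahead, widened by the distance
    # the current scroll would cover in 0.5 s.
    soon_radius = viewport_h + abs(scroll_px_per_s) * 0.5
    return "SOON" if tile.distance_px <= soon_radius else "EVENTUALLY"


BIN_ORDER = ["NOW", "SOON", "EVENTUALLY"]  # NEVER tiles get no memory


def assign_memory(tiles, budget_bytes, viewport_h, scroll_px_per_s=0.0):
    """Hand out the budget bin by bin, nearest tiles first; return funded names."""
    binned = [(priority_bin(t, viewport_h, scroll_px_per_s), t) for t in tiles]
    binned = [(b, t) for b, t in binned if b != "NEVER"]
    binned.sort(key=lambda bt: (BIN_ORDER.index(bt[0]), bt[1].distance_px))
    funded = []
    for _, tile in binned:
        if tile.size_bytes > budget_bytes:
            break  # out of memory: lower-priority tiles stay unrasterized
        budget_bytes -= tile.size_bytes
        funded.append(tile.name)
    return funded


tiles = [
    Tile("visible", 0),
    Tile("near", 500),
    Tile("far", 3000),
    Tile("unreachable", 100, reachable=False),
]
funded = assign_memory(tiles, budget_bytes=2 * TILE_BYTES, viewport_h=1000)
print(funded)  # ['visible', 'near']
```

With a 1000 px viewport the "far" tile is EVENTUALLY at rest, but a 5000 px/s scroll widens the assumed radius to 3500 px and moves it into SOON.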


Chromium supports two rasterization paths, selected based on device capabilities, content complexity, and compositor settings.

Worker threads execute paint commands using Skia’s CPU rasterizer, producing bitmaps in shared memory. The SoftwareImageDecodeCache handles image decode, scaling, and color correction as prerequisite tasks.

Buffer Providers:

  • ZeroCopyRasterBufferProvider: Maps GPU memory directly; zero CPU-to-GPU copy
  • OneCopyRasterBufferProvider: Rasterizes to shared memory, then uploads to GPU; required when direct mapping isn’t available
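
The difference between the two providers comes down to where the rasterizer writes. The sketch below models GPU-visible memory with a plain `bytearray`; `GpuBuffer`, `raster_into`, and both functions are hypothetical and only illustrate the copy count, not the real provider classes.

```python
class GpuBuffer:
    """Stand-in for GPU-visible memory (a plain bytearray here)."""

    def __init__(self, size):
        self.data = bytearray(size)


def raster_into(target, value):
    """Pretend rasterizer: fills the target buffer with one byte value."""
    for i in range(len(target)):
        target[i] = value


def zero_copy_raster(gpu, value):
    """Rasterize straight into mapped GPU memory; returns bytes copied afterwards."""
    mapped = memoryview(gpu.data)   # models mapping the buffer into the worker
    raster_into(mapped, value)
    mapped.release()
    return 0


def one_copy_raster(gpu, value):
    """Rasterize into a staging buffer, then upload it with a single copy."""
    staging = bytearray(len(gpu.data))  # models the shared-memory staging buffer
    raster_into(staging, value)
    gpu.data[:] = staging               # the one upload copy
    return len(staging)


direct, staged = GpuBuffer(16), GpuBuffer(16)
copied_direct = zero_copy_raster(direct, 7)
copied_staged = one_copy_raster(staged, 7)
```

Both buffers end up with identical contents; only the number of bytes moved after rasterization differs.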

Software raster remains the fallback when GPU acceleration fails (driver bugs, unsupported content) and is sometimes faster for very simple content where GPU overhead dominates.

Modern Chromium uses Out-of-Process Rasterization (OOP-R). The renderer process doesn't execute GPU commands directly; it serializes paint operations into a command buffer that the Viz process (GPU process) executes.

Why OOP-R?

  1. Security: Renderer processes are sandboxed; they cannot access platform 3D APIs directly
  2. Stability: GPU driver crashes don’t take down the renderer
  3. Parallelism: CPU and GPU work can overlap across process boundaries

The Viz process uses Skia to execute commands against the actual GPU. Resources are shared via mailboxes (opaque identifiers) and synchronized through sync tokens.
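
To make the serialization idea concrete, here is a toy round trip. The wire format (a one-byte opcode followed by four float32 arguments) is invented for this sketch and bears no relation to Chromium's actual command buffer encoding.

```python
import struct

# Hypothetical wire format: 1-byte opcode followed by four little-endian float32s.
OP_FILL_RECT = 1
OP_FORMAT = "<B4f"


def serialize(ops):
    """Renderer side: flatten paint ops into bytes that can cross a process boundary."""
    return b"".join(struct.pack(OP_FORMAT, opcode, *args) for opcode, args in ops)


def execute(buffer):
    """GPU-process side: decode the ops and 'draw' them (here: just collect them)."""
    drawn = []
    for opcode, *args in struct.iter_unpack(OP_FORMAT, buffer):
        if opcode == OP_FILL_RECT:
            drawn.append(("fill_rect", tuple(args)))
    return drawn


wire = serialize([(OP_FILL_RECT, (0.0, 0.0, 256.0, 256.0))])
drawn = execute(wire)
```

The renderer never touches a graphics API in this model; it only produces bytes, which is what lets the sandbox stay closed.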

Skia’s GPU backend has evolved significantly:

Ganesh (Legacy):

  • Designed around OpenGL semantics
  • Single-threaded command submission
  • No depth testing—overdraw handled by painter’s algorithm
  • Specialized shader pipelines created performance cliffs (unpredictable hitches when new pipelines compile during animation)

Graphite (Current/Future):

  • Built for Vulkan, Metal, and D3D12
  • Multi-threaded by default: independent Recorder objects produce Recording instances on worker threads
  • Depth testing for 2D: each draw receives a z-value, allowing opaque objects to be reordered while the depth buffer maintains correctness—reduces overdraw
  • Consolidated pipeline compilation at startup, not during animation
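
The overdraw saving from depth testing can be shown on a one-dimensional row of pixels. This is a toy model with opaque draws only and hypothetical function names; it counts pixel writes and is not a description of Graphite's internals.

```python
def painters(draws, width):
    """Back-to-front: every draw writes all of its pixels."""
    color = [None] * width
    writes = 0
    for start, end, c in draws:          # paint order: later draws are on top
        for x in range(start, end):
            color[x] = c
            writes += 1
    return color, writes


def depth_tested(draws, width):
    """Front-to-back with a depth buffer: hidden pixels are rejected, not written."""
    color = [None] * width
    depth = [-1] * width
    writes = 0
    # z = position in paint order; a larger z is closer to the viewer.
    for z, (start, end, c) in sorted(enumerate(draws), reverse=True):
        for x in range(start, end):
            if z > depth[x]:
                depth[x] = z
                color[x] = c
                writes += 1
    return color, writes


# Opaque spans (start, end, color) listed in paint order.
draws = [(0, 10, "background"), (2, 8, "card"), (4, 6, "button")]
image_a, writes_a = painters(draws, 10)
image_b, writes_b = depth_tested(draws, 10)
print(writes_a, writes_b)  # 18 10
```

Both approaches produce the same image, but the depth-tested pass writes each pixel once instead of once per overlapping draw.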

As of Chrome 125+ (mid-2024), Graphite is enabled by default on Apple Silicon Macs. Windows support (via Dawn’s D3D11/D3D12 backends) is in progress. The flag --skia-graphite-backend enables it on other platforms.

Performance Impact: Graphite achieves ~15% improvement on MotionMark 1.3 benchmarks on M3 MacBooks, with measurable improvements in INP (Interaction to Next Paint), LCP (Largest Contentful Paint), and dropped frame rates.


The compositor decides which parts of the DOM (Document Object Model) should live on their own composited layers. This decision, called layer promotion, trades memory for animation performance.

Elements are promoted when they meet criteria suggesting frequent changes or GPU-native content:

| Trigger | Example | Why |
| --- | --- | --- |
| Explicit hint | will-change: transform | Developer signals intent to animate |
| 3D transforms | translate3d(), perspective | Already GPU-native operations |
| Hardware content | <video>, <canvas>, <iframe> | Content rendered by GPU or separate process |
| Opacity animation | opacity in CSS animation | Opacity is a compositor-only property |
| Overlap correction | Element above a promoted layer | Prevents incorrect z-ordering artifacts |

When a non-promoted element is painted above a promoted element and overlaps it, the browser may force-promote the upper element to maintain correctness. Without this, the upper element would be rasterized into a layer that draws beneath the promoted one and would end up hidden behind it, since the compositor can only order whole layers and has no z-buffer information across them.

This can cascade: one promoted element causes neighbors to promote, which causes their neighbors to promote—resulting in layer explosion.
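
The cascade is easy to reproduce with rectangles. The sketch below follows the overlap rule from the trigger table (an element painted above a promoted layer gets promoted); `Element` and `promote_for_overlap` are hypothetical names, and real overlap testing is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    rect: tuple              # (left, top, right, bottom) in px
    promoted: bool = False


def overlaps(a, b):
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def promote_for_overlap(elements):
    """`elements` is in paint order, back to front. Returns names promoted by overlap."""
    forced = []
    for i, el in enumerate(elements):
        if el.promoted:
            continue
        # Any already-promoted element painted below this one forces promotion.
        if any(below.promoted and overlaps(el.rect, below.rect) for below in elements[:i]):
            el.promoted = True
            forced.append(el.name)
    return forced


page = [
    Element("video", (0, 0, 100, 100), promoted=True),
    Element("caption", (50, 50, 150, 150)),     # overlaps the video
    Element("badge", (140, 140, 200, 200)),     # overlaps only the caption
    Element("footer", (0, 300, 200, 350)),      # overlaps nothing
]
forced = promote_for_overlap(page)
print(forced)  # ['caption', 'badge']
```

The badge never touches the video, yet it is promoted because the caption was: that is the cascade.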

To mitigate layer explosion, the compositor uses squashing: multiple overlapping elements that would be promoted for overlap reasons are merged into a single composited layer when possible. This bounds memory growth while preserving correctness.

When Squashing Fails:

  • Different blend modes between elements
  • Different opacity values
  • Transform animations that would require re-rasterization
  • Elements requiring different scroll containers
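
As a rough sketch of the squashing decision, the function below merges neighboring overlap-promoted layers unless one of the conditions above applies. All names are hypothetical, and treating the four conditions as the complete rule set is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OverlapLayer:
    name: str
    blend_mode: str = "normal"
    opacity: float = 1.0
    scroll_container: str = "root"
    transform_animated: bool = False


def squash(layers):
    """Group neighboring overlap-promoted layers that may share one backing."""
    groups = []
    for layer in layers:  # paint order must be preserved, so only neighbors merge
        key = (layer.blend_mode, layer.opacity, layer.scroll_container)
        last = groups[-1] if groups else None
        if (last and not layer.transform_animated and not last["animated"]
                and last["key"] == key):
            last["names"].append(layer.name)
        else:
            groups.append({"key": key, "animated": layer.transform_animated,
                           "names": [layer.name]})
    return [g["names"] for g in groups]


layers = [
    OverlapLayer("a"),
    OverlapLayer("b"),
    OverlapLayer("c", opacity=0.5),
    OverlapLayer("d", transform_animated=True),
]
squashed = squash(layers)
print(squashed)  # [['a', 'b'], ['c'], ['d']]
```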

The primary benefit of layerization is bypassing the main thread for common interactions.

When an element is on its own layer and you animate compositor-only properties (transform, opacity), the pipeline shortcuts to:

  1. Main Thread: Idle (or busy with JS (JavaScript))
  2. Compositor Thread: Receives animation tick, updates transform/opacity values in property trees
  3. GPU: Re-composites existing textures with new transformation matrix

No re-layout, no re-paint, no re-raster. The textures already exist; only the final blend changes.

Compositor-Only vs Full-Pipeline Animation
/* ❌ Triggers Layout → Paint → Raster → Composite */
.animate-position {
  transition: top 0.2s;
}

/* ✅ Triggers only Composite */
.animate-transform {
  transition: transform 0.2s;
}

A position: fixed header is typically promoted to its own layer. During scroll:

  • The content layer’s texture is shifted by the scroll offset
  • The header layer’s texture remains static at viewport position
  • The display compositor blends these textures at different offsets

The GPU performs a simple texture blend—no main thread involvement, no rasterization.
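
The scroll case can be sketched as pure bookkeeping over existing textures. `Layer` and `draw_quads` are hypothetical names; the point is only that a scroll frame changes offsets, not texture contents.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    texture_id: int        # already rasterized; never modified while scrolling
    fixed: bool = False    # True for position: fixed content


def draw_quads(layers, scroll_y):
    """Compositor-side frame: position existing textures, no raster work."""
    quads = []
    for layer in layers:
        y = 0 if layer.fixed else -scroll_y   # fixed layers ignore the scroll offset
        quads.append((layer.texture_id, y))
    return quads


layers = [Layer("content", texture_id=1), Layer("header", texture_id=2, fixed=True)]
frame = draw_quads(layers, scroll_y=300)
print(frame)  # [(1, -300), (2, 0)]
```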


Layer promotion and GPU rasterization are not free. Understanding the costs prevents performance regressions.

Each composited layer consumes GPU memory proportional to its pixel area:

Formula: Width × Height × 4 bytes (RGBA (Red, Green, Blue, Alpha))

| Layer Size | Memory (RGBA) |
| --- | --- |
| 1920×1080 (Full HD) | ~8 MB |
| 2560×1440 (QHD) | ~15 MB |
| 3840×2160 (4K) | ~33 MB |
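
The formula is simple enough to check directly. The helper name below is hypothetical; the figures are in decimal megabytes.

```python
BYTES_PER_PIXEL = 4  # RGBA: one byte each for red, green, blue, alpha


def layer_bytes(width_px, height_px):
    """Uncompressed RGBA memory for one layer of the given size."""
    return width_px * height_px * BYTES_PER_PIXEL


sizes = {"Full HD": (1920, 1080), "QHD": (2560, 1440), "4K": (3840, 2160)}
for label, (w, h) in sizes.items():
    print(f"{label}: {layer_bytes(w, h) / 1e6:.1f} MB")
```

This gives 8.3 MB, 14.7 MB, and 33.2 MB respectively, so a handful of full-screen layers already adds up to a sizable share of a low-end device's memory.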

Mobile devices with 2-4 GB total RAM and shared GPU memory can exhaust resources quickly. Symptoms: compositor falls back to software raster, animations stutter, browser process crashes.

When content changes (e.g., background-color), the browser must:

  1. Re-rasterize affected tiles
  2. Upload new texture data from CPU to GPU memory

On memory-bandwidth-constrained devices, uploading a single 512×512 texture can take 3-5ms—enough to drop a frame. The compositor throttles uploads to avoid this, but rapid content changes can still cause jank.
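
A simple way to picture the throttling is a per-frame time budget. The sketch below assumes a 60 fps target and a 4 ms cost per upload (within the 3-5 ms range above); the greedy policy and the function name are illustrative assumptions, not the compositor's actual scheduler.

```python
FRAME_BUDGET_MS = 1000 / 60   # about 16.7 ms per frame at 60 fps


def uploads_this_frame(pending_ms, other_work_ms):
    """Greedy throttle: start uploads only while they still fit in the frame."""
    remaining = FRAME_BUDGET_MS - other_work_ms
    done = []
    for cost in pending_ms:
        if cost > remaining:
            break             # defer the rest to the next frame
        remaining -= cost
        done.append(cost)
    return done


# Four queued uploads at an assumed 4 ms each, with 6 ms of other frame work.
scheduled = uploads_this_frame([4, 4, 4, 4], other_work_ms=6)
print(len(scheduled))  # 2
```

Only two of the four uploads fit; the others slip to later frames, which is where the visible lag after rapid content changes comes from.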

When scroll velocity exceeds rasterization throughput, the viewport reaches regions whose tiles have not been rasterized yet: the compositor keeps drawing the active tree's existing textures while the tiles for the newly exposed content are still in flight. Users see empty rectangles where those tiles are missing (historically a gray checkerboard pattern).

Mitigations:

  • Pre-rasterize tiles outside the viewport (configurable radius)
  • Use lower-resolution fallback tilings during fast scroll
  • Async image decode prevents blocking raster on image data
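
The low-resolution fallback can be sketched as a lookup that prefers the sharpest tile that is ready. Keying every tiling by the same region name is a simplification made for this sketch (in practice a low-resolution tile covers more content than a full-resolution one), and `pick_tile` is a hypothetical name.

```python
def pick_tile(tilings, region):
    """`tilings` maps scale -> {region: texture}; prefer the sharpest ready tile."""
    for scale in sorted(tilings, reverse=True):
        texture = tilings[scale].get(region)
        if texture is not None:
            return scale, texture
    return None  # nothing rasterized at any scale: this area checkerboards


tilings = {
    1.0: {"row0": "hi-0"},                      # full resolution, only partly ready
    0.25: {"row0": "lo-0", "row1": "lo-1"},     # cheap fallback, further ahead
}
sharp = pick_tile(tilings, "row0")
blurry = pick_tile(tilings, "row1")
missing = pick_tile(tilings, "row2")
print(sharp, blurry, missing)  # (1.0, 'hi-0') (0.25, 'lo-1') None
```

A blurry tile is a better failure mode than an empty one, which is why the fallback tiling is worth its memory during fast scrolls.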

When Mitigations Fail:

  • Extremely fast flings (flick-scroll)
  • Complex content (heavy SVG, many layers)
  • Memory pressure evicting pre-rasterized tiles

Since rasterization runs in the Viz process with real GPU access, driver bugs can crash it. Chromium handles this:

  • Viz process restarts automatically
  • Renderer processes reconnect
  • Visible as brief black flash or texture loss

Frequent crashes may trigger fallback to software rendering for stability.


Rasterization is the heavy-lifting stage where abstract paint commands become concrete pixels. The tiled, dual-tree, priority-based architecture exists to solve fundamental constraints: GPU memory is limited, users scroll unpredictably fast, and the main thread must remain available for JavaScript.

For production optimization:

  1. Minimize layer count: Each layer costs memory; use will-change sparingly
  2. Prefer compositor-only animations: transform and opacity skip the entire main-thread pipeline
  3. Avoid unnecessary repaints: Color and background changes trigger texture re-upload
  4. Test on constrained devices: Desktop GPU memory hides problems that manifest on mobile

The transition from Ganesh to Graphite represents Chromium’s investment in modern GPU APIs, multi-threaded rasterization, and reduced overdraw—improvements that compound as web content grows more complex.


  • Paint Stage: Understanding how display lists (paint records) are generated
  • Compositing: How layers are assembled into the final frame
  • GPU Architecture Basics: Distinction between system RAM and VRAM; texture upload costs

| Term | Definition |
| --- | --- |
| OOP-R | Out-of-Process Rasterization; Skia executes in the Viz (GPU) process, not the renderer |
| Viz Process | Chromium's GPU process; owns GPU resources, runs display compositor |
| VRAM | Video RAM; dedicated GPU memory for textures |
| Skia | Open-source 2D graphics library used by Chrome, Android, and Flutter |
| Ganesh | Skia's legacy OpenGL-based GPU backend |
| Graphite | Skia's modern GPU backend for Vulkan/Metal/D3D12; multi-threaded with depth testing |
| Tiling | Dividing large surfaces into fixed-size rectangles for independent rasterization |
| Checkerboarding | Visual artifact when tiles haven't been rasterized before becoming visible |
| Layer Squashing | Merging multiple overlapping promoted elements into a single layer to reduce memory |
| Compositor-Only Property | CSS properties (transform, opacity) that can be animated without main thread involvement |

  • Rasterization converts paint records into GPU textures (or CPU bitmaps)
  • Tiling bounds memory usage and enables incremental rasterization (256×256 px software, viewport-based GPU)
  • Dual-tree architecture (pending/active) prevents checkerboard artifacts during commits
  • Tile prioritization (NOW/SOON/EVENTUALLY/NEVER) ensures viewport content renders first
  • OOP-R isolates GPU operations in the Viz process for security and stability
  • Graphite replaces Ganesh with multi-threaded, depth-tested rasterization (~15% gains)
  • Layer promotion enables 60fps animations but costs memory; squashing mitigates explosion
  • Trade-offs: VRAM pressure, texture upload latency, checkerboarding on fast scroll
