
Edge Delivery and Cache Invalidation

Production CDN caching architecture for balancing content freshness against cache efficiency. Covers cache key design, invalidation strategies (path-based, tag-based, versioned URLs), stale-while-revalidate patterns, and edge compute use cases—with specific focus on design tradeoffs, operational failure modes, and the thundering herd problem that senior engineers encounter during cache-related incidents.

[Diagram: clients (Browser, Mobile App, API Consumer) → Edge PoPs (Edge Location 1, 2, … N) → Origin Shield (Shield PoP) → Origin Servers (Primary Origin, Failover Origin)]

CDN topology: clients hit edge PoPs, misses route through origin shield (single cache layer) before reaching origin. This architecture reduces origin load and enables request coalescing.

CDN caching reduces to three interconnected decisions: what to cache (cache key design), how long to cache (TTL strategy), and when to invalidate (freshness vs availability tradeoff).

| Decision | Optimizes For | Sacrifices |
| --- | --- | --- |
| Aggressive caching (long TTL) | Origin load reduction, latency | Content freshness |
| Conservative caching (short TTL) | Content freshness | Cache hit ratio, origin load |
| Cache key expansion (more Vary headers) | Content correctness per variant | Cache fragmentation |
| Versioned URLs | Eliminates invalidation need | URL management complexity |
| Tag-based purge | Granular invalidation | Operational complexity |

Key architectural insight: The cache key determines correctness; the TTL determines performance. A misconfigured cache key serves wrong content to users. A misconfigured TTL either hammers your origin or serves stale content.

Invalidation hierarchy (prefer higher):

  1. Versioned/fingerprinted URLs (e.g., main.abc123.js)—no invalidation needed
  2. Stale-while-revalidate—async refresh, no user-visible staleness
  3. Tag-based purge—granular, handles complex dependencies
  4. Path-based purge—simple but coarse-grained
  5. Full cache clear—last resort, triggers thundering herd

Edge compute shifts personalization from origin to edge—decisions made in <1ms at 225+ locations vs 200ms+ round-trip to origin.

HTTP caching behavior is defined by RFC 9111 (June 2022), which obsoletes RFC 7234. The specification distinguishes between private caches (browser) and shared caches (CDN, proxy). This distinction is critical: directives like private and s-maxage exist specifically to control behavior differences between these cache types.

The Cache-Control header is the primary mechanism for controlling caching behavior. Key directives and their design rationale:

| Directive | Target | Behavior | Design Rationale |
| --- | --- | --- | --- |
| max-age=N | All caches | Response fresh for N seconds | Simple TTL control |
| s-maxage=N | Shared caches only | Overrides max-age for CDN/proxy | CDN often needs different TTL than browser |
| no-cache | All caches | Must revalidate before serving | Freshness guarantee (not “don’t cache”) |
| no-store | All caches | Never store in any cache | Sensitive data protection |
| private | Browser only | Exclude from shared caches | User-specific content |
| public | All caches | Cacheable even for authenticated requests | Override default behavior |
| must-revalidate | All caches | Cannot serve stale after TTL | Strict freshness requirement |
| immutable | All caches | Content won’t change during freshness | Avoid conditional requests for fingerprinted assets |

Common misconception: no-cache does NOT mean “don’t cache.” It means “cache, but always revalidate before serving.” Use no-store to prevent caching entirely.

Example for versioned assets:

Cache-Control: public, max-age=31536000, immutable

This tells caches: cache for 1 year, any cache can store it, and don’t bother revalidating (the fingerprinted URL guarantees immutability).

Example for HTML documents:

Cache-Control: no-cache, must-revalidate

Cache the document, but always check with origin before serving. If origin is unreachable, return error rather than stale content.

The cache key uniquely identifies cached objects. A poorly designed cache key either:

  • Fragments the cache (too many keys) → low hit ratio, high origin load
  • Serves wrong content (insufficient keys) → users see incorrect responses

Default cache key components (most CDNs):

  • HTTP method (GET, HEAD)
  • Host header
  • URL path
  • Query string

The cache key correctness rule: If any request header affects the response content, that header must be part of the cache key (or handled via Vary).

Example: Language-based content

If /products returns different content based on Accept-Language:

# Response header
Vary: Accept-Language

The CDN now caches separate responses for Accept-Language: en-US, Accept-Language: de-DE, etc.

Cache fragmentation problem: Accept-Language has thousands of variations (en-US, en-GB, en, en-US,en;q=0.9). Each variation creates a separate cache entry. Solutions:

  1. Normalize headers at edge (collapse en-US, en-GB → en)
  2. Use URL-based routing (/en/products, /de/products)
  3. Limit supported languages and serve default for others
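
A minimal sketch of solution 1, normalization at the edge, assuming a small supported-language set; the `SUPPORTED` set and the `"en"` fallback are illustrative, not any CDN's built-in API:

```javascript
// Collapse Accept-Language to a small supported set before it enters
// the cache key, so "en-US", "en-GB", and "en-US,en;q=0.9" all map to
// one cache entry instead of fragmenting the cache.
const SUPPORTED = new Set(["en", "de", "fr"]); // assumed supported languages

function normalizeLanguage(acceptLanguage) {
  if (!acceptLanguage) return "en"; // assumed default
  // Take the primary subtag of the first listed language:
  // "en-US,en;q=0.9" -> "en-US" -> "en"
  const primary = acceptLanguage.split(",")[0].split("-")[0].trim().toLowerCase();
  return SUPPORTED.has(primary) ? primary : "en";
}
```

The edge function would then set a normalized header (or rewrite Accept-Language itself) and key the cache on that normalized value.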

Best practices for cache key design:

| Do | Don’t |
| --- | --- |
| Include only headers that affect response | Include User-Agent (thousands of variations) |
| Normalize headers at edge before caching | Pass raw headers to cache key |
| Use URL-based variants when possible | Rely on Vary for high-cardinality headers |
| Whitelist query parameters | Include all query parameters (tracking IDs fragment cache) |
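
The query-parameter whitelisting practice can be sketched as an edge-side URL transform; `ALLOWED_PARAMS` is an assumed application-specific list, not a CDN default:

```javascript
// Build the cache-key URL by keeping only whitelisted query parameters.
// Tracking params (utm_*, fbclid, ...) are dropped, and the survivors are
// sorted so parameter order cannot create duplicate cache entries.
const ALLOWED_PARAMS = new Set(["page", "sort", "category"]); // illustrative

function normalizeCacheKeyUrl(rawUrl) {
  const url = new URL(rawUrl);
  const kept = [...url.searchParams.entries()]
    .filter(([key]) => ALLOWED_PARAMS.has(key))
    .sort(([a], [b]) => a.localeCompare(b)); // ?a=1&b=2 == ?b=2&a=1
  url.search = new URLSearchParams(kept).toString();
  return url.toString();
}
```

Most CDNs expose this as configuration (query string allowlists) rather than code, but the transformation they apply is equivalent.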

TTL selection balances freshness against hit ratio. The right TTL depends on content volatility and staleness tolerance:

| Content Type | Recommended TTL | Cache-Control Example |
| --- | --- | --- |
| Fingerprinted assets (JS, CSS, images with hash) | 1 year | max-age=31536000, immutable |
| Static images without fingerprint | 1 day to 1 week | max-age=86400 |
| HTML documents | Revalidate or short TTL | no-cache or max-age=300, stale-while-revalidate=60 |
| API responses (read-heavy) | Minutes to hours | s-maxage=300, stale-while-revalidate=60 |
| User-specific content | Don’t cache at CDN | private, no-store |
| Real-time data | Don’t cache | no-store |

Design rationale for fingerprinted assets: Content-addressed URLs (e.g., main.abc123.js) guarantee immutability—the URL changes when content changes. This enables maximum caching without staleness risk. The immutable directive tells browsers not to revalidate even on reload.

The HTML document problem: HTML references other assets by URL. If you cache HTML for 1 hour and deploy new CSS, users get old HTML pointing to old CSS URL, then CSS fingerprint changes, causing broken styles until HTML cache expires. Solutions:

  1. Use no-cache for HTML (always revalidate)
  2. Use very short TTL with stale-while-revalidate
  3. Purge HTML on every deployment

Cache invalidation is one of the two hard problems in computer science (along with naming things and off-by-one errors). The challenge: how do you tell globally distributed caches that content has changed?

The best invalidation strategy is avoiding invalidation. Fingerprinted URLs make cache entries naturally obsolete:

# Old version
/assets/main.abc123.js
# New deployment
/assets/main.def456.js

Why this works: The URL is the cache key. A new URL is a cache miss, fetched fresh from origin. Old URLs can stay cached forever—they’ll naturally expire or be evicted under memory pressure.

Implementation with build tools:

vite.config.js

// Vite configuration for content-hashed filenames
// Produces: main.abc123.js (hash changes when content changes)
export default {
  build: {
    rollupOptions: {
      output: {
        entryFileNames: "[name].[hash].js",
        chunkFileNames: "[name].[hash].js",
        assetFileNames: "[name].[hash][extname]",
      },
    },
  },
}

Limitation: Only works for assets referenced by other files. HTML documents at fixed URLs (/, /products/123) cannot use this pattern—they need explicit invalidation.

The simplest invalidation: tell the CDN to remove specific URLs.

Exact path purge:

# Purge single URL
aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/123"

Wildcard purge:

# Purge all products
aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/*"

Limitations:

  • No relationship awareness: Purging /products/123 doesn’t purge /categories/electronics even if it displays product 123
  • Rate limits: Google Cloud CDN limits to 500 invalidations/minute
  • Propagation delay: CloudFront takes 30s-3min; Google Cloud CDN takes 5-10min

When to use: Simple sites with direct URL-to-content mapping. Emergency removal of specific content.

Tag-based purging enables many-to-many relationships between content and cache entries. When content changes, purge by tag—all entries with that tag are invalidated regardless of URL.

How it works:

  1. Origin adds tags to response headers:

     Surrogate-Key: product-123 category-electronics homepage

  2. CDN indexes entries by tags

  3. On product update, purge by tag:

     curl -X POST "https://api.fastly.com/service/{id}/purge/product-123" \
       -H "Fastly-Key: {api_key}"

  4. All URLs tagged with product-123 are invalidated:
    • /products/123
    • /categories/electronics (if it displays product 123)
    • /homepage (if product 123 is featured)

Fastly implementation (from Fastly documentation):

# Response headers
Surrogate-Key: post/1234 category/news author/jane
Surrogate-Control: max-age=86400

Limits: Individual keys max 1024 bytes, total header max 16,384 bytes.

CloudFront implementation: CloudFront doesn’t support surrogate keys natively. Workaround: use Lambda@Edge to manage a DynamoDB index mapping tags to URLs, then purge URLs programmatically.

Akamai implementation (Cache Tags):

Edge-Cache-Tag: product-123, category-electronics

Purge via API or Property Manager rules.

Design tradeoff: Tag-based purging requires:

  1. Application code to generate tags
  2. CDN that supports the feature (Fastly, Akamai, Cloudflare with Enterprise)
  3. Operational tooling to trigger purges

The complexity is justified when content relationships are complex (CMS, e-commerce with product listings).
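
Requirement 1, application code that generates tags, might look like the following sketch; the function name and product shape are illustrative, not from any framework:

```javascript
// Derive surrogate keys from the entities a page renders, so a later
// purge of "product-123" invalidates every page that displayed product
// 123 (product page, category listings, homepage feature).
function surrogateKeysForProductPage(product) {
  const keys = [`product-${product.id}`];
  for (const cat of product.categories) keys.push(`category-${cat}`);
  if (product.featuredOnHomepage) keys.push("homepage");
  return keys.join(" "); // Fastly expects space-separated keys
}

// In an origin handler (illustrative):
// res.setHeader("Surrogate-Key", surrogateKeysForProductPage(product));
```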

Purge is not instant. Time from purge request to global effect:

| CDN Provider | Typical Propagation | Notes |
| --- | --- | --- |
| Cloudflare | <150ms (P50) | “Instant Purge” via distributed invalidation |
| Fastly | ~150ms global | Sub-second for most requests |
| AWS CloudFront | 30s - 3min | Varies by distribution size |
| Google Cloud CDN | 5-10 min | Rate limited (500/min) |
| Akamai | Seconds to minutes | Depends on product tier |

Operational implication: Don’t assume purge is instant. If you purge and immediately test, you may see cached content. Build delays into deployment pipelines or use cache-busting query params for verification.

Cost:

  • CloudFront: First 1,000 paths/month free, $0.005/path beyond
  • Wildcard /* counts as one path but invalidates everything

RFC 5861 defines two Cache-Control extensions that fundamentally change the freshness vs availability tradeoff.

Cache-Control: max-age=600, stale-while-revalidate=30

Behavior:

  1. 0-600s: Content fresh, serve from cache
  2. 600-630s: Content stale, serve from cache immediately, trigger async revalidation
  3. >630s: Content truly stale, synchronous fetch required

Design rationale: Hides revalidation latency from users. The first request after TTL expires gets served instantly from cache while triggering background refresh. Subsequent requests get fresh content.
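
The three windows above can be expressed as a small classifier (an illustrative sketch, not any CDN's API):

```javascript
// Classify an object's cache state under RFC 5861 stale-while-revalidate,
// given its age in seconds and the header's max-age / SWR window.
function cacheState(ageSeconds, maxAge, swrWindow) {
  if (ageSeconds < maxAge) return "fresh";              // serve from cache
  if (ageSeconds < maxAge + swrWindow)
    return "stale-while-revalidate";                    // serve stale, refresh async
  return "stale";                                       // synchronous origin fetch
}

// With Cache-Control: max-age=600, stale-while-revalidate=30:
// cacheState(300, 600, 30), cacheState(615, 600, 30), cacheState(700, 600, 30)
```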

Stale-while-revalidate flow: first request after TTL expires serves stale content instantly while triggering async revalidation. Next request gets fresh content.

Browser support: Chrome 75+, Firefox 68+, Edge 79+. Safari does not implement stale-while-revalidate, so Safari users fall back to synchronous revalidation.

CDN support: Cloudflare, Fastly, KeyCDN, Varnish. CloudFront requires Lambda@Edge for full implementation.

Edge case: If no traffic arrives during the SWR window, content becomes truly stale. High-traffic endpoints benefit most; low-traffic endpoints may still experience synchronous fetches.

Cache-Control: max-age=600, stale-if-error=86400

Behavior: If origin returns 5xx error or is unreachable, serve stale content for the specified duration instead of propagating the error.

Design rationale: Availability over freshness. Users see slightly old content rather than error pages during origin outages.

Combined pattern for production APIs:

Cache-Control: max-age=300, stale-while-revalidate=60, stale-if-error=86400
  • Fresh for 5 minutes
  • Serve stale + async revalidate for 1 minute after
  • Serve stale on error for 24 hours

Operational benefit: Origin deployments become safer. If a bad deploy causes 500 errors, users continue seeing cached content while you fix the issue.

Varnish implements similar functionality with more control via VCL (Varnish Configuration Language):

grace.vcl

# Varnish grace mode configuration
# beresp.grace: how long to serve stale while revalidating
# req.grace: how long client accepts stale content
vcl 4.1;
import std;

sub vcl_backend_response {
  set beresp.ttl = 300s;   # Fresh for 5 minutes
  set beresp.grace = 1h;   # Serve stale for 1 hour while revalidating
}

sub vcl_recv {
  # Extend grace period when backend is unhealthy
  if (std.healthy(req.backend_hint)) {
    set req.grace = 10s;
  } else {
    set req.grace = 24h;   # Extended grace during outages
  }
}

Key insight: Varnish separates object grace (how long to keep stale content) from request grace (how long a specific request accepts stale content). This enables dynamic behavior based on backend health.

Edge compute moves logic from origin to CDN edge locations, reducing latency from 200ms+ (origin round-trip) to <1ms (edge execution).

| Platform | Runtime | Cold Start | Max Execution | Use Case |
| --- | --- | --- | --- | --- |
| CloudFront Functions | JavaScript | <1ms | <1ms CPU | Simple transforms, redirects |
| Lambda@Edge | Node.js, Python | 50-100ms | 5-30s | Complex logic, API calls |
| Cloudflare Workers | JavaScript/WASM | <1ms | 10-30ms CPU | Full applications |
| Fastly Compute | WASM (Rust, Go, JS) | 35μs | No hard limit | High-performance compute |

Cost comparison (per 1M invocations):

  • CloudFront Functions: $0.10
  • Lambda@Edge: $0.60 + execution time
  • Cloudflare Workers: $0.50 (included in paid plans)

Traditional personalization requires origin processing per request. Edge compute enables personalization at cache layer:

Pattern: Cookie-based variant selection

personalization.js

// CloudFront Function for cookie-based personalization
// Routes to different cached content based on user segment
// Cache key includes the segment, so variants are cached separately
function handler(event) {
  var request = event.request
  var cookies = request.cookies
  // Determine user segment from cookie
  var segment = "default"
  if (cookies.user_segment) {
    segment = cookies.user_segment.value
  }
  // Add segment to cache key via custom header
  request.headers["x-user-segment"] = { value: segment }
  return request
}

CloudFront configuration: Include x-user-segment header in cache key policy. Each segment gets its own cached variant.

Result: 3 user segments × 1000 pages = 3000 cache entries, all served from edge without origin involvement.

Edge-based A/B testing eliminates the latency and complexity of client-side testing libraries.

Pattern: Consistent assignment via cookie

ab-testing.js

// Cloudflare Worker for A/B testing
// Assigns users to variants consistently via cookie
// Routes to variant-specific origin path

// Minimal cookie parser (the original snippet assumed this helper)
function getCookie(request, name) {
  const header = request.headers.get("Cookie") || ""
  const match = header.match(new RegExp(`(?:^|;\\s*)${name}=([^;]*)`))
  return match ? match[1] : null
}

export default {
  async fetch(request) {
    const url = new URL(request.url)
    let variant = getCookie(request, "ab_variant")
    if (!variant) {
      // New user: randomly assign variant
      variant = Math.random() < 0.5 ? "a" : "b"
    }
    // Route to variant-specific origin
    url.pathname = `/${variant}${url.pathname}`
    const response = await fetch(url.toString(), request)
    // Set cookie for consistent future assignments
    const newResponse = new Response(response.body, response)
    if (!getCookie(request, "ab_variant")) {
      newResponse.headers.set("Set-Cookie", `ab_variant=${variant}; Path=/; Max-Age=86400`)
    }
    return newResponse
  },
}

Why edge beats client-side:

  • No layout shift (content decided before HTML sent)
  • No JavaScript dependency
  • Consistent assignment across page loads
  • Works for users with JS disabled

Edge compute enables geographic routing for latency optimization or compliance:

geo-routing.js

// Lambda@Edge geo-routing for GDPR compliance (origin request trigger)
// Routes EU users to EU-based origin
exports.handler = async (event) => {
  const request = event.Records[0].cf.request
  // Header is absent unless CloudFront is configured to forward it
  const countryHeader = request.headers["cloudfront-viewer-country"]
  const country = countryHeader ? countryHeader[0].value : "US"
  const euCountries = ["DE", "FR", "IT", "ES", "NL", "BE", "AT", "PL"]
  if (euCountries.includes(country)) {
    request.origin.custom.domainName = "eu-origin.example.com"
  } else {
    request.origin.custom.domainName = "us-origin.example.com"
  }
  return request
}

Compliance use case: GDPR requires certain data to stay within EU. Route EU users to EU origins; their requests never touch US infrastructure.

Cache hit ratio (CHR) is the primary health metric for CDN effectiveness:

Formula: CHR = Cache Hits / (Cache Hits + Cache Misses) × 100

Target thresholds:

  • Static assets: >95%
  • Overall site: >85%
  • Alert threshold: <80% (investigate cache key issues or TTL misconfiguration)

Segmented monitoring is critical: A global 90% CHR can mask a 50% CHR for a specific content type or region. Monitor by:

  • Content type (HTML, JS, CSS, images, API)
  • Geographic region
  • URL pattern
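
Segmented CHR can be computed directly from edge log records; the record shape (`{segment, hit}`) is an assumption for illustration, as real CDN logs need mapping into segments first:

```javascript
// Compute cache hit ratio per segment (content type, region, URL pattern)
// from a stream of edge log records. A healthy global CHR can hide a
// failing segment, which this breakdown exposes.
function chrBySegment(records) {
  const stats = new Map();
  for (const { segment, hit } of records) {
    const s = stats.get(segment) ?? { hits: 0, total: 0 };
    s.total += 1;
    if (hit) s.hits += 1;
    stats.set(segment, s);
  }
  const result = {};
  for (const [segment, s] of stats) {
    result[segment] = (s.hits / s.total) * 100; // CHR as percentage
  }
  return result;
}
```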

Common CHR killers:

  • High-cardinality Vary headers (cache fragmentation)
  • Query parameters in cache key (tracking IDs create unique keys)
  • Short TTLs on high-traffic content
  • Origin returning Cache-Control: no-store unexpectedly

Problem: When a popular cache entry expires, all concurrent requests miss cache and hit origin simultaneously.

Cache stampede: multiple concurrent requests after cache expiry all hit origin, causing load spike proportional to concurrency.

Real-world impact: A cache entry with 98% hit ratio expiring means 50x origin load spike (2% misses become 100% misses during revalidation window).

Mitigation strategies:

  1. Request coalescing (CDN feature): CDN holds duplicate requests while one fetches from origin

    • Fastly: Enabled by default
    • CloudFront: Limited support via Origin Shield
    • Cloudflare: “Tiered Cache” provides similar behavior
  2. Stale-while-revalidate: First request serves stale, triggers async refresh—no stampede because subsequent requests still hit cache

  3. Probabilistic early expiration: Refresh before TTL expires:

    actual_ttl = ttl - (random() * jitter_factor)

    Spreads revalidation across time window instead of thundering at exact TTL

  4. Origin Shield: Centralized cache layer between edge PoPs and origin. Misses from multiple edges coalesce at shield.
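
Probabilistic early expiration (strategy 3 above) can be sketched as follows; `jitterFactor` and the function names are illustrative:

```javascript
// Each cache read treats the entry as expired slightly before its real
// TTL, with random jitter, so refreshes spread across a window instead
// of thundering at the exact expiry instant.
function effectiveTtl(ttlSeconds, jitterFactor, rand = Math.random) {
  // actual_ttl = ttl - (random() * jitter_factor), per the formula above
  return ttlSeconds - rand() * jitterFactor;
}

function shouldRefreshEarly(ageSeconds, ttlSeconds, jitterFactor, rand = Math.random) {
  return ageSeconds >= effectiveTtl(ttlSeconds, jitterFactor, rand);
}
```

With `ttl=600` and `jitterFactor=60`, refreshes begin anywhere in the last 60 seconds of freshness rather than all at t=600.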

Origin Shield adds a single cache layer between globally distributed edge PoPs and origin:

Without shield:

Edge NYC miss → Origin
Edge London miss → Origin
Edge Tokyo miss → Origin
= 3 origin requests

With shield:

Edge NYC miss → Shield (Virginia) miss → Origin
Edge London miss → Shield (Virginia) hit
Edge Tokyo miss → Shield (Virginia) hit
= 1 origin request

When to enable:

  • High traffic with moderate cache hit ratio (<95%)
  • Origin cannot handle traffic spikes
  • Global audience (many edge PoPs)

Cost tradeoff: Additional per-request charge at shield. Justified when origin protection value exceeds shield cost.

Design for origin failure:

  1. Extended stale-if-error: Serve stale content for 24-48 hours during outages

    Cache-Control: max-age=300, stale-if-error=172800
  2. Static fallback at edge: If origin returns 5xx, serve static fallback page from edge storage

  3. Health check integration: CDN monitors origin health, extends grace period when origin is unhealthy (Varnish pattern shown earlier)

  4. Circuit breaker at edge: After N consecutive origin errors, stop sending traffic for cooldown period
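
Point 4 can be sketched as a small state machine; the class name, thresholds, and injected clock are illustrative, not any CDN's built-in API:

```javascript
// Edge circuit breaker: after `threshold` consecutive origin errors,
// skip the origin for `cooldownMs` (serving stale or fallback instead),
// then let one probe request through ("half-open").
class OriginCircuitBreaker {
  constructor(threshold = 5, cooldownMs = 30000, now = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;       // injectable clock for testing
    this.failures = 0;
    this.openedAt = null; // null = circuit closed
  }
  allowRequest() {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // half-open: allow a probe attempt
      this.failures = 0;
      return true;
    }
    return false; // still in cooldown: serve stale/fallback
  }
  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }
  recordFailure() {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}
```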

Edge delivery and cache invalidation is fundamentally about managing the tension between content freshness and system performance. The mature approach:

  1. Prefer versioned URLs for assets—eliminates invalidation entirely
  2. Use stale-while-revalidate for HTML/API responses—hides latency, prevents stampedes
  3. Implement tag-based purging for complex content relationships—surgical invalidation without full cache clear
  4. Monitor cache hit ratio by segment—global metrics hide localized problems
  5. Design for origin failure—extended grace periods turn partial outages into non-events

The cache key determines correctness; the TTL determines performance. Get the cache key wrong, and users see incorrect content. Get the TTL wrong, and you either hammer your origin or serve stale content.

Edge compute shifts the personalization boundary from origin to edge—decisions that previously required 200ms origin round-trips now execute in <1ms at the nearest edge location. This isn’t just an optimization; it fundamentally changes what’s architecturally possible for latency-sensitive applications.

Prerequisites

  • HTTP caching model (request/response headers, freshness, validation)
  • CDN concepts (edge PoPs, origin, cache keys)
  • Basic understanding of distributed systems failure modes

Glossary

  • Cache key: Unique identifier for cached content (typically URL + selected headers)
  • TTL (Time to Live): Duration content is considered fresh
  • CHR (Cache Hit Ratio): Percentage of requests served from cache
  • PoP (Point of Presence): Edge location where CDN serves content
  • Origin Shield: Intermediate cache layer between edge PoPs and origin
  • Surrogate key: Tag associated with cached content for grouped invalidation
  • Thundering herd: Multiple simultaneous requests overwhelming origin after cache expiry
  • SWR (Stale-While-Revalidate): Serve stale content while asynchronously fetching fresh
  • SIE (Stale-If-Error): Serve stale content when origin returns error

Key takeaways

  • Cache key design determines correctness; TTL determines performance—both can cause production incidents if misconfigured
  • Versioned/fingerprinted URLs eliminate invalidation need for assets—use immutable directive for 1-year TTL
  • Tag-based purging (surrogate keys) handles complex content relationships—supported by Fastly, Akamai, Cloudflare Enterprise
  • Stale-while-revalidate hides revalidation latency and prevents cache stampedes—combine with stale-if-error for origin failure protection
  • Origin Shield collapses multi-PoP misses into single origin request—essential for stampede protection
  • Edge compute enables <1ms personalization decisions vs 200ms+ origin round-trip
