
Edge Delivery and Cache Invalidation

A modern CDN is two systems welded together: a globally distributed read-through cache governed by HTTP caching semantics, and a programmable purge plane that lets you reach into hundreds of points of presence to expire content on demand. Both halves have to be designed deliberately; defaults will give you a hot origin, fragmented cache, or stale pages on incident day. This article is the mental model and the failure-mode catalogue a senior engineer needs to design cache keys, pick TTLs, choose an invalidation strategy, and make edge compute pay its way.

CDN topology: clients hit edge PoPs; edge misses funnel through Origin Shield, which collapses concurrent fetches into a single origin request.

Thesis

Cache delivery reduces to three coupled decisions: what to cache (cache key design), how long (TTL strategy), and how to invalidate (path, tag, version, or wait). Get the cache key wrong and users see other users’ content. Get the TTL wrong and you either hammer your origin or serve stale pages through an incident. Get invalidation wrong and you can’t ship.

| Decision | Optimises | Sacrifices |
| --- | --- | --- |
| Long TTL | Origin load, latency | Freshness; needs deliberate invalidation |
| Short TTL | Freshness | Hit ratio, origin load |
| Wide cache key (more Vary) | Variant correctness | Fragmentation; lower hit ratio |
| Versioned / fingerprinted URLs | Eliminates invalidation | URL management at build / deploy time |
| Surrogate-key (tag) purge | Granular invalidation | Application + CDN tooling |
| Path purge | Simplicity | No relationship awareness; coarse |
| Stale-while-revalidate | User-visible latency on refresh | Brief staleness; only useful with steady traffic |

Two non-negotiables that keep coming back through the rest of the article:

  1. The cache key controls correctness; the TTL controls cost and freshness. They fail in different ways and need separate review.
  2. Prefer the highest invalidation method on this ladder you can reach: versioned URLs → stale-while-revalidate → surrogate-key purge → path purge → full clear.

Mental model

A request fans out through three caches before it reaches origin: the browser (private cache, governed by Cache-Control), the edge PoP (shared cache geographically nearest to the client), and optionally an origin shield (a single regional cache layer that collapses misses from many edges into one origin request). RFC 9111 calls these “private” and “shared” caches and uses the directive surface — private vs. s-maxage, in particular — to make the distinction enforceable.1

Each cache layer answers two questions on every request: do I have a cached response under this key? and is it still fresh? The cache key is built from the request (default: method + host + path + query string, plus whatever the cache configuration adds via Vary or explicit policy). Freshness comes from the max-age / s-maxage directives, the response date, and the various stale-* extensions. Everything else — purges, surrogate keys, edge compute — is a way to bend those two answers without rewriting your application.
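Those two questions can be sketched as a small decision function. This is an illustrative sketch only, not any vendor's implementation; the function and directive handling follow the shared-cache rules described above (s-maxage overriding max-age, private excluded from shared caches).

```javascript
// Parse a Cache-Control header into a directive map:
// "public, max-age=60" → { public: true, "max-age": 60 }
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(",")) {
    const [name, value] = part.trim().split("=");
    directives[name.toLowerCase()] = value !== undefined ? Number(value) : true;
  }
  return directives;
}

// "Is it still fresh?" for a shared cache, given the response age in seconds.
// s-maxage governs shared caches when present; private and no-store responses
// are never served from a shared cache at all.
function isFreshInSharedCache(cacheControlHeader, ageSeconds) {
  const cc = parseCacheControl(cacheControlHeader);
  if (cc["no-store"]) return false; // never stored in the first place
  if (cc["private"]) return false;  // browser-only, excluded from shared caches
  const lifetime = cc["s-maxage"] ?? cc["max-age"] ?? 0;
  return ageSeconds < lifetime;
}

console.log(isFreshInSharedCache("public, max-age=60, s-maxage=300", 120)); // true: s-maxage governs
console.log(isFreshInSharedCache("private, max-age=600", 10));              // false: browser-only
```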

Cache fundamentals

Cache-Control directives that actually matter at the edge

| Directive | Target | Behaviour | When to reach for it |
| --- | --- | --- | --- |
| max-age=N | All caches | Response is fresh for N seconds | The default knob; everything else modifies it. |
| s-maxage=N | Shared caches | Overrides max-age for CDNs and proxies | When the CDN should cache longer than the browser does. |
| no-cache | All caches | Cache, but revalidate before serving | Documents you must serve fresh on every navigation. |
| no-store | All caches | Never store | PII, auth-bearing responses you cannot risk leaking. |
| private | Browser only | Excluded from shared caches | Per-user content that may still be browser-cached. |
| public | All caches | Cacheable even with Authorization | Explicit override for authenticated-but-shareable responses. |
| must-revalidate | All caches | Cannot serve stale once expired | Strict-freshness requirements; e.g. financial pages. |
| immutable | All caches | Will not change during the freshness window | Fingerprinted assets — skips conditional revalidation entirely. |

Caution

no-cache does not mean “do not cache”. It means “cache, but revalidate before each use” (RFC 9111 §5.2.2.4). Use no-store if you need to forbid caching entirely. Mixing the two up has been the root cause of more than one credential leak.

Two header recipes cover most production responses:

Fingerprinted asset (1 year, no revalidation):

```http
Cache-Control: public, max-age=31536000, immutable
```

HTML document (always revalidate):

```http
Cache-Control: no-cache, must-revalidate
```

Targeted cache control: CDN-Cache-Control and Surrogate-Control

The standard Cache-Control header is shared by the browser and every intermediary, which makes it awkward when you want different semantics at the edge versus the client. Two header families exist for that split:

  • Surrogate-Control — defined in the W3C Edge Architecture Specification 1.0 (2001) by Akamai and Oracle. Targets only “surrogates” (CDN nodes); compliant surrogates strip the header before forwarding to the client.2 Honored by Fastly and parts of Akamai; not by Cloudflare or CloudFront.
  • CDN-Cache-Control — standardized as RFC 9213 (Targeted HTTP Cache Control, June 2022). Same directive grammar as Cache-Control; explicitly addresses the CDN tier and is co-supported by Cloudflare,3 Vercel, and other modern CDNs. Use this for greenfield work — Surrogate-Control is the older, vendor-fragmented sibling.

Cache key design

The cache key uniquely identifies a stored response. Most CDNs default to method + host + path + query string. Anything else that affects the response — the Accept-Language header, the device class, the user segment — has to be in the key, either explicitly via the cache policy or implicitly via the Vary response header.

Important

The cache-key correctness rule: if a request header changes the response body, that header has to participate in the cache key. Skip this and the cache will serve user A’s response to user B; the bug is silent until someone notices.

A poorly designed key fails in two directions:

  • Too narrow → the cache returns the wrong variant. (Auth user sees anon shell; English user sees German content.)
  • Too wide → the cache fragments. Hit ratio collapses, origin load climbs, and you debug a cost spike instead of a correctness bug.

The textbook fragmentation case is Vary: Accept-Language. Browsers send raw locale strings (en-US, en-US,en;q=0.9, en-GB,en;q=0.8,fr;q=0.5) and the cardinality is effectively unbounded — Fastly’s data shows this header alone can produce thousands of variants per URL.4 Worse, browsers don’t actually store multiple variants per URL the way intermediaries do; they treat Vary as a validator and refetch on mismatch, so the win you wanted at the edge often doesn’t materialise in the browser at all.5

Three working strategies, in rough order of preference:

  1. Normalise at the edge. Collapse Accept-Language to a closed set (en, de, fr, fallback) in a request-side edge function or VCL block, then Vary on the normalised value.
  2. Encode the variant in the URL. /en/products, /de/products. Google explicitly recommends locale-specific URLs over Accept-Language for international SEO,6 and it makes the cache key obvious.
  3. Move the dimension to a custom header. X-Device-Class: mobile|tablet|desktop, populated by a device-detection layer, then Vary: X-Device-Class.
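Strategy 1 can be sketched in a few lines. The supported-locale set is an assumed example, and this is plain JavaScript showing only the normalisation step itself; each edge runtime (CloudFront Functions, Workers, VCL) has its own header accessors around it.

```javascript
// Collapse a raw Accept-Language value to a closed set before it touches the
// cache key. SUPPORTED and FALLBACK are illustrative choices.
const SUPPORTED = ["en", "de", "fr"];
const FALLBACK = "en";

function normalizeLanguage(acceptLanguage) {
  if (!acceptLanguage) return FALLBACK;
  // "en-GB,en;q=0.8,fr;q=0.5" → ["en-gb", "en", "fr"], in client order.
  // (Simplification: client order is taken as preference; q-values ignored.)
  const tags = acceptLanguage
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .filter(Boolean);
  for (const tag of tags) {
    const primary = tag.split("-")[0]; // en-GB → en
    if (SUPPORTED.includes(primary)) return primary;
  }
  return FALLBACK;
}

console.log(normalizeLanguage("en-US,en;q=0.9"));          // "en"
console.log(normalizeLanguage("de-AT,de;q=0.9,en;q=0.5")); // "de"
console.log(normalizeLanguage("pt-BR"));                   // "en" (fallback)
```

The edge then sets a header (or cache-key entry) to the normalised value and varies on that, bounding the variant count at the size of the supported set plus one.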
| Do | Don’t |
| --- | --- |
| Include only headers that change the response body | Vary: User-Agent — high-cardinality, effectively disables caching |
| Normalise high-cardinality headers before they hit cache | Pass raw locale, UA, or cookie blobs into the key |
| Whitelist allowed query parameters | Include the entire query string verbatim — tracking IDs fragment cache |
| Use URL path variants for stable dimensions | Lean on Vary for anything with more than ~10 distinct values |

TTL strategies by content type

Pick TTL by content volatility and staleness tolerance, not by gut feel:

| Content type | Recommended TTL | Cache-Control example |
| --- | --- | --- |
| Fingerprinted JS/CSS/images (main.abc123.js) | 1 year | max-age=31536000, immutable |
| Static images, no fingerprint | Hours to days | max-age=86400 |
| HTML documents | Revalidate or short TTL | no-cache or max-age=60, stale-while-revalidate=600 |
| API responses, read-heavy | Minutes | s-maxage=300, stale-while-revalidate=60 |
| User-specific responses | Don’t share-cache | private, no-store |
| Real-time data | Don’t cache | no-store |
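One way to keep such a policy reviewable is to express it as a single lookup that responses pass through, so the TTL decisions live in one place instead of being scattered across handlers. A sketch, with illustrative class names; the header strings mirror the table above.

```javascript
// Content class → Cache-Control policy, one reviewable map.
const CACHE_POLICY = {
  fingerprinted: "public, max-age=31536000, immutable",
  image: "public, max-age=86400",
  html: "no-cache",
  api: "s-maxage=300, stale-while-revalidate=60",
  user: "private, no-store",
  realtime: "no-store",
};

function cacheControlFor(contentClass) {
  // Unknown classes fall back to no-store: the safe direction.
  return CACHE_POLICY[contentClass] ?? "no-store";
}

console.log(cacheControlFor("fingerprinted")); // "public, max-age=31536000, immutable"
console.log(cacheControlFor("mystery"));       // "no-store"
```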

The interesting failure mode is the HTML-with-fingerprinted-assets trap. The HTML references assets by URL; if the HTML is cached for an hour and you deploy new CSS, users get yesterday’s HTML pointing at yesterday’s CSS path until the HTML cache turns over. Three reliable workarounds: serve HTML with no-cache, use a short max-age paired with stale-while-revalidate, or purge the HTML key on every deploy. Pick one and write it down — this is one of the most common “site looks broken after deploy” classes.

Cache invalidation strategies

Note

The mental shortcut: prefer mechanisms that don’t require invalidation, then mechanisms that hide invalidation latency, then mechanisms that purge precisely. Path purges and full clears are the bottom of the ladder.

Decision tree: pick an invalidation strategy by URL shape, dependency fan-out, staleness tolerance, and CDN feature support.

Versioned URLs — avoid invalidation

The cheapest invalidation is the one you never issue. Content-addressed URLs (main.abc123.js) make a new version a cache miss by definition; old versions stay cached until they’re evicted under memory pressure. The lifecycle is mechanical:

Content-addressed asset lifecycle: hashing produces a new immutable URL each deploy; new HTML references the new URL, the cache fills on first MISS, and old hashes age out under LRU pressure.

vite.config.js
```javascript
export default {
  build: {
    rollupOptions: {
      output: {
        entryFileNames: "[name].[hash].js",
        chunkFileNames: "[name].[hash].js",
        assetFileNames: "[name].[hash][extname]",
      },
    },
  },
}
```

Combined with Cache-Control: public, max-age=31536000, immutable, browsers don’t even ask for revalidation during the freshness window. This pattern only works for assets referenced by another file (the HTML or the previous JS chunk needs to know the new URL). Documents at fixed URLs (/, /products/123) cannot use it and need explicit invalidation.

Path-based purge

The simplest active invalidation: tell the CDN to drop specific URLs.

path purge — single URL and wildcard
```shell
aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/123"

aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/*"
```

Limitations to plan around:

  • No relationship awareness. Purging /products/123 does nothing to /categories/electronics even if the listing renders that product.
  • Rate and cost. CloudFront charges $0.005 per path beyond 1,000 free paths/month per account; a wildcard counts as one path no matter how many objects it covers. Google Cloud CDN caps invalidations at 500/minute.
  • Propagation isn’t instant. See the next section.

Path purge is the right answer for emergency removals, simple sites with direct URL-to-content mapping, or as the fallback layer underneath a tag-based scheme.

Tag-based purge (surrogate keys)

When one entity (a product, a user, a feature flag) feeds many cached responses, surrogate keys let you invalidate by relationship. The origin tags responses; the CDN indexes entries by tag; one purge call fans out to every URL bound to the tag. The mechanism predates RFC-track standardization — the closest specification is the W3C Edge Architecture Specification 1.0 (2001), which defined Surrogate-Capability and Surrogate-Control for CDN-targeted directives. Each vendor layered its own tag header (Surrogate-Key, Cache-Tag, Edge-Cache-Tag) on top, and tag-based invalidation is now the workhorse of every serious CMS-on-CDN deployment.

Tag-based purge: the origin attaches surrogate keys, the CDN binds keys to URLs at insert time, and one purge invalidates every URL that shared the tag.

The header conventions and limits differ enough across vendors that you should never assume portability:

| CDN | Tag header | Per-key limit | Per-header limit | Notes |
| --- | --- | --- | --- | --- |
| Fastly | Surrogate-Key | 1,024 bytes | 16,384 bytes | Native; instant purge ~150 ms global P50.78 |
| Akamai | Edge-Cache-Tag | 128 chars | 128 tags per object | Fast Purge API; 5,000 tag purges/hour, 10,000 objects/min per account.9 |
| Cloudflare | Cache-Tag | 1,024 chars (API) | 16 KB total per response | Purge by tag is now available on all plans, not just Enterprise.1011 |
| Varnish | xkey VCL module | implementation-defined | implementation-defined | Operator-controlled secondary index. |
| CloudFront | none (no native support) | n/a | n/a | Workaround: maintain a tag→URL index in your app or a Lambda@Edge-fronted DynamoDB and purge paths. |
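The CloudFront workaround deserves a sketch: the application maintains its own tag-to-URL index and converts a tag purge into path purges. The in-memory Map and the issuePathPurge callback here are stand-ins for a real store and a real CDN purge call; only the indexing logic is the point.

```javascript
// tag → Set of paths that carried that tag.
const tagIndex = new Map();

// Called wherever the origin would otherwise emit a tag header.
function recordResponseTags(path, tags) {
  for (const tag of tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, new Set());
    tagIndex.get(tag).add(path);
  }
}

// One tag purge fans out to a single path-purge call covering every bound URL.
function purgeByTag(tag, issuePathPurge) {
  const paths = [...(tagIndex.get(tag) ?? [])];
  if (paths.length > 0) issuePathPurge(paths); // e.g. one CreateInvalidation call
  tagIndex.delete(tag);
  return paths;
}

recordResponseTags("/products/123", ["product-123", "cat-electronics"]);
recordResponseTags("/categories/electronics", ["cat-electronics"]);

const purged = purgeByTag("cat-electronics", (paths) => { /* CDN API call here */ });
console.log(purged); // ["/products/123", "/categories/electronics"]
```

The cost of this scheme is that the index has to be durable and written on every cache fill, which is exactly the "origin code that has to compute and emit the right tags" tax described below, paid in your own infrastructure instead of the CDN's.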

A worked example. The origin emits:

response from /products/123
```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Surrogate-Key: product-123 cat-electronics homepage
Surrogate-Control: max-age=86400
Cache-Control: public, max-age=60, stale-while-revalidate=600
```

When the catalogue editor saves a price change, the application issues:

purge by surrogate key (Fastly)
```shell
curl -X POST "https://api.fastly.com/service/${SERVICE_ID}/purge/product-123" \
  -H "Fastly-Key: ${FASTLY_API_KEY}"
```

That single call invalidates /products/123, /categories/electronics, and /homepage — anywhere product-123 was attached. The trade-off is real: you pay for it with origin code that has to compute and emit the right tags on every response, and you constrain yourself to vendors that support the feature.

Soft purge vs hard purge

A purge can mean two materially different things:

  • Hard purge — the entry is dropped. The next request is a guaranteed cache MISS and pays the full origin RTT synchronously. This is the only behavior that satisfies “must not serve this content again” (legal takedowns, leaked secrets, broken responses).
  • Soft purge — the entry is marked stale but kept in cache, so it remains eligible for stale-while-revalidate and stale-if-error. The next request serves the stale body instantly while the CDN refetches in the background. Origin load step is bounded to one request per key per region, not N.
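A toy in-memory cache makes the behavioural difference concrete: hard purge deletes the entry so the next lookup is a MISS, soft purge keeps the body but marks it stale so it remains eligible for stale-while-revalidate serving. This is an illustration of the semantics only, not a CDN implementation.

```javascript
const cache = new Map(); // key → { body, stale }

function put(key, body) { cache.set(key, { body, stale: false }); }

function hardPurge(key) { cache.delete(key); } // entry gone: guaranteed MISS

function softPurge(key) {
  const entry = cache.get(key);
  if (entry) entry.stale = true; // body kept, marked stale
}

function lookup(key) {
  const entry = cache.get(key);
  if (!entry) return { status: "MISS" };                          // fetch synchronously
  if (entry.stale) return { status: "STALE", body: entry.body };  // serve + async revalidate
  return { status: "HIT", body: entry.body };
}

put("/products/123", "<html>old price</html>");
softPurge("/products/123");
console.log(lookup("/products/123").status); // "STALE" — body still served instantly
hardPurge("/products/123");
console.log(lookup("/products/123").status); // "MISS" — next request blocks on origin
```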

Soft vs hard purge: hard purge deletes the entry and the next request blocks on origin; soft purge keeps the entry, serves it stale, and refetches asynchronously.

Vendor support for the soft variant differs:

| CDN | Soft purge mechanism |
| --- | --- |
| Fastly | Fastly-Soft-Purge: 1 request header on URL or surrogate-key purge.12 Combine with stale-while-revalidate for SWR fan-out. |
| Akamai | Fast Purge Invalidate action marks objects stale (next request triggers conditional GET); Delete evicts immediately.9 |
| Cloudflare | Purge always evicts; closest equivalent is stale-while-revalidate on the origin response (no soft-purge API). |
| CloudFront | Invalidations always evict; no native soft-purge primitive. |

Tip

Default to soft purge for CMS edits, deploys, and content updates. Reserve hard purge for the cases where serving the old body even once would be wrong (PII, security, legal takedown). The user-visible latency difference on the first post-purge request is the entire point of the distinction.

Invalidation propagation timing

Purge is asynchronous. Plan for it.

| CDN | Typical global propagation | Notes |
| --- | --- | --- |
| Cloudflare | < 150 ms P50 | “Instant Purge” via distributed coreless architecture.13 |
| Fastly | ~ 150 ms global | Sub-second for most regions; primarily bounded by speed-of-light to PoPs.8 |
| AWS CloudFront | Typically < 2 min, up to 10–15 min globally | Per-edge propagation; varies with distribution size.14 |
| Google Cloud CDN | ~ 10 s per request, full propagation in minutes | 500 invalidations/minute account-level rate limit.15 |
| Akamai | Seconds to minutes via Fast Purge API | “Invalidate” (conditional GET) and “Delete” (force fetch) modes.9 |

Warning

Don’t deploy a “purge then immediately verify in CI” step that assumes the purge is global by the time the verification request lands. On AWS this can fail intermittently for tens of minutes. Either wait, or query a known-cold edge with a cache-busting query string for verification.

Deploy + purge race conditions

The classic incident is “deployed at T, purged at T+1, but a PoP that hadn’t seen the purge yet re-pulled the old origin between T and T+1 and re-cached it for the full TTL.” Three rules contain it:

  1. Order matters. Make the new origin authoritative before issuing the purge — atomic-swap behind a load balancer, blue/green deploy, or content-addressed origin paths. A purge against an origin still serving the old body just refills the cache with the stale content.
  2. Purge after deploy completes globally, not after the deploy step kicks off. CD systems that fire purges from the build runner before the rollout finishes ship this bug routinely.
  3. For HTML that references content-addressed assets, purge the HTML key, not the assets. The asset URLs change with the deploy and never collide.
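The race in rule 1 can be shown with a toy timeline. This simulation is illustrative only: an origin body, one cached copy, and three events whose ordering decides what a PoP re-caches for the next full TTL.

```javascript
// Run the three events in the given order and return what ends up cached.
function simulate(order) {
  let originBody = "old";
  let cached = "old";
  const steps = {
    deploy: () => { originBody = "new"; },        // new origin becomes authoritative
    purge: () => { cached = null; },              // CDN drops its copy
    nextRequest: () => {                          // a miss refills from whatever origin serves now
      if (cached === null) cached = originBody;
    },
  };
  for (const step of order) steps[step]();
  return cached;
}

console.log(simulate(["purge", "nextRequest", "deploy"])); // "old" — stale refill: purge landed too early
console.log(simulate(["deploy", "purge", "nextRequest"])); // "new" — correct ordering
```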

Stale-while-revalidate and stale-if-error

RFC 5861 adds two Cache-Control extensions that change the freshness-vs-availability calculus. They are the closest thing the cache layer has to a free lunch.

Stale-while-revalidate (SWR)

```http
Cache-Control: max-age=600, stale-while-revalidate=30
```

The lifecycle of a cached response under this directive:

  1. 0 – 600 s. Fresh. Served from cache.
  2. 600 – 630 s. Stale but inside the SWR window. The cache returns the stale response immediately and triggers an asynchronous revalidation in the background. The next request gets fresh content.
  3. > 630 s. Truly stale. The cache must fetch synchronously before responding.
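The three phases reduce to a pure function of the response age. This sketch uses the header values from the example above (max-age=600, stale-while-revalidate=30); the state names are illustrative.

```javascript
function swrState(ageSeconds, maxAge = 600, swrWindow = 30) {
  if (ageSeconds < maxAge) return "FRESH";               // serve from cache
  if (ageSeconds < maxAge + swrWindow)
    return "STALE_SERVE_AND_REVALIDATE";                 // serve stale, refetch in background
  return "EXPIRED_FETCH_SYNC";                           // block on origin
}

console.log(swrState(300)); // "FRESH"
console.log(swrState(610)); // "STALE_SERVE_AND_REVALIDATE"
console.log(swrState(700)); // "EXPIRED_FETCH_SYNC"
```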

stale-while-revalidate window: the first request after TTL serves stale content instantly while triggering an async revalidation; subsequent requests get fresh content.

The point is to push revalidation latency off the user-visible path. It also de-fangs the thundering herd: the first concurrent request serves stale and fires one background fetch; the rest still see cache hits.

Browser support landed in Chrome 75+, Edge 79+, Firefox 68+, and Safari 14+.16 Browsers that don’t recognise the directive simply ignore it and fall back to max-age semantics. CDN support is mature on Cloudflare, Fastly, KeyCDN, and Varnish; CloudFront needs Lambda@Edge to implement true SWR. One subtle constraint: if no traffic arrives during the SWR window, the entry truly expires and the next request pays the full origin RTT — SWR is only a steady-state win for warm endpoints.

Stale-if-error (SIE)

```http
Cache-Control: max-age=600, stale-if-error=86400
```

Origin returned a 5xx or is unreachable? Serve the stale response for the SIE window instead of propagating the error. This trades freshness for availability, and it makes a bad deploy look like a slightly stale page rather than a 503 storm.

The combined production recipe:

```http
Cache-Control: max-age=300, stale-while-revalidate=60, stale-if-error=86400
```

Five minutes fresh; one minute of async-refresh window; one day of error-survival cushion. This single header turns a 30-minute origin outage into a non-event for read-heavy endpoints.

Varnish grace mode

Varnish predates RFC 5861 and exposes the same idea with finer control via VCL:

grace.vcl
```vcl
import std;  # std.healthy() requires the std vmod

sub vcl_backend_response {
    set beresp.ttl = 300s;
    set beresp.grace = 1h;
}

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    } else {
        set req.grace = 24h;
    }
}
```

The interesting move is the split between beresp.grace (how long the cache retains a stale object after expiry) and req.grace (how stale a given request is willing to accept). When the backend health probe flips, you can extend the request-side grace from 10 s to 24 h on the fly without re-emitting any responses.

Operational guardrails

Cache hit ratio (CHR)

CHR is the primary health metric for a CDN. The formula is trivial:

```text
CHR = cache hits / total requests = hits / (hits + misses)
```

Useful target bands, from operating sites at scale:

  • Static assets: > 95 %.
  • Mixed-content sites: > 85 %.
  • Investigate: anything below 80 % on a previously healthy endpoint.

A global average is a bad target because it hides the failure that hurts. Always segment by content type, region, and URL pattern. A 90 % global CHR with 50 % CHR on /api/* is a hot origin waiting to happen.
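Segmentation is a one-pass fold over access logs. The record shape here ({ segment, hit }) is an assumption for illustration; real logs need a segment-labelling step first (content type, region, or URL pattern).

```javascript
// Group log records by segment and compute per-segment CHR.
function chrBySegment(records) {
  const bySegment = new Map(); // segment → { hits, total }
  for (const { segment, hit } of records) {
    const s = bySegment.get(segment) ?? { hits: 0, total: 0 };
    s.total += 1;
    if (hit) s.hits += 1;
    bySegment.set(segment, s);
  }
  const out = {};
  for (const [segment, { hits, total }] of bySegment) out[segment] = hits / total;
  return out;
}

const chr = chrBySegment([
  { segment: "static", hit: true },
  { segment: "static", hit: true },
  { segment: "api", hit: true },
  { segment: "api", hit: false },
]);
console.log(chr); // { static: 1, api: 0.5 } — a 0.75 global average hides the api problem
```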

The recurring CHR killers, in order of how often they show up in real incidents:

  1. High-cardinality Vary headers (Accept-Language, User-Agent) — the cache splits into thousands of variants per URL.
  2. Tracking parameters in the cache key — ?utm_source=...&fbclid=... makes every share a unique key. Strip them with a query-string allow-list.
  3. Aggressive Vary: Cookie — usually unintended, usually catastrophic; one session cookie effectively turns the cache off.
  4. TTL too short for actual change rate — e.g. 60 s TTL on content that changes hourly.
  5. Origin emitting Cache-Control: no-store unintentionally — common after a deploy of an auth middleware that “secures” everything.

Cache stampede (thundering herd)

When a popular cached entry expires, every concurrent request misses the cache and hits the origin in the same instant.

Cache stampede: without coalescing, concurrent requests after TTL expiry all hit origin; with coalescing, the CDN holds duplicates while one fetch returns and serves them all.

A 98 % CHR endpoint at 50k RPS sees a load step from ~1k RPS to 50k RPS the moment a hot key expires — a 50× origin spike. The mitigations stack:

  1. Request coalescing at the edge. CloudFront does this natively: concurrent requests for the same key are paused while one fetch goes upstream, and the response is fanned back out. Fastly does the same by default. Cloudflare’s Tiered Cache adds upper-tier collapsing.
  2. Stale-while-revalidate. First miss after expiry serves stale and fires one background refresh; subsequent requests still hit the cache. No stampede.
  3. Probabilistic early expiration (XFetch). Refresh slightly before TTL with a probability that grows as expiry approaches. Vattani et al. show this is optimal under reasonable assumptions:17 for a key with recompute cost Δ, expiry time T, and a tunable β ≥ 1, refresh whenever now − Δ · β · ln(rand()) ≥ T, where rand() is drawn uniformly from (0, 1). The effect is to spread re-fetches across a window instead of bunching them at the boundary.
  4. Origin Shield. A single mid-tier cache that funnels misses from many edges. Multi-region misses become one origin request.
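XFetch from mitigation 3 is a few lines of code. The criterion follows Vattani et al.: since ln(rand()) is negative, subtracting Δ·β·ln(rand()) pushes "now" forward by a random amount proportional to the recompute cost, so refreshes scatter across a pre-expiry window. The injectable rand is a testing convenience, not part of the algorithm.

```javascript
// Decide whether to refresh a key early.
// deltaSeconds: observed recompute cost; beta ≥ 1 tunes aggressiveness.
function shouldRefreshEarly(nowSeconds, expirySeconds, deltaSeconds, beta = 1, rand = Math.random) {
  return nowSeconds - deltaSeconds * beta * Math.log(rand()) >= expirySeconds;
}

// Far from expiry, almost never refresh; at or past expiry, always refresh.
console.log(shouldRefreshEarly(1000, 1600, 5, 1, () => 0.5)); // false — 400 s early
console.log(shouldRefreshEarly(1601, 1600, 5, 1, () => 0.5)); // true — past expiry
```

Each independent caller rolls its own rand(), so when many workers hold the same hot key, one of them refreshes early with high probability while the rest keep serving the cached value.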

Tip

Coalescing has one ugly footgun. If your origin response varies by cookie or another private dimension and request collapsing is on, two users can end up sharing one origin response. CloudFront’s documented rule is that collapsing is disabled when cookie forwarding is enabled, and can otherwise be disabled only by setting Min TTL to 0 and having the origin emit Cache-Control: private, no-store, no-cache, max-age=0, or s-maxage=0. If you depend on per-user variants, verify the equivalent rule on your CDN before turning collapsing on.

Origin Shield architecture

Without a shield, every edge PoP that misses goes directly to origin:

```text
Edge NYC miss    → Origin
Edge London miss → Origin
Edge Tokyo miss  → Origin
= 3 origin requests
```

With a shield (a single regional cache between edges and origin):

```text
Edge NYC miss    → Shield → Origin
Edge London miss → Shield (HIT)
Edge Tokyo miss  → Shield (HIT)
= 1 origin request
```

When it earns its keep:

  • High traffic with mediocre CHR (< 95 %) — the absolute miss volume is what hurts.
  • Origins that cannot scale past a known ceiling (legacy systems, expensive databases).
  • Globally distributed traffic over many edge PoPs — the multiplier on coalescing is largest.

The cost is real: AWS charges up to $0.0160 per 10,000 requests at the shield, depending on region. Justify it by measuring origin RPS reduction, not by intuition.

Graceful degradation patterns

Designing for origin failure is a full second axis on top of the freshness/cost trade-off. Four patterns that should be in every production playbook:

  1. Extended stale-if-error — stale-if-error=172800 keeps cached content alive for two days during outages.
  2. Static fallback at the edge — if origin returns 5xx, serve a baked-in static page from edge KV/storage. Cloudflare Workers, Lambda@Edge, and Fastly Compute can all do this.
  3. Health-aware grace — Varnish’s pattern above: extend the stale-acceptance window when the backend probe flips unhealthy.
  4. Edge circuit breaker — after N consecutive origin errors, stop sending traffic for a cooldown period and serve stale or a static fallback instead.
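Pattern 4 is simple enough to sketch end to end. The thresholds and the injectable clock are illustrative choices; a real edge worker would wrap its origin fetch in allowOrigin()/recordResult() and serve stale or a static fallback while the breaker is open.

```javascript
class EdgeCircuitBreaker {
  constructor({ threshold = 3, cooldownMs = 30000, now = Date.now } = {}) {
    this.threshold = threshold;   // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long to stop sending traffic
    this.now = now;               // injectable clock for testing
    this.failures = 0;
    this.openedAt = null;
  }

  allowOrigin() {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // cooldown over: probe origin again
      this.failures = 0;
      return true;
    }
    return false; // open: serve stale or static fallback instead
  }

  recordResult(ok) {
    if (ok) { this.failures = 0; return; }
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}

let t = 0;
const breaker = new EdgeCircuitBreaker({ threshold: 2, cooldownMs: 1000, now: () => t });
breaker.recordResult(false);
breaker.recordResult(false);        // second consecutive failure trips the breaker
console.log(breaker.allowOrigin()); // false — cooling down, serve fallback
t = 1000;
console.log(breaker.allowOrigin()); // true — cooldown elapsed, probe origin again
```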

Edge compute use cases

Edge compute moves logic from origin (typically 100–300 ms RTT) to the edge PoP closest to the client (sub-ms). Used well, it shifts the personalisation boundary outward and lets you cache responses you couldn’t cache before. Used badly, it’s a slower, more expensive way to do work the origin was already doing.

Platform comparison

| Platform | Runtime | Cold start | Per-invocation execution limit | Best for |
| --- | --- | --- | --- | --- |
| CloudFront Functions | JavaScript (subset) | Effectively none (executes in request path) | < 1 ms CPU | Header rewrites, redirects, simple URL transforms |
| Lambda@Edge | Node.js, Python | Tens to hundreds of ms on cold path | 5 s viewer / 30 s origin | Complex logic, async I/O, surrogate-key workarounds |
| Cloudflare Workers | V8 isolate | ~5 ms isolate startup; often ~0 user-visible via TLS-handshake prewarm18 | 10 ms (free) / 30 ms+ (paid) CPU | Full edge applications, KV-backed APIs |
| Fastly Compute | WASM (Rust, Go, AssemblyScript, …) | ~35 µs Wasmtime instance instantiation19 | No hard cap; single-tenant per request | High-performance compute, structured data transforms |
| Platform | Per 1M invocations | Notes |
| --- | --- | --- |
| CloudFront Functions | $0.10 | First 2M invocations/month free. |
| Lambda@Edge | $0.60 + GB-seconds | No free tier; viewer-request and origin-request have different limits. |
| Cloudflare Workers | $0.30 (Standard) | $5/month minimum on the paid plan covers 10M requests; CPU billed separately. |
| Fastly Compute | Bundled with Compute@Edge plan | Pricing model is bundled, not per-invocation; consult your contract. |

Personalisation without origin load

Classic personalisation needs an origin per request. Edge compute lets you cache the variants:

personalisation.js — CloudFront Functions
```javascript
function handler(event) {
  const request = event.request;
  const cookies = request.cookies;
  const segment = cookies.user_segment?.value ?? "default";
  request.headers["x-user-segment"] = { value: segment };
  return request;
}
```

Wire x-user-segment into the cache key policy. Three segments × 1,000 pages = 3,000 cached variants — all served from the edge, none from origin. The trade-off is variant explosion: if you also vary by language, device, and feature flag, segments multiply quickly. Cap your dimensions at three or four with explicit fallbacks.

A/B testing at the edge

ab-testing.js — Cloudflare Workers
```javascript
// Read a cookie value from the request's Cookie header.
function getCookie(request, name) {
  const cookie = request.headers.get("Cookie") || "";
  const match = cookie.match(new RegExp("(?:^|;\\s*)" + name + "=([^;]+)"));
  return match ? match[1] : null;
}

export default {
  async fetch(request) {
    const url = new URL(request.url);
    let variant = getCookie(request, "ab_variant");
    if (!variant) {
      variant = Math.random() < 0.5 ? "a" : "b";
    }
    url.pathname = `/${variant}${url.pathname}`;
    const response = await fetch(url.toString(), request);
    const newResponse = new Response(response.body, response);
    if (!getCookie(request, "ab_variant")) {
      newResponse.headers.set("Set-Cookie", `ab_variant=${variant}; Path=/; Max-Age=86400`);
    }
    return newResponse;
  },
};
```

Why move A/B at the edge:

  • No layout shift — the variant decision is made before HTML is sent.
  • No JavaScript dependency on the client.
  • Consistent assignment across page loads via cookie persistence.
  • Works for users with JS disabled.

Geo-routing and compliance

Edge compute is the cleanest place to express data-locality rules:

geo-routing.js — Lambda@Edge
```javascript
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const country = request.headers["cloudfront-viewer-country"][0].value;
  const euCountries = ["DE", "FR", "IT", "ES", "NL", "BE", "AT", "PL"];
  if (euCountries.includes(country)) {
    request.origin.custom.domainName = "eu-origin.example.com";
  } else {
    request.origin.custom.domainName = "us-origin.example.com";
  }
  return request;
};
```

For compliance-driven routing (GDPR data residency, sectoral regulation) the contract is: the request never reaches an origin outside the allowed region. Verify this with synthetic probes from each region, not just code review.

Practical takeaways

  • Cache key controls correctness; TTL controls cost and freshness. Review them on different cadences and treat them as separate failure domains.
  • Climb the invalidation ladder. Versioned URLs first; SWR for documents and APIs; surrogate keys for content with relationships; path purges for emergencies; full clears never.
  • Default to Cache-Control: max-age=N, stale-while-revalidate=M, stale-if-error=K for HTML and JSON APIs. The combination hides revalidation latency, prevents stampedes, and survives origin outages.
  • Segment your CHR dashboards. Global numbers hide the next incident.
  • Buy request coalescing. Whether through native CDN behaviour, Origin Shield, Tiered Cache, or SWR — make sure a popular key expiring cannot send N requests to your origin.
  • Treat purge as eventually consistent. Build for it in deploy pipelines; verify against cold edges with cache-busting query strings.
  • Pick the cheapest edge runtime that fits the work. CloudFront Functions for header munging; Workers / Compute when you need real logic; Lambda@Edge when you need AWS APIs and you’ve measured the latency.

Appendix

Prerequisites

  • HTTP caching model (request/response headers, freshness, validation).
  • CDN concepts (edge PoPs, origin, cache keys).
  • Basic familiarity with distributed-systems failure modes.

Glossary

  • Cache key. Identifier used to look up a stored response — typically method + host + path + query plus selected headers.
  • TTL (time to live). Duration content is considered fresh.
  • CHR (cache hit ratio). Hits divided by total requests.
  • PoP (point of presence). Edge data centre where a CDN serves traffic.
  • Origin Shield. Centralised cache layer between edges and origin.
  • Surrogate key (cache tag). Tag attached to a response so groups of cached entries can be invalidated together.
  • Thundering herd / stampede. Concurrent miss storm at origin after a popular cache entry expires.
  • SWR. stale-while-revalidate — serve stale content while asynchronously refreshing.
  • SIE. stale-if-error — serve stale content when origin returns errors.


Footnotes

  1. RFC 9111 — HTTP Caching, STD 98, June 2022. Obsoletes RFC 7234.

  2. Edge Architecture Specification 1.0, W3C Note, August 2001. The header set (Surrogate-Control, Surrogate-Capability) was authored by Mark Nottingham (Akamai) and Xiang Liu (Oracle); it is a W3C Note, not a W3C Recommendation.

  3. CDN-Cache-Control — Cloudflare docs; CDN-Cache-Control: precision control for your CDN(s), Cloudflare blog.

  4. Best practices for using the Vary header, Fastly engineering blog.

  5. Understanding the Vary header in the browser, Fastly engineering blog.

  6. Managing multi-regional and multilingual sites, Google Search Central.

  7. Surrogate-Key — Fastly HTTP headers.

  8. Fastly Instant Purge: under 150 ms for over a decade.

  9. Purge content by cache tag — Akamai.

  10. Cloudflare — Instant Purge for All, Sep 2024. Tag, prefix, and hostname purge are no longer Enterprise-only.

  11. Purge cache by cache-tags — Cloudflare.

  12. Soft purges — Fastly documentation; Introducing Soft Purge, Fastly blog.

  13. Instant Purge: invalidating cached content in under 150 ms, Cloudflare.

  14. Pay for file invalidation — Amazon CloudFront; see also Invalidating files.

  15. Cache invalidation overview — Google Cloud CDN.

  16. stale-while-revalidate browser support — Can I use; Keeping things fresh with stale-while-revalidate, web.dev.

  17. A. Vattani, F. Chierichetti, K. Lowenstein. Optimal Probabilistic Cache Stampede Prevention. VLDB 2015.

  18. How Workers works; Eliminating Cold Starts 2: shard and conquer.

  19. Performance matters: why Compute does not yet support JavaScript, Fastly. Note: 35 µs is hot-instance instantiation; cold-path latency is higher in practice.