
Edge Delivery and Cache Invalidation

A modern CDN is two systems welded together: a globally distributed read-through cache governed by HTTP caching semantics, and a programmable purge plane that lets you reach into hundreds of points of presence to expire content on demand. Both halves have to be designed deliberately; defaults will give you a hot origin, fragmented cache, or stale pages on incident day. This article is the mental model and the failure-mode catalogue a senior engineer needs to design cache keys, pick TTLs, choose an invalidation strategy, and make edge compute pay its way.

CDN topology: clients hit edge PoPs; edge misses funnel through Origin Shield, which collapses concurrent fetches into a single origin request.

Thesis

Cache delivery reduces to three coupled decisions: what to cache (cache key design), how long (TTL strategy), and how to invalidate (path, tag, version, or wait). Get the cache key wrong and users see other users’ content. Get the TTL wrong and you either hammer your origin or serve stale pages through an incident. Get invalidation wrong and you can’t ship.

| Decision | Optimises | Sacrifices |
| --- | --- | --- |
| Long TTL | Origin load, latency | Freshness; needs deliberate invalidation |
| Short TTL | Freshness | Hit ratio, origin load |
| Wide cache key (more Vary) | Variant correctness | Fragmentation; lower hit ratio |
| Versioned / fingerprinted URLs | Eliminates invalidation | URL management at build / deploy time |
| Surrogate-key (tag) purge | Granular invalidation | Application + CDN tooling |
| Path purge | Simplicity | No relationship awareness; coarse |
| Stale-while-revalidate | User-visible latency on refresh | Brief staleness; only useful with steady traffic |

Two non-negotiables that keep coming back through the rest of the article:

  1. The cache key controls correctness; the TTL controls cost and freshness. They fail in different ways and need separate review.
  2. Prefer the highest invalidation method on this ladder you can reach: versioned URLs → stale-while-revalidate → surrogate-key purge → path purge → full clear.

Mental model

A request fans out through three caches before it reaches origin: the browser (private cache, governed by Cache-Control), the edge PoP (shared cache geographically nearest to the client), and optionally an origin shield (a single regional cache layer that collapses misses from many edges into one origin request). RFC 9111 calls these “private” and “shared” caches and uses the directive surface — private vs. s-maxage, in particular — to make the distinction enforceable.1

Each cache layer answers two questions on every request: do I have a cached response under this key? and is it still fresh? The cache key is built from the request (default: method + host + path + query string, plus whatever the cache configuration adds via Vary or explicit policy). Freshness comes from the max-age / s-maxage directives, the response date, and the various stale-* extensions. Everything else — purges, surrogate keys, edge compute — is a way to bend those two answers without rewriting your application.
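Those two questions can be sketched as a small decision function. This is an illustrative sketch only, not any vendor's implementation; the function and directive handling follow the shared-cache rules described above (s-maxage overriding max-age, private excluded from shared caches).

```javascript
// Parse a Cache-Control header into a directive map:
// "public, max-age=60" → { public: true, "max-age": 60 }
function parseCacheControl(header) {
  const directives = {};
  for (const part of header.split(",")) {
    const [name, value] = part.trim().split("=");
    directives[name.toLowerCase()] = value !== undefined ? Number(value) : true;
  }
  return directives;
}

// "Is it still fresh?" for a shared cache, given the response age in seconds.
// s-maxage governs shared caches when present; private and no-store responses
// are never served from a shared cache at all.
function isFreshInSharedCache(cacheControlHeader, ageSeconds) {
  const cc = parseCacheControl(cacheControlHeader);
  if (cc["no-store"]) return false; // never stored in the first place
  if (cc["private"]) return false;  // browser-only, excluded from shared caches
  const lifetime = cc["s-maxage"] ?? cc["max-age"] ?? 0;
  return ageSeconds < lifetime;
}

console.log(isFreshInSharedCache("public, max-age=60, s-maxage=300", 120)); // true: s-maxage governs
console.log(isFreshInSharedCache("private, max-age=600", 10));              // false: browser-only
```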

Cache fundamentals

Cache-Control directives that actually matter at the edge

| Directive | Target | Behaviour | When to reach for it |
| --- | --- | --- | --- |
| max-age=N | All caches | Response is fresh for N seconds | The default knob; everything else modifies it. |
| s-maxage=N | Shared caches | Overrides max-age for CDNs and proxies | When the CDN should cache longer than the browser does. |
| no-cache | All caches | Cache, but revalidate before serving | Documents you must serve fresh on every navigation. |
| no-store | All caches | Never store | PII, auth-bearing responses you cannot risk leaking. |
| private | Browser only | Excluded from shared caches | Per-user content that may still be browser-cached. |
| public | All caches | Cacheable even with Authorization | Explicit override for authenticated-but-shareable responses. |
| must-revalidate | All caches | Cannot serve stale once expired | Strict-freshness requirements; e.g. financial pages. |
| immutable | All caches | Will not change during the freshness window | Fingerprinted assets — skips conditional revalidation entirely. |

Caution

no-cache does not mean “do not cache”. It means “cache, but revalidate before each use” (RFC 9111 §5.2.2.4). Use no-store if you need to forbid caching entirely. Mixing the two up has been the root cause of more than one credential leak.

Two header recipes cover most production responses:

Fingerprinted asset (1 year, no revalidation):

```http
Cache-Control: public, max-age=31536000, immutable
```

HTML document (always revalidate):

```http
Cache-Control: no-cache, must-revalidate
```

Targeted cache control: CDN-Cache-Control and Surrogate-Control

The standard Cache-Control header is shared by the browser and every intermediary, which makes it awkward when you want different semantics at the edge versus the client. Two header families exist for that split:

  • Surrogate-Control — defined in the W3C Edge Architecture Specification 1.0 (2001) by Akamai and Oracle. Targets only “surrogates” (CDN nodes); compliant surrogates strip the header before forwarding to the client.2 Honored by Fastly and parts of Akamai; not by Cloudflare or CloudFront.
  • CDN-Cache-Control — standardized as RFC 9213 (Targeted HTTP Cache Control, June 2022). Same directive grammar as Cache-Control; explicitly addresses the CDN tier and is co-supported by Cloudflare,3 Vercel, and other modern CDNs. Use this for greenfield work — Surrogate-Control is the older, vendor-fragmented sibling.

Cache key design

The cache key uniquely identifies a stored response. Most CDNs default to method + host + path + query string. Anything else that affects the response — the Accept-Language header, the device class, the user segment — has to be in the key, either explicitly via the cache policy or implicitly via the Vary response header.

Important

The cache-key correctness rule: if a request header changes the response body, that header has to participate in the cache key. Skip this and the cache will serve user A’s response to user B; the bug is silent until someone notices.

A poorly designed key fails in two directions:

  • Too narrow → the cache returns the wrong variant. (Auth user sees anon shell; English user sees German content.)
  • Too wide → the cache fragments. Hit ratio collapses, origin load climbs, and you debug a cost spike instead of a correctness bug.

The textbook fragmentation case is Vary: Accept-Language. Browsers send raw locale strings (en-US, en-US,en;q=0.9, en-GB,en;q=0.8,fr;q=0.5) and the cardinality is effectively unbounded — Fastly’s data shows this header alone can produce thousands of variants per URL.4 Worse, browsers don’t actually store multiple variants per URL the way intermediaries do; they treat Vary as a validator and refetch on mismatch, so the win you wanted at the edge often doesn’t materialise in the browser at all.5

Three working strategies, in rough order of preference:

  1. Normalise at the edge. Collapse Accept-Language to a closed set (en, de, fr, fallback) in a request-side edge function or VCL block, then Vary on the normalised value.
  2. Encode the variant in the URL. /en/products, /de/products. Google explicitly recommends locale-specific URLs over Accept-Language for international SEO,6 and it makes the cache key obvious.
  3. Move the dimension to a custom header. X-Device-Class: mobile|tablet|desktop, populated by a device-detection layer, then Vary: X-Device-Class.
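Strategy 1 can be sketched in a few lines. The supported-locale set is an assumed example, and this is plain JavaScript showing only the normalisation step itself; each edge runtime (CloudFront Functions, Workers, VCL) has its own header accessors around it.

```javascript
// Collapse a raw Accept-Language value to a closed set before it touches the
// cache key. SUPPORTED and FALLBACK are illustrative choices.
const SUPPORTED = ["en", "de", "fr"];
const FALLBACK = "en";

function normalizeLanguage(acceptLanguage) {
  if (!acceptLanguage) return FALLBACK;
  // "en-GB,en;q=0.8,fr;q=0.5" → ["en-gb", "en", "fr"], in client order.
  // (Simplification: client order is taken as preference; q-values ignored.)
  const tags = acceptLanguage
    .split(",")
    .map((part) => part.split(";")[0].trim().toLowerCase())
    .filter(Boolean);
  for (const tag of tags) {
    const primary = tag.split("-")[0]; // en-GB → en
    if (SUPPORTED.includes(primary)) return primary;
  }
  return FALLBACK;
}

console.log(normalizeLanguage("en-US,en;q=0.9"));          // "en"
console.log(normalizeLanguage("de-AT,de;q=0.9,en;q=0.5")); // "de"
console.log(normalizeLanguage("pt-BR"));                   // "en" (fallback)
```

The edge then sets a header (or cache-key entry) to the normalised value and varies on that, bounding the variant count at the size of the supported set plus one.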
| Do | Don’t |
| --- | --- |
| Include only headers that change the response body | Vary: User-Agent — high-cardinality, effectively disables caching |
| Normalise high-cardinality headers before they hit cache | Pass raw locale, UA, or cookie blobs into the key |
| Whitelist allowed query parameters | Include the entire query string verbatim — tracking IDs fragment cache |
| Use URL path variants for stable dimensions | Lean on Vary for anything with more than ~10 distinct values |

TTL strategies by content type

Pick TTL by content volatility and staleness tolerance, not by gut feel:

| Content type | Recommended TTL | Cache-Control example |
| --- | --- | --- |
| Fingerprinted JS/CSS/images (main.abc123.js) | 1 year | max-age=31536000, immutable |
| Static images, no fingerprint | Hours to days | max-age=86400 |
| HTML documents | Revalidate or short TTL | no-cache or max-age=60, stale-while-revalidate=600 |
| API responses, read-heavy | Minutes | s-maxage=300, stale-while-revalidate=60 |
| User-specific responses | Don’t share-cache | private, no-store |
| Real-time data | Don’t cache | no-store |
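One way to keep such a policy reviewable is to express it as a single lookup that responses pass through, so the TTL decisions live in one place instead of being scattered across handlers. A sketch, with illustrative class names; the header strings mirror the table above.

```javascript
// Content class → Cache-Control policy, one reviewable map.
const CACHE_POLICY = {
  fingerprinted: "public, max-age=31536000, immutable",
  image: "public, max-age=86400",
  html: "no-cache",
  api: "s-maxage=300, stale-while-revalidate=60",
  user: "private, no-store",
  realtime: "no-store",
};

function cacheControlFor(contentClass) {
  // Unknown classes fall back to no-store: the safe direction.
  return CACHE_POLICY[contentClass] ?? "no-store";
}

console.log(cacheControlFor("fingerprinted")); // "public, max-age=31536000, immutable"
console.log(cacheControlFor("mystery"));       // "no-store"
```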

The interesting failure mode is the HTML-with-fingerprinted-assets trap. The HTML references assets by URL; if the HTML is cached for an hour and you deploy new CSS, users get yesterday’s HTML pointing at yesterday’s CSS path until the HTML cache turns over. Three reliable workarounds: serve HTML with no-cache, use a short max-age paired with stale-while-revalidate, or purge the HTML key on every deploy. Pick one and write it down — this is one of the most common “site looks broken after deploy” classes.

Cache invalidation strategies

Note

The mental shortcut: prefer mechanisms that don’t require invalidation, then mechanisms that hide invalidation latency, then mechanisms that purge precisely. Path purges and full clears are the bottom of the ladder.

Decision tree: pick an invalidation strategy by URL shape, dependency fan-out, staleness tolerance, and CDN feature support.

Versioned URLs — avoid invalidation

The cheapest invalidation is the one you never issue. Content-addressed URLs (main.abc123.js) make a new version a cache miss by definition; old versions stay cached until they’re evicted under memory pressure. The lifecycle is mechanical:

Content-addressed asset lifecycle: hashing produces a new immutable URL each deploy; new HTML references the new URL, the cache fills on first MISS, and old hashes age out under LRU pressure.

vite.config.js
```javascript
export default {
  build: {
    rollupOptions: {
      output: {
        entryFileNames: "[name].[hash].js",
        chunkFileNames: "[name].[hash].js",
        assetFileNames: "[name].[hash][extname]",
      },
    },
  },
}
```

Combined with Cache-Control: public, max-age=31536000, immutable, browsers don’t even ask for revalidation during the freshness window. This pattern only works for assets referenced by another file (the HTML or the previous JS chunk needs to know the new URL). Documents at fixed URLs (/, /products/123) cannot use it and need explicit invalidation.

Path-based purge

The simplest active invalidation: tell the CDN to drop specific URLs.

path purge — single URL and wildcard
```shell
aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/123"

aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/*"
```

Limitations to plan around:

  • No relationship awareness. Purging /products/123 does nothing to /categories/electronics even if the listing renders that product.
  • Rate and cost. CloudFront charges $0.005 per path beyond 1,000 free paths/month per account; a wildcard counts as one path no matter how many objects it covers. Google Cloud CDN caps invalidations at 500/minute.
  • Propagation isn’t instant. See the next section.

Path purge is the right answer for emergency removals, simple sites with direct URL-to-content mapping, or as the fallback layer underneath a tag-based scheme.

Tag-based purge (surrogate keys)

When one entity (a product, a user, a feature flag) feeds many cached responses, surrogate keys let you invalidate by relationship. The origin tags responses; the CDN indexes entries by tag; one purge call fans out to every URL bound to the tag. The mechanism predates RFC-track standardization — the closest specification is the W3C Edge Architecture Specification 1.0 (2001), which defined Surrogate-Capability and Surrogate-Control for CDN-targeted directives. Each vendor layered its own tag header (Surrogate-Key, Cache-Tag, Edge-Cache-Tag) on top, and tag-based invalidation is now the workhorse of every serious CMS-on-CDN deployment.

Tag-based purge: the origin attaches surrogate keys, the CDN binds keys to URLs at insert time, and one purge invalidates every URL that shared the tag.

The header conventions and limits differ enough across vendors that you should never assume portability:

| CDN | Tag header | Per-key limit | Per-header limit | Notes |
| --- | --- | --- | --- | --- |
| Fastly | Surrogate-Key | 1,024 bytes | 16,384 bytes | Native; instant purge ~150 ms global P50.78 |
| Akamai | Edge-Cache-Tag | 128 chars | 128 tags per object | Fast Purge API; 5,000 tag purges/hour, 10,000 objects/min per account.9 |
| Cloudflare | Cache-Tag | 1,024 chars (API) | 16 KB total per response | Purge by tag is now available on all plans, not just Enterprise.1011 |
| Varnish | xkey VCL module | implementation-defined | implementation-defined | Operator-controlled secondary index. |
| CloudFront | none (no native support) | n/a | n/a | Workaround: maintain a tag→URL index in your app or a Lambda@Edge-fronted DynamoDB and purge paths. |
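The CloudFront workaround deserves a sketch: the application maintains its own tag-to-URL index and converts a tag purge into path purges. The in-memory Map and the issuePathPurge callback here are stand-ins for a real store and a real CDN purge call; only the indexing logic is the point.

```javascript
// tag → Set of paths that carried that tag.
const tagIndex = new Map();

// Called wherever the origin would otherwise emit a tag header.
function recordResponseTags(path, tags) {
  for (const tag of tags) {
    if (!tagIndex.has(tag)) tagIndex.set(tag, new Set());
    tagIndex.get(tag).add(path);
  }
}

// One tag purge fans out to a single path-purge call covering every bound URL.
function purgeByTag(tag, issuePathPurge) {
  const paths = [...(tagIndex.get(tag) ?? [])];
  if (paths.length > 0) issuePathPurge(paths); // e.g. one CreateInvalidation call
  tagIndex.delete(tag);
  return paths;
}

recordResponseTags("/products/123", ["product-123", "cat-electronics"]);
recordResponseTags("/categories/electronics", ["cat-electronics"]);

const purged = purgeByTag("cat-electronics", (paths) => { /* CDN API call here */ });
console.log(purged); // ["/products/123", "/categories/electronics"]
```

The cost of this scheme is that the index has to be durable and written on every cache fill, which is exactly the "origin code that has to compute and emit the right tags" tax described below, paid in your own infrastructure instead of the CDN's.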

A worked example. The origin emits:

response from /products/123
```http
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Surrogate-Key: product-123 cat-electronics homepage
Surrogate-Control: max-age=86400
Cache-Control: public, max-age=60, stale-while-revalidate=600
```

When the catalogue editor saves a price change, the application issues:

purge by surrogate key (Fastly)
```shell
curl -X POST "https://api.fastly.com/service/${SERVICE_ID}/purge/product-123" \
  -H "Fastly-Key: ${FASTLY_API_KEY}"
```

That single call invalidates /products/123, /categories/electronics, and /homepage — anywhere product-123 was attached. The trade-off is real: you pay for it with origin code that has to compute and emit the right tags on every response, and you constrain yourself to vendors that support the feature.

Soft purge vs hard purge

A purge can mean two materially different things:

  • Hard purge — the entry is dropped. The next request is a guaranteed cache MISS and pays the full origin RTT synchronously. This is the only behavior that satisfies “must not serve this content again” (legal takedowns, leaked secrets, broken responses).
  • Soft purge — the entry is marked stale but kept in cache, so it remains eligible for stale-while-revalidate and stale-if-error. The next request serves the stale body instantly while the CDN refetches in the background. Origin load step is bounded to one request per key per region, not N.
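A toy in-memory cache makes the behavioural difference concrete: hard purge deletes the entry so the next lookup is a MISS, soft purge keeps the body but marks it stale so it remains eligible for stale-while-revalidate serving. This is an illustration of the semantics only, not a CDN implementation.

```javascript
const cache = new Map(); // key → { body, stale }

function put(key, body) { cache.set(key, { body, stale: false }); }

function hardPurge(key) { cache.delete(key); } // entry gone: guaranteed MISS

function softPurge(key) {
  const entry = cache.get(key);
  if (entry) entry.stale = true; // body kept, marked stale
}

function lookup(key) {
  const entry = cache.get(key);
  if (!entry) return { status: "MISS" };                          // fetch synchronously
  if (entry.stale) return { status: "STALE", body: entry.body };  // serve + async revalidate
  return { status: "HIT", body: entry.body };
}

put("/products/123", "<html>old price</html>");
softPurge("/products/123");
console.log(lookup("/products/123").status); // "STALE" — body still served instantly
hardPurge("/products/123");
console.log(lookup("/products/123").status); // "MISS" — next request blocks on origin
```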

Soft vs hard purge: hard purge deletes the entry and the next request blocks on origin; soft purge keeps the entry, serves it stale, and refetches asynchronously.

Vendor support for the soft variant differs:

| CDN | Soft purge mechanism |
| --- | --- |
| Fastly | Fastly-Soft-Purge: 1 request header on URL or surrogate-key purge.12 Combine with stale-while-revalidate for SWR fan-out. |
| Akamai | Fast Purge Invalidate action marks objects stale (next request triggers conditional GET); Delete evicts immediately.9 |
| Cloudflare | Purge always evicts; closest equivalent is stale-while-revalidate on the origin response (no soft-purge API). |
| CloudFront | Invalidations always evict; no native soft-purge primitive. |

Tip

Default to soft purge for CMS edits, deploys, and content updates. Reserve hard purge for the cases where serving the old body even once would be wrong (PII, security, legal takedown). The user-visible latency difference on the first post-purge request is the entire point of the distinction.

Invalidation propagation timing

Purge is asynchronous. Plan for it.

| CDN | Typical global propagation | Notes |
| --- | --- | --- |
| Cloudflare | < 150 ms P50 | “Instant Purge” via distributed coreless architecture.13 |
| Fastly | ~ 150 ms global | Sub-second for most regions; primarily bounded by speed-of-light to PoPs.8 |
| AWS CloudFront | Typically < 2 min, up to 10–15 min globally | Per-edge propagation; varies with distribution size.14 |
| Google Cloud CDN | ~ 10 s per request, full propagation in minutes | 500 invalidations/minute account-level rate limit.15 |
| Akamai | Seconds to minutes via Fast Purge API | “Invalidate” (conditional GET) and “Delete” (force fetch) modes.9 |

Warning

Don’t deploy a “purge then immediately verify in CI” step that assumes the purge is global by the time the verification request lands. On AWS this can fail intermittently for tens of minutes. Either wait, or query a known-cold edge with a cache-busting query string for verification.

Deploy + purge race conditions

The classic incident is “deployed at T, purged at T+1, but a PoP that hadn’t seen the purge yet re-pulled the old origin between T and T+1 and re-cached it for the full TTL.” Three rules contain it:

  1. Order matters. Make the new origin authoritative before issuing the purge — atomic-swap behind a load balancer, blue/green deploy, or content-addressed origin paths. A purge against an origin still serving the old body just refills the cache with the stale content.
  2. Purge after deploy completes globally, not after the deploy step kicks off. CD systems that fire purges from the build runner before the rollout finishes ship this bug routinely.
  3. For HTML that references content-addressed assets, purge the HTML key, not the assets. The asset URLs change with the deploy and never collide.
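The race in rule 1 can be shown with a toy timeline. This simulation is illustrative only: an origin body, one cached copy, and three events whose ordering decides what a PoP re-caches for the next full TTL.

```javascript
// Run the three events in the given order and return what ends up cached.
function simulate(order) {
  let originBody = "old";
  let cached = "old";
  const steps = {
    deploy: () => { originBody = "new"; },        // new origin becomes authoritative
    purge: () => { cached = null; },              // CDN drops its copy
    nextRequest: () => {                          // a miss refills from whatever origin serves now
      if (cached === null) cached = originBody;
    },
  };
  for (const step of order) steps[step]();
  return cached;
}

console.log(simulate(["purge", "nextRequest", "deploy"])); // "old" — stale refill: purge landed too early
console.log(simulate(["deploy", "purge", "nextRequest"])); // "new" — correct ordering
```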

Stale-while-revalidate and stale-if-error

RFC 5861 adds two Cache-Control extensions that change the freshness-vs-availability calculus. They are the closest thing the cache layer has to a free lunch.

Stale-while-revalidate (SWR)

```http
Cache-Control: max-age=600, stale-while-revalidate=30
```

The lifecycle of a cached response under this directive:

  1. 0 – 600 s. Fresh. Served from cache.
  2. 600 – 630 s. Stale but inside the SWR window. The cache returns the stale response immediately and triggers an asynchronous revalidation in the background. The next request gets fresh content.
  3. > 630 s. Truly stale. The cache must fetch synchronously before responding.
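The three phases reduce to a pure function of the response age. This sketch uses the header values from the example above (max-age=600, stale-while-revalidate=30); the state names are illustrative.

```javascript
function swrState(ageSeconds, maxAge = 600, swrWindow = 30) {
  if (ageSeconds < maxAge) return "FRESH";               // serve from cache
  if (ageSeconds < maxAge + swrWindow)
    return "STALE_SERVE_AND_REVALIDATE";                 // serve stale, refetch in background
  return "EXPIRED_FETCH_SYNC";                           // block on origin
}

console.log(swrState(300)); // "FRESH"
console.log(swrState(610)); // "STALE_SERVE_AND_REVALIDATE"
console.log(swrState(700)); // "EXPIRED_FETCH_SYNC"
```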

stale-while-revalidate window: the first request after TTL serves stale content instantly while triggering an async revalidation; subsequent requests get fresh content.

The point is to push revalidation latency off the user-visible path. It also de-fangs the thundering herd: the first concurrent request serves stale and fires one background fetch; the rest still see cache hits.

Browser support landed in Chrome 75+, Edge 79+, Firefox 68+, and Safari 14+.16 Browsers that don’t recognise the directive simply ignore it and fall back to max-age semantics. CDN support is mature on Cloudflare, Fastly, KeyCDN, and Varnish; CloudFront needs Lambda@Edge to implement true SWR. One subtle constraint: if no traffic arrives during the SWR window, the entry truly expires and the next request pays the full origin RTT — SWR is only a steady-state win for warm endpoints.

Stale-if-error (SIE)

```http
Cache-Control: max-age=600, stale-if-error=86400
```

Origin returned a 5xx or is unreachable? Serve the stale response for the SIE window instead of propagating the error. This trades freshness for availability, and it makes a bad deploy look like a slightly stale page rather than a 503 storm.

The combined production recipe:

```http
Cache-Control: max-age=300, stale-while-revalidate=60, stale-if-error=86400
```

Five minutes fresh; one minute of async-refresh window; one day of error-survival cushion. This single header turns a 30-minute origin outage into a non-event for read-heavy endpoints.

Varnish grace mode

Varnish predates RFC 5861 and exposes the same idea with finer control via VCL:

grace.vcl
```vcl
import std;  # std.healthy() requires the std vmod

sub vcl_backend_response {
    set beresp.ttl = 300s;
    set beresp.grace = 1h;
}

sub vcl_recv {
    if (std.healthy(req.backend_hint)) {
        set req.grace = 10s;
    } else {
        set req.grace = 24h;
    }
}
```

The interesting move is the split between beresp.grace (how long the cache retains a stale object after expiry) and req.grace (how stale a given request is willing to accept). When the backend health probe flips, you can extend the request-side grace from 10 s to 24 h on the fly without re-emitting any responses.

Operational guardrails

Cache hit ratio (CHR)

CHR is the primary health metric for a CDN. The formula is trivial:

```text
CHR = cache hits / total requests = hits / (hits + misses)
```

Useful target bands, from operating sites at scale:

  • Static assets: > 95 %.
  • Mixed-content sites: > 85 %.
  • Investigate: anything below 80 % on a previously healthy endpoint.

A global average is a bad target because it hides the failure that hurts. Always segment by content type, region, and URL pattern. A 90 % global CHR with 50 % CHR on /api/* is a hot origin waiting to happen.
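Segmentation is a one-pass fold over access logs. The record shape here ({ segment, hit }) is an assumption for illustration; real logs need a segment-labelling step first (content type, region, or URL pattern).

```javascript
// Group log records by segment and compute per-segment CHR.
function chrBySegment(records) {
  const bySegment = new Map(); // segment → { hits, total }
  for (const { segment, hit } of records) {
    const s = bySegment.get(segment) ?? { hits: 0, total: 0 };
    s.total += 1;
    if (hit) s.hits += 1;
    bySegment.set(segment, s);
  }
  const out = {};
  for (const [segment, { hits, total }] of bySegment) out[segment] = hits / total;
  return out;
}

const chr = chrBySegment([
  { segment: "static", hit: true },
  { segment: "static", hit: true },
  { segment: "api", hit: true },
  { segment: "api", hit: false },
]);
console.log(chr); // { static: 1, api: 0.5 } — a 0.75 global average hides the api problem
```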

The recurring CHR killers, in order of how often they show up in real incidents:

  1. High-cardinality Vary headers (Accept-Language, User-Agent) — the cache splits into thousands of variants per URL.
  2. Tracking parameters in the cache key — ?utm_source=...&fbclid=... makes every share a unique key. Strip them with a query-string allow-list.
  3. Aggressive Vary: Cookie — usually unintended, usually catastrophic; one session cookie effectively turns the cache off.
  4. TTL too short for actual change rate — e.g. 60 s TTL on content that changes hourly.
  5. Origin emitting Cache-Control: no-store unintentionally — common after a deploy of an auth middleware that “secures” everything.

Cache stampede (thundering herd)

When a popular cached entry expires, every concurrent request misses the cache and hits the origin in the same instant.

Cache stampede: without coalescing, concurrent requests after TTL expiry all hit origin; with coalescing, the CDN holds duplicates while one fetch returns and serves them all.

A 98 % CHR endpoint at 50k RPS sees a load step from ~1k RPS to 50k RPS the moment a hot key expires — a 50× origin spike. The mitigations stack:

  1. Request coalescing at the edge. CloudFront does this natively: concurrent requests for the same key are paused while one fetch goes upstream, and the response is fanned back out. Fastly does the same by default. Cloudflare’s Tiered Cache adds upper-tier collapsing.
  2. Stale-while-revalidate. First miss after expiry serves stale and fires one background refresh; subsequent requests still hit the cache. No stampede.
  3. Probabilistic early expiration (XFetch). Refresh slightly before TTL with a probability that grows as expiry approaches. Vattani et al. show this is optimal under reasonable assumptions:17 for a key with recompute cost Δ, expiry time T, and a tunable β ≥ 1, refresh whenever now − Δ · β · ln(rand()) ≥ T, where rand() is drawn uniformly from (0, 1). The effect is to spread re-fetches across a window instead of bunching them at the boundary.
  4. Origin Shield. A single mid-tier cache that funnels misses from many edges. Multi-region misses become one origin request.
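XFetch from mitigation 3 is a few lines of code. The criterion follows Vattani et al.: since ln(rand()) is negative, subtracting Δ·β·ln(rand()) pushes "now" forward by a random amount proportional to the recompute cost, so refreshes scatter across a pre-expiry window. The injectable rand is a testing convenience, not part of the algorithm.

```javascript
// Decide whether to refresh a key early.
// deltaSeconds: observed recompute cost; beta ≥ 1 tunes aggressiveness.
function shouldRefreshEarly(nowSeconds, expirySeconds, deltaSeconds, beta = 1, rand = Math.random) {
  return nowSeconds - deltaSeconds * beta * Math.log(rand()) >= expirySeconds;
}

// Far from expiry, almost never refresh; at or past expiry, always refresh.
console.log(shouldRefreshEarly(1000, 1600, 5, 1, () => 0.5)); // false — 400 s early
console.log(shouldRefreshEarly(1601, 1600, 5, 1, () => 0.5)); // true — past expiry
```

Each independent caller rolls its own rand(), so when many workers hold the same hot key, one of them refreshes early with high probability while the rest keep serving the cached value.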

Tip

Coalescing has one ugly footgun. If your origin response varies by cookie or another private dimension and request collapsing is on, two users can end up sharing one origin response. CloudFront’s documented rule is that collapsing is disabled when cookie forwarding is enabled, and can otherwise be disabled only by setting Min TTL to 0 and having the origin emit Cache-Control: private, no-store, no-cache, max-age=0, or s-maxage=0. If you depend on per-user variants, verify the equivalent rule on your CDN before turning collapsing on.

Origin Shield architecture

Without a shield, every edge PoP that misses goes directly to origin:

```text
Edge NYC miss    → Origin
Edge London miss → Origin
Edge Tokyo miss  → Origin
= 3 origin requests
```

With a shield (a single regional cache between edges and origin):

```text
Edge NYC miss    → Shield → Origin
Edge London miss → Shield (HIT)
Edge Tokyo miss  → Shield (HIT)
= 1 origin request
```

When it earns its keep:

  • High traffic with mediocre CHR (< 95 %) — the absolute miss volume is what hurts.
  • Origins that cannot scale past a known ceiling (legacy systems, expensive databases).
  • Globally distributed traffic over many edge PoPs — the multiplier on coalescing is largest.

The cost is real: AWS charges up to $0.0160 per 10,000 requests at the shield, depending on region. Justify it by measuring origin RPS reduction, not by intuition.

Graceful degradation patterns

Designing for origin failure is a full second axis on top of the freshness/cost trade-off. Four patterns that should be in every production playbook:

  1. Extended stale-if-error — stale-if-error=172800 keeps cached content alive for two days during outages.
  2. Static fallback at the edge — if origin returns 5xx, serve a baked-in static page from edge KV/storage. Cloudflare Workers, Lambda@Edge, and Fastly Compute can all do this.
  3. Health-aware grace — Varnish’s pattern above: extend the stale-acceptance window when the backend probe flips unhealthy.
  4. Edge circuit breaker — after N consecutive origin errors, stop sending traffic for a cooldown period and serve stale or a static fallback instead.
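Pattern 4 is simple enough to sketch end to end. The thresholds and the injectable clock are illustrative choices; a real edge worker would wrap its origin fetch in allowOrigin()/recordResult() and serve stale or a static fallback while the breaker is open.

```javascript
class EdgeCircuitBreaker {
  constructor({ threshold = 3, cooldownMs = 30000, now = Date.now } = {}) {
    this.threshold = threshold;   // consecutive failures before opening
    this.cooldownMs = cooldownMs; // how long to stop sending traffic
    this.now = now;               // injectable clock for testing
    this.failures = 0;
    this.openedAt = null;
  }

  allowOrigin() {
    if (this.openedAt === null) return true;
    if (this.now() - this.openedAt >= this.cooldownMs) {
      this.openedAt = null; // cooldown over: probe origin again
      this.failures = 0;
      return true;
    }
    return false; // open: serve stale or static fallback instead
  }

  recordResult(ok) {
    if (ok) { this.failures = 0; return; }
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}

let t = 0;
const breaker = new EdgeCircuitBreaker({ threshold: 2, cooldownMs: 1000, now: () => t });
breaker.recordResult(false);
breaker.recordResult(false);        // second consecutive failure trips the breaker
console.log(breaker.allowOrigin()); // false — cooling down, serve fallback
t = 1000;
console.log(breaker.allowOrigin()); // true — cooldown elapsed, probe origin again
```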

Edge compute use cases

Edge compute moves logic from origin (typically 100–300 ms RTT) to the edge PoP closest to the client (sub-ms). Used well, it shifts the personalisation boundary outward and lets you cache responses you couldn’t cache before. Used badly, it’s a slower, more expensive way to do work the origin was already doing.

Platform comparison

| Platform | Runtime | Cold start | Per-invocation execution limit | Best for |
| --- | --- | --- | --- | --- |
| CloudFront Functions | JavaScript (subset) | Effectively none (executes in request path) | < 1 ms CPU | Header rewrites, redirects, simple URL transforms |
| Lambda@Edge | Node.js, Python | Tens to hundreds of ms on cold path | 5 s viewer / 30 s origin | Complex logic, async I/O, surrogate-key workarounds |
| Cloudflare Workers | V8 isolate | ~5 ms isolate startup; often ~0 user-visible via TLS-handshake prewarm18 | 10 ms (free) / 30 ms+ (paid) CPU | Full edge applications, KV-backed APIs |
| Fastly Compute | WASM (Rust, Go, AssemblyScript, …) | ~35 µs Wasmtime instance instantiation19 | No hard cap; single-tenant per request | High-performance compute, structured data transforms |
| Platform | Per 1M invocations | Notes |
| --- | --- | --- |
| CloudFront Functions | $0.10 | First 2M invocations/month free. |
| Lambda@Edge | $0.60 + GB-seconds | No free tier; viewer-request and origin-request have different limits. |
| Cloudflare Workers | $0.30 (Standard) | $5/month minimum on the paid plan covers 10M requests; CPU billed separately. |
| Fastly Compute | Bundled with Compute@Edge plan | Pricing model is bundled, not per-invocation; consult your contract. |

Personalisation without origin load

Classic personalisation needs an origin per request. Edge compute lets you cache the variants:

personalisation.js — CloudFront Functions
```javascript
function handler(event) {
  const request = event.request;
  const cookies = request.cookies;
  const segment = cookies.user_segment?.value ?? "default";
  request.headers["x-user-segment"] = { value: segment };
  return request;
}
```

Wire x-user-segment into the cache key policy. Three segments × 1,000 pages = 3,000 cached variants — all served from the edge, none from origin. The trade-off is variant explosion: if you also vary by language, device, and feature flag, segments multiply quickly. Cap your dimensions at three or four with explicit fallbacks.

A/B testing at the edge

ab-testing.js — Cloudflare Workers
```javascript
// Read a cookie value from the request's Cookie header.
function getCookie(request, name) {
  const cookie = request.headers.get("Cookie") || "";
  const match = cookie.match(new RegExp("(?:^|;\\s*)" + name + "=([^;]+)"));
  return match ? match[1] : null;
}

export default {
  async fetch(request) {
    const url = new URL(request.url);
    let variant = getCookie(request, "ab_variant");
    if (!variant) {
      variant = Math.random() < 0.5 ? "a" : "b";
    }
    url.pathname = `/${variant}${url.pathname}`;
    const response = await fetch(url.toString(), request);
    const newResponse = new Response(response.body, response);
    if (!getCookie(request, "ab_variant")) {
      newResponse.headers.set("Set-Cookie", `ab_variant=${variant}; Path=/; Max-Age=86400`);
    }
    return newResponse;
  },
};
```

Why move A/B at the edge:

  • No layout shift — the variant decision is made before HTML is sent.
  • No JavaScript dependency on the client.
  • Consistent assignment across page loads via cookie persistence.
  • Works for users with JS disabled.

Geo-routing and compliance

Edge compute is the cleanest place to express data-locality rules:

geo-routing.js — Lambda@Edge
```javascript
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const country = request.headers["cloudfront-viewer-country"][0].value;
  const euCountries = ["DE", "FR", "IT", "ES", "NL", "BE", "AT", "PL"];
  if (euCountries.includes(country)) {
    request.origin.custom.domainName = "eu-origin.example.com";
  } else {
    request.origin.custom.domainName = "us-origin.example.com";
  }
  return request;
};
```

For compliance-driven routing (GDPR data residency, sectoral regulation) the contract is: the request never reaches an origin outside the allowed region. Verify this with synthetic probes from each region, not just code review.

Practical takeaways

  • Cache key controls correctness; TTL controls cost and freshness. Review them on different cadences and treat them as separate failure domains.
  • Climb the invalidation ladder. Versioned URLs first; SWR for documents and APIs; surrogate keys for content with relationships; path purges for emergencies; full clears never.
  • Default to Cache-Control: max-age=N, stale-while-revalidate=M, stale-if-error=K for HTML and JSON APIs. The combination hides revalidation latency, prevents stampedes, and survives origin outages.
  • Segment your CHR dashboards. Global numbers hide the next incident.
  • Buy request coalescing. Whether through native CDN behaviour, Origin Shield, Tiered Cache, or SWR — make sure a popular key expiring cannot send N requests to your origin.
  • Treat purge as eventually consistent. Build for it in deploy pipelines; verify against cold edges with cache-busting query strings.
  • Pick the cheapest edge runtime that fits the work. CloudFront Functions for header munging; Workers / Compute when you need real logic; Lambda@Edge when you need AWS APIs and you’ve measured the latency.

Appendix

Prerequisites

  • HTTP caching model (request/response headers, freshness, validation).
  • CDN concepts (edge PoPs, origin, cache keys).
  • Basic familiarity with distributed-systems failure modes.

Glossary

  • Cache key. Identifier used to look up a stored response — typically method + host + path + query plus selected headers.
  • TTL (time to live). Duration content is considered fresh.
  • CHR (cache hit ratio). Hits divided by total requests.
  • PoP (point of presence). Edge data centre where a CDN serves traffic.
  • Origin Shield. Centralised cache layer between edges and origin.
  • Surrogate key (cache tag). Tag attached to a response so groups of cached entries can be invalidated together.
  • Thundering herd / stampede. Concurrent miss storm at origin after a popular cache entry expires.
  • SWR. stale-while-revalidate — serve stale content while asynchronously refreshing.
  • SIE. stale-if-error — serve stale content when origin returns errors.


Footnotes

  1. RFC 9111 — HTTP Caching, STD 98, June 2022. Obsoletes RFC 7234.

  2. Edge Architecture Specification 1.0, W3C Note, August 2001. The header set (Surrogate-Control, Surrogate-Capability) was authored by Mark Nottingham (Akamai) and Xiang Liu (Oracle); it is a W3C Note, not a W3C Recommendation.

  3. CDN-Cache-Control — Cloudflare docs; CDN-Cache-Control: precision control for your CDN(s), Cloudflare blog.

  4. Best practices for using the Vary header, Fastly engineering blog.

  5. Understanding the Vary header in the browser, Fastly engineering blog.

  6. Managing multi-regional and multilingual sites, Google Search Central.

  7. Surrogate-Key — Fastly HTTP headers.

  8. Fastly Instant Purge: under 150 ms for over a decade.

  9. Purge content by cache tag — Akamai.

  10. Cloudflare — Instant Purge for All, Sep 2024. Tag, prefix, and hostname purge are no longer Enterprise-only.

  11. Purge cache by cache-tags — Cloudflare.

  12. Soft purges — Fastly documentation; Introducing Soft Purge, Fastly blog.

  13. Instant Purge: invalidating cached content in under 150 ms, Cloudflare.

  14. Pay for file invalidation — Amazon CloudFront; see also Invalidating files.

  15. Cache invalidation overview — Google Cloud CDN.

  16. stale-while-revalidate browser support — Can I use; Keeping things fresh with stale-while-revalidate, web.dev.

  17. A. Vattani, F. Chierichetti, K. Lowenstein. Optimal Probabilistic Cache Stampede Prevention. VLDB 2015.

  18. How Workers works; Eliminating Cold Starts 2: shard and conquer.

  19. Performance matters: why Compute does not yet support JavaScript, Fastly. Note: 35 µs is hot-instance instantiation; cold-path latency is higher in practice.