Edge Delivery and Cache Invalidation
A modern CDN is two systems welded together: a globally distributed read-through cache governed by HTTP caching semantics, and a programmable purge plane that lets you reach into hundreds of points of presence to expire content on demand. Both halves have to be designed deliberately; defaults will give you a hot origin, fragmented cache, or stale pages on incident day. This article is the mental model and the failure-mode catalogue a senior engineer needs to design cache keys, pick TTLs, choose an invalidation strategy, and make edge compute pay its way.
Thesis
Cache delivery reduces to three coupled decisions: what to cache (cache key design), how long (TTL strategy), and how to invalidate (path, tag, version, or wait). Get the cache key wrong and users see other users’ content. Get the TTL wrong and you either hammer your origin or serve stale pages through an incident. Get invalidation wrong and you can’t ship.
| Decision | Optimises | Sacrifices |
|---|---|---|
| Long TTL | Origin load, latency | Freshness; needs deliberate invalidation |
| Short TTL | Freshness | Hit ratio, origin load |
| Wide cache key (more `Vary`) | Variant correctness | Fragmentation; lower hit ratio |
| Versioned / fingerprinted URLs | Eliminates invalidation | URL management at build / deploy time |
| Surrogate-key (tag) purge | Granular invalidation | Application + CDN tooling |
| Path purge | Simplicity | No relationship awareness; coarse |
| Stale-while-revalidate | User-visible latency on refresh | Brief staleness; only useful with steady traffic |
Two non-negotiables that keep coming back through the rest of the article:
- The cache key controls correctness; the TTL controls cost and freshness. They fail in different ways and need separate review.
- Prefer the highest invalidation method on this ladder you can reach: versioned URLs → stale-while-revalidate → surrogate-key purge → path purge → full clear.
Mental model
A request fans out through three caches before it reaches origin: the browser (private cache, governed by Cache-Control), the edge PoP (shared cache geographically nearest to the client), and optionally an origin shield (a single regional cache layer that collapses misses from many edges into one origin request). RFC 9111 calls these “private” and “shared” caches and uses the directive surface — private vs. s-maxage, in particular — to make the distinction enforceable.1
Each cache layer answers two questions on every request: do I have a cached response under this key? and is it still fresh? The cache key is built from the request (default: method + host + path + query string, plus whatever the cache configuration adds via Vary or explicit policy). Freshness comes from the max-age / s-maxage directives, the response date, and the various stale-* extensions. Everything else — purges, surrogate keys, edge compute — is a way to bend those two answers without rewriting your application.
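The freshness half of those two questions can be sketched as a pure function over response headers — a simplified model of the RFC 9111 check that ignores the `Age` header and heuristic freshness (header names and shapes here are illustrative):

```javascript
// Simplified shared-cache freshness check: a shared cache prefers
// s-maxage over max-age; age is measured from the response Date.
function isFresh(headers, nowMs) {
  const cc = (headers["cache-control"] || "")
    .split(",")
    .map((d) => d.trim().toLowerCase());
  const directive = (name) => {
    const hit = cc.find((d) => d.startsWith(name + "="));
    return hit ? parseInt(hit.split("=")[1], 10) : undefined;
  };
  // Shared caches honour s-maxage first, then fall back to max-age.
  const lifetime = directive("s-maxage") ?? directive("max-age");
  if (lifetime === undefined) return false;
  const ageSec = (nowMs - Date.parse(headers["date"])) / 1000;
  return ageSec < lifetime;
}
```

A response that is 120 s old with `max-age=60, s-maxage=300` is stale for the browser but still fresh for the CDN — exactly the split the directive surface exists to express.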
Cache fundamentals
Cache-Control directives that actually matter at the edge
| Directive | Target | Behaviour | When to reach for it |
|---|---|---|---|
| `max-age=N` | All caches | Response is fresh for N seconds | The default knob; everything else modifies it. |
| `s-maxage=N` | Shared caches | Overrides `max-age` for CDNs and proxies | When the CDN should cache longer than the browser does. |
| `no-cache` | All caches | Cache, but revalidate before serving | Documents you must serve fresh on every navigation. |
| `no-store` | All caches | Never store | PII, auth-bearing responses you cannot risk leaking. |
| `private` | Browser only | Excluded from shared caches | Per-user content that may still be browser-cached. |
| `public` | All caches | Cacheable even with `Authorization` | Explicit override for authenticated-but-shareable responses. |
| `must-revalidate` | All caches | Cannot serve stale once expired | Strict-freshness requirements; e.g. financial pages. |
| `immutable` | All caches | Will not change during the freshness window | Fingerprinted assets — skips conditional revalidation entirely. |
Caution
no-cache does not mean “do not cache”. It means “cache, but revalidate before each use” (RFC 9111 §5.2.2.4). Use no-store if you need to forbid caching entirely. Mixing the two up has been the root cause of more than one credential leak.
Two header recipes cover most production responses:
```
Cache-Control: public, max-age=31536000, immutable
Cache-Control: no-cache, must-revalidate
```
Targeted cache control: CDN-Cache-Control and Surrogate-Control
The standard Cache-Control header is shared by the browser and every intermediary, which makes it awkward when you want different semantics at the edge versus the client. Two header families exist for that split:
- `Surrogate-Control` — defined in the W3C Edge Architecture Specification 1.0 (2001) by Akamai and Oracle. Targets only “surrogates” (CDN nodes); compliant surrogates strip the header before forwarding to the client.2 Honored by Fastly and parts of Akamai; not by Cloudflare or CloudFront.
- `CDN-Cache-Control` — standardized as RFC 9213 (Targeted HTTP Cache Control, June 2022). Same directive grammar as `Cache-Control`; explicitly addresses the CDN tier and is co-supported by Cloudflare,3 Vercel, and other modern CDNs. Use this for greenfield work — `Surrogate-Control` is the older, vendor-fragmented sibling.
Cache key design
The cache key uniquely identifies a stored response. Most CDNs default to method + host + path + query string. Anything else that affects the response — the Accept-Language, the device class, the user segment — has to be in the key, either explicitly via the cache policy or implicitly via the Vary response header.
Important
The cache-key correctness rule: if a request header changes the response body, that header has to participate in the cache key. Skip this and the cache will serve user A’s response to user B; the bug is silent until someone notices.
A poorly designed key fails in two directions:
- Too narrow → the cache returns the wrong variant. (Auth user sees anon shell; English user sees German content.)
- Too wide → the cache fragments. Hit ratio collapses, origin load climbs, and you debug a cost spike instead of a correctness bug.
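The correctness rule can be made concrete as a key-construction sketch — the plain-object request shape and separator are assumptions for illustration, not a vendor API:

```javascript
// Build a cache key from the default dimensions (method, host, path,
// query) plus an explicit list of varying headers. If a header changes
// the response body, it must appear in this list.
function cacheKey(request, varyHeaders = []) {
  const url = new URL(request.url);
  const parts = [request.method, url.host, url.pathname, url.search];
  for (const h of varyHeaders) {
    const name = h.toLowerCase();
    parts.push(`${name}=${request.headers[name] ?? ""}`);
  }
  return parts.join("|");
}
```

Leaving `Accept-Language` out of `varyHeaders` while the origin localises the body is exactly the "too narrow" failure: two users with different locales collapse onto one key.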
The textbook fragmentation case is Vary: Accept-Language. Browsers send raw locale strings (en-US, en-US,en;q=0.9, en-GB,en;q=0.8,fr;q=0.5) and the cardinality is effectively unbounded — Fastly’s data shows this header alone can produce thousands of variants per URL.4 Worse, browsers don’t actually store multiple variants per URL the way intermediaries do; they treat Vary as a validator and refetch on mismatch, so the win you wanted at the edge often doesn’t materialise in the browser at all.5
Three working strategies, in rough order of preference:
- Normalise at the edge. Collapse `Accept-Language` to a closed set (`en`, `de`, `fr`, fallback) in a request-side edge function or VCL block, then `Vary` on the normalised value.
- Encode the variant in the URL. `/en/products`, `/de/products`. Google explicitly recommends locale-specific URLs over `Accept-Language` for international SEO,6 and it makes the cache key obvious.
- Move the dimension to a custom header. `X-Device-Class: mobile|tablet|desktop`, populated by a device-detection layer, then `Vary: X-Device-Class`.
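The first strategy is small enough to sketch in full — the supported-locale set and function name are illustrative, not a specific vendor API:

```javascript
// Collapse an arbitrary Accept-Language header onto a closed set so the
// cache key has at most a handful of variants per URL, not thousands.
const SUPPORTED = ["en", "de", "fr"];
const FALLBACK = "en";

function normalizeLocale(acceptLanguage) {
  if (!acceptLanguage) return FALLBACK;
  // Take languages in order of appearance; most browsers already emit
  // them sorted by q-value, which is good enough for bucketing.
  for (const part of acceptLanguage.split(",")) {
    const lang = part.split(";")[0].trim().toLowerCase().split("-")[0];
    if (SUPPORTED.includes(lang)) return lang;
  }
  return FALLBACK;
}
```

The edge function then sets a normalised request header and the cache policy varies on that header instead of the raw `Accept-Language`.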
| Do | Don’t |
|---|---|
| Include only headers that change the response body | Vary: User-Agent — high-cardinality, effectively disables caching |
| Normalise high-cardinality headers before they hit cache | Pass raw locale, UA, or cookie blobs into the key |
| Whitelist allowed query parameters | Include the entire query string verbatim — tracking IDs fragment cache |
| Use URL path variants for stable dimensions | Lean on Vary for anything with more than ~10 distinct values |
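The query-parameter allow-list from the table can be sketched as follows; the parameter names are illustrative:

```javascript
// Keep only query parameters known to change the response; drop
// tracking noise (utm_*, fbclid, gclid) so shared links don't
// fragment the cache into one entry per visitor.
const ALLOWED_PARAMS = ["page", "sort", "q"]; // illustrative allow-list

function normalizeCacheKeyUrl(rawUrl) {
  const url = new URL(rawUrl);
  const kept = new URLSearchParams();
  // Sort names for a canonical order: ?a=1&b=2 and ?b=2&a=1 share one key.
  for (const name of [...new Set(url.searchParams.keys())].sort()) {
    if (ALLOWED_PARAMS.includes(name)) {
      for (const v of url.searchParams.getAll(name)) kept.append(name, v);
    }
  }
  url.search = kept.toString();
  return url.toString();
}
```

Sorting is the underrated half: without canonical ordering, two URLs with identical parameters in different order occupy two cache entries.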
TTL strategies by content type
Pick TTL by content volatility and staleness tolerance, not by gut feel:
| Content type | Recommended TTL | Cache-Control example |
|---|---|---|
| Fingerprinted JS/CSS/images (`main.abc123.js`) | 1 year | `max-age=31536000, immutable` |
| Static images, no fingerprint | Hours to days | max-age=86400 |
| HTML documents | Revalidate or short TTL | no-cache or max-age=60, stale-while-revalidate=600 |
| API responses, read-heavy | Minutes | s-maxage=300, stale-while-revalidate=60 |
| User-specific responses | Don’t share-cache | private, no-store |
| Real-time data | Don’t cache | no-store |
The interesting failure mode is the HTML-with-fingerprinted-assets trap. The HTML references assets by URL; if the HTML is cached for an hour and you deploy new CSS, users get yesterday’s HTML pointing at yesterday’s CSS path until the HTML cache turns over. Three reliable workarounds: serve HTML with no-cache, use a short max-age paired with stale-while-revalidate, or purge the HTML key on every deploy. Pick one and write it down — this is one of the most common “site looks broken after deploy” classes.
Cache invalidation strategies
Note
The mental shortcut: prefer mechanisms that don’t require invalidation, then mechanisms that hide invalidation latency, then mechanisms that purge precisely. Path purges and full clears are the bottom of the ladder.
Versioned URLs — avoid invalidation
The cheapest invalidation is the one you never issue. Content-addressed URLs (main.abc123.js) make a new version a cache miss by definition; old versions stay cached until they’re evicted under memory pressure. The lifecycle is mechanical:
```js
// Build config (Vite-style): emit content-hashed filenames so every
// deploy produces new asset URLs.
export default {
  build: {
    rollupOptions: {
      output: {
        entryFileNames: "[name].[hash].js",
        chunkFileNames: "[name].[hash].js",
        assetFileNames: "[name].[hash][extname]",
      },
    },
  },
};
```
Combined with `Cache-Control: public, max-age=31536000, immutable`, browsers don’t even ask for revalidation during the freshness window. This pattern only works for assets referenced by another file (the HTML or the previous JS chunk needs to know the new URL). Documents at fixed URLs (`/`, `/products/123`) cannot use it and need explicit invalidation.
Path-based purge
The simplest active invalidation: tell the CDN to drop specific URLs.
```shell
aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/123"

aws cloudfront create-invalidation \
  --distribution-id E1234567890AB \
  --paths "/products/*"
```
Limitations to plan around:
- No relationship awareness. Purging `/products/123` does nothing to `/categories/electronics` even if the listing renders that product.
- Rate and cost. CloudFront charges $0.005 per path beyond 1,000 free paths/month per account; a wildcard counts as one path no matter how many objects it covers. Google Cloud CDN caps invalidations at 500/minute.
- Propagation isn’t instant. See the next section.
Path purge is the right answer for emergency removals, simple sites with direct URL-to-content mapping, or as the fallback layer underneath a tag-based scheme.
Tag-based purge (surrogate keys)
When one entity (a product, a user, a feature flag) feeds many cached responses, surrogate keys let you invalidate by relationship. The origin tags responses; the CDN indexes entries by tag; one purge call fans out to every URL bound to the tag. The mechanism predates RFC-track standardization — the closest specification is the W3C Edge Architecture Specification 1.0 (2001), which defined Surrogate-Capability and Surrogate-Control for CDN-targeted directives. Each vendor layered its own tag header (Surrogate-Key, Cache-Tag, Edge-Cache-Tag) on top, and tag-based invalidation is now the workhorse of every serious CMS-on-CDN deployment.
The header conventions and limits differ enough across vendors that you should never assume portability:
| CDN | Tag header | Per-key limit | Per-header limit | Notes |
|---|---|---|---|---|
| Fastly | `Surrogate-Key` | 1,024 bytes | 16,384 bytes | Native; instant purge ~150 ms global P50.78 |
| Akamai | `Edge-Cache-Tag` | 128 chars | 128 tags per object | Fast Purge API; 5,000 tag purges/hour, 10,000 objects/min per account.9 |
| Cloudflare | `Cache-Tag` | 1,024 chars (API) | 16 KB total per response | Purge by tag is now available on all plans, not just Enterprise.1011 |
| Varnish | `xkey` VCL module | implementation-defined | implementation-defined | Operator-controlled secondary index. |
| CloudFront | none (no native support) | n/a | n/a | Workaround: maintain a tag→URL index in your app or a Lambda@Edge-fronted DynamoDB and purge paths. |
A worked example. The origin emits:
```
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Surrogate-Key: product-123 cat-electronics homepage
Surrogate-Control: max-age=86400
Cache-Control: public, max-age=60, stale-while-revalidate=600
```
When the catalogue editor saves a price change, the application issues:
```shell
curl -X POST "https://api.fastly.com/service/${SERVICE_ID}/purge/product-123" \
  -H "Fastly-Key: ${FASTLY_API_KEY}"
```
That single call invalidates `/products/123`, `/categories/electronics`, and `/homepage` — anywhere `product-123` was attached. The trade-off is real: you pay for it with origin code that has to compute and emit the right tags on every response, and you constrain yourself to vendors that support the feature.
Soft purge vs hard purge
A purge can mean two materially different things:
- Hard purge — the entry is dropped. The next request is a guaranteed cache MISS and pays the full origin RTT synchronously. This is the only behavior that satisfies “must not serve this content again” (legal takedowns, leaked secrets, broken responses).
- Soft purge — the entry is marked stale but kept in cache, so it remains eligible for `stale-while-revalidate` and `stale-if-error`. The next request serves the stale body instantly while the CDN refetches in the background. The origin load step is bounded to one request per key per region, not N.
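The choice between the two variants can be isolated into a small request builder against Fastly's purge API — the function name is illustrative; the endpoint and the `Fastly-Soft-Purge` header are the documented mechanism:

```javascript
// Build the purge request for a Fastly surrogate key. Soft purge is
// selected by the Fastly-Soft-Purge header; omitting it yields a hard
// purge (guaranteed MISS on the next request).
function buildPurgeRequest(serviceId, apiKey, surrogateKey, { soft = true } = {}) {
  const headers = { "Fastly-Key": apiKey };
  if (soft) headers["Fastly-Soft-Purge"] = "1";
  return {
    url: `https://api.fastly.com/service/${serviceId}/purge/${encodeURIComponent(surrogateKey)}`,
    method: "POST",
    headers,
  };
}
```

Pass the result to `fetch(req.url, { method: req.method, headers: req.headers })`; defaulting `soft` to `true` encodes the operational guidance below.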
Vendor support for the soft variant differs:
| CDN | Soft purge mechanism |
|---|---|
| Fastly | Fastly-Soft-Purge: 1 request header on URL or surrogate-key purge.12 Combine with stale-while-revalidate for SWR fan-out. |
| Akamai | Fast Purge Invalidate action marks objects stale (next request triggers conditional GET); Delete evicts immediately.9 |
| Cloudflare | Purge always evicts; closest equivalent is stale-while-revalidate on the origin response (no soft-purge API). |
| CloudFront | Invalidations always evict; no native soft-purge primitive. |
Tip
Default to soft purge for CMS edits, deploys, and content updates. Reserve hard purge for the cases where serving the old body even once would be wrong (PII, security, legal takedown). The user-visible latency difference on the first post-purge request is the entire point of the distinction.
Invalidation propagation timing
Purge is asynchronous. Plan for it.
| CDN | Typical global propagation | Notes |
|---|---|---|
| Cloudflare | < 150 ms P50 | “Instant Purge” via distributed coreless architecture.13 |
| Fastly | ~ 150 ms global | Sub-second for most regions; primarily bounded by speed-of-light to PoPs.8 |
| AWS CloudFront | Typically < 2 min, up to 10–15 min globally | Per-edge propagation; varies with distribution size.14 |
| Google Cloud CDN | ~ 10 s per request, full propagation in minutes | 500 invalidations/minute account-level rate limit.15 |
| Akamai | Seconds to minutes via Fast Purge API | “Invalidate” (conditional GET) and “Delete” (force fetch) modes.9 |
Warning
Don’t deploy a “purge then immediately verify in CI” step that assumes the purge is global by the time the verification request lands. On AWS this can fail intermittently for tens of minutes. Either wait, or query a known-cold edge with a cache-busting query string for verification.
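The cache-busting verification can be sketched as a URL builder — the parameter name is illustrative, and the trick only helps if the CDN includes the query string in its cache key:

```javascript
// Build a probe URL with a unique cache-busting parameter: the request
// cannot match any pre-purge cached entry, so the response reflects the
// current origin rather than a stale edge copy.
function cacheBustUrl(rawUrl, token) {
  const url = new URL(rawUrl);
  url.searchParams.set("cachebust", token); // illustrative parameter name
  return url.toString();
}
```

A CI step would fetch `cacheBustUrl(pageUrl, Date.now().toString(36))` and assert on a deploy marker in the body, rather than re-requesting the plain URL and hoping the purge has already propagated.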
Deploy + purge race conditions
The classic incident is “deployed at T, purged at T+1, but a PoP that hadn’t seen the purge yet re-pulled the old origin between T and T+1 and re-cached it for the full TTL.” Three rules contain it:
- Order matters. Make the new origin authoritative before issuing the purge — atomic-swap behind a load balancer, blue/green deploy, or content-addressed origin paths. A purge against an origin still serving the old body just refills the cache with stale.
- Purge after deploy completes globally, not after the deploy step kicks off. CD systems that fire purges from the build runner before the rollout finishes ship this bug routinely.
- For HTML that references content-addressed assets, purge the HTML key, not the assets. The asset URLs change with the deploy and never collide.
Stale-while-revalidate and stale-if-error
RFC 5861 adds two Cache-Control extensions that change the freshness-vs-availability calculus. They are the closest thing the cache layer has to a free lunch.
Stale-while-revalidate (SWR)
```
Cache-Control: max-age=600, stale-while-revalidate=30
```
The lifecycle of a cached response under this directive:
- 0 – 600 s. Fresh. Served from cache.
- 600 – 630 s. Stale but inside the SWR window. The cache returns the stale response immediately and triggers an asynchronous revalidation in the background. The next request gets fresh content.
- > 630 s. Truly stale. The cache must fetch synchronously before responding.
The point is to push revalidation latency off the user-visible path. It also de-fangs the thundering herd: the first concurrent request serves stale and fires one background fetch; the rest still see cache hits.
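The three phases can be expressed as a small state function — a sketch of the model, not a vendor implementation:

```javascript
// Classify a cached entry under max-age + stale-while-revalidate.
// "fresh": serve from cache. "stale-while-revalidate": serve stale and
// refresh in the background. "expired": fetch synchronously.
function swrState(ageSec, maxAge, swrWindow) {
  if (ageSec < maxAge) return "fresh";
  if (ageSec < maxAge + swrWindow) return "stale-while-revalidate";
  return "expired";
}
```

With the header above (`max-age=600, stale-while-revalidate=30`), an entry at age 610 s still serves instantly; at 640 s the user pays the origin round trip.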
Browser support landed in Chrome 75+, Edge 79+, Firefox 68+, and Safari 14+.16 Browsers that don’t recognise the directive simply ignore it and fall back to max-age semantics. CDN support is mature on Cloudflare, Fastly, KeyCDN, and Varnish; CloudFront needs Lambda@Edge to implement true SWR. One subtle constraint: if no traffic arrives during the SWR window, the entry truly expires and the next request pays the full origin RTT — SWR is only a steady-state win for warm endpoints.
Stale-if-error (SIE)
```
Cache-Control: max-age=600, stale-if-error=86400
```
Origin returned a 5xx or is unreachable? Serve the stale response for the SIE window instead of propagating the error. This trades freshness for availability, and it makes a bad deploy look like a slightly stale page rather than a 503 storm.
The combined production recipe:
```
Cache-Control: max-age=300, stale-while-revalidate=60, stale-if-error=86400
```
Five minutes fresh; one minute of async-refresh window; one day of error-survival cushion. This single header turns a 30-minute origin outage into a non-event for read-heavy endpoints.
Varnish grace mode
Varnish predates RFC 5861 and exposes the same idea with finer control via VCL:
```vcl
sub vcl_backend_response {
  set beresp.ttl = 300s;
  set beresp.grace = 1h;
}

sub vcl_recv {
  if (std.healthy(req.backend_hint)) {
    set req.grace = 10s;
  } else {
    set req.grace = 24h;
  }
}
```
The interesting move is the split between `beresp.grace` (how long the cache retains a stale object after expiry) and `req.grace` (how stale a given request is willing to accept). When the backend health probe flips, you can extend the request-side grace from 10 s to 24 h on the fly without re-emitting any responses.
Operational guardrails
Cache hit ratio (CHR)
CHR is the primary health metric for a CDN. The formula is trivial:

CHR = cache hits / (cache hits + cache misses)
Useful target bands, from operating sites at scale:
- Static assets: > 95 %.
- Mixed-content sites: > 85 %.
- Investigate: anything below 80 % on a previously healthy endpoint.
A global average is a bad target because it hides the failure that hurts. Always segment by content type, region, and URL pattern. A 90 % global CHR with 50 % CHR on /api/* is a hot origin waiting to happen.
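Segmenting is a one-pass aggregation over access logs. A minimal sketch — the log-entry shape (`segment`, `cacheStatus`) is an assumption; map your CDN's log fields onto it:

```javascript
// Compute cache hit ratio per segment (content type, region, or URL
// pattern) from log entries of the assumed shape
// { segment: string, cacheStatus: "HIT" | "MISS" }.
function chrBySegment(entries) {
  const stats = {};
  for (const { segment, cacheStatus } of entries) {
    const s = (stats[segment] ??= { hits: 0, total: 0 });
    s.total += 1;
    if (cacheStatus === "HIT") s.hits += 1;
  }
  const out = {};
  for (const [segment, { hits, total }] of Object.entries(stats)) {
    out[segment] = hits / total;
  }
  return out;
}
```

Alert on the per-segment minimum, not the global average — that is exactly the 90 %-global / 50 %-on-`/api/*` trap.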
The recurring CHR killers, in order of how often they show up in real incidents:
- High-cardinality `Vary` headers (`Accept-Language`, `User-Agent`) — the cache splits into thousands of variants per URL.
- Tracking parameters in the cache key — `?utm_source=...&fbclid=...` makes every share a unique key. Strip them with a query-string allow-list.
- Aggressive `Vary: Cookie` — usually unintended, usually catastrophic; one session cookie effectively turns the cache off.
- TTL too short for actual change rate — e.g. 60 s TTL on content that changes hourly.
- Origin emitting `Cache-Control: no-store` unintentionally — common after a deploy of an auth middleware that “secures” everything.
Cache stampede (thundering herd)
When a popular cached entry expires, every concurrent request misses the cache and hits the origin in the same instant.
A 98 % CHR endpoint at 50k RPS sees a load step from ~1k RPS to 50k RPS the moment a hot key expires — a 50× origin spike. The mitigations stack:
- Request coalescing at the edge. CloudFront does this natively: concurrent requests for the same key are paused while one fetch goes upstream, and the response is fanned back out. Fastly does the same by default. Cloudflare’s Tiered Cache adds upper-tier collapsing.
- Stale-while-revalidate. First miss after expiry serves stale and fires one background refresh; subsequent requests still hit the cache. No stampede.
- Probabilistic early expiration (XFetch). Refresh slightly before TTL with a probability that grows as expiry approaches. Vattani et al. show this is optimal under reasonable assumptions:17 for a key with recompute cost Δ, expiry time T, and a tunable β, refresh whenever `now − Δ·β·ln(rand()) ≥ T`, where `rand()` is uniform in (0, 1) — the log term is negative, so subtracting it gives each request a random positive head start toward expiry. The effect is to spread re-fetches across a window instead of bunching them at the boundary.
- Origin Shield. A single mid-tier cache that funnels misses from many edges. Multi-region misses become one origin request.
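The XFetch decision fits in one line of code. In this sketch `deltaSec` is the recompute cost, `beta` the aggressiveness knob, and `expirySec` the absolute expiry time; the names are illustrative:

```javascript
// XFetch (Vattani et al.): refresh early with a probability that rises
// as expiry approaches. Math.log(Math.random()) is negative, so the
// subtracted term advances the effective "now" by a random amount
// proportional to deltaSec * beta.
function shouldRefreshEarly(nowSec, expirySec, deltaSec, beta = 1.0) {
  return nowSec - deltaSec * beta * Math.log(Math.random()) >= expirySec;
}
```

Raising `beta` widens the pre-expiry window over which refreshes are spread; at or past expiry the function always fires, so correctness never depends on the randomness.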
Tip
Coalescing has one ugly footgun. If your origin response varies by cookie or other private dimension and you have request collapsing on, two users can end up sharing one origin response. CloudFront’s documented rule is: collapsing is disabled when cookie forwarding is enabled, and is otherwise opt-out only by setting Min TTL to 0 and having the origin emit Cache-Control: private, no-store, no-cache, max-age=0, or s-maxage=0. If you depend on per-user variants, verify the equivalent rule on your CDN before turning collapsing on.
Origin Shield architecture
Without a shield, every edge PoP that misses goes directly to origin:
```
Edge NYC miss    → Origin
Edge London miss → Origin
Edge Tokyo miss  → Origin
= 3 origin requests
```
With a shield (a single regional cache between edges and origin):
```
Edge NYC miss    → Shield → Origin
Edge London miss → Shield (HIT)
Edge Tokyo miss  → Shield (HIT)
= 1 origin request
```
When it earns its keep:
- High traffic with mediocre CHR (< 95 %) — the absolute miss volume is what hurts.
- Origins that cannot scale past a known ceiling (legacy systems, expensive databases).
- Globally distributed traffic over many edge PoPs — the multiplier on coalescing is largest.
The cost is real: AWS bills Origin Shield as an additional per-request layer fee, so model the charge against the origin capacity it saves before enabling it.
Graceful degradation patterns
Designing for origin failure is a full second axis on top of the freshness/cost trade-off. Four patterns that should be in every production playbook:
- Extended `stale-if-error` — `stale-if-error=172800` keeps cached content alive for two days during outages.
- Static fallback at the edge — if origin returns 5xx, serve a baked-in static page from edge KV/storage. Cloudflare Workers, Lambda@Edge, and Fastly Compute can all do this.
- Health-aware grace — Varnish’s pattern above: extend the stale-acceptance window when the backend probe flips unhealthy.
- Edge circuit breaker — after N consecutive origin errors, stop sending traffic for a cooldown period and serve stale or a static fallback instead.
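The circuit-breaker pattern can be sketched as a small state machine; the factory shape and defaults are illustrative, and an edge runtime would keep this state per PoP or in a coordination layer:

```javascript
// Minimal edge circuit breaker: after `threshold` consecutive origin
// failures the circuit opens for `cooldownMs`; while open, serve stale
// or the static fallback instead of hitting the origin.
function makeCircuitBreaker({ threshold = 5, cooldownMs = 30000 } = {}) {
  let failures = 0;
  let openedAt = 0;
  return {
    // Closed (few failures) → allow. Open → allow only a half-open
    // trial request once the cooldown has elapsed.
    allowRequest(nowMs) {
      return failures < threshold || nowMs - openedAt >= cooldownMs;
    },
    recordSuccess() {
      failures = 0; // close the circuit
    },
    recordFailure(nowMs) {
      failures += 1;
      if (failures >= threshold) openedAt = nowMs; // (re)open
    },
  };
}
```

A failed half-open trial re-stamps `openedAt`, restarting the cooldown — one probe per cooldown window instead of a retry storm.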
Edge compute use cases
Edge compute moves logic from origin (typically 100–300 ms RTT) to the edge PoP closest to the client (sub-ms). Used well, it shifts the personalisation boundary outward and lets you cache responses you couldn’t cache before. Used badly, it’s a slower, more expensive way to do work the origin was already doing.
Platform comparison
| Platform | Runtime | Cold start | Per-invocation execution limit | Best for |
|---|---|---|---|---|
| CloudFront Functions | JavaScript (subset) | Effectively none (executes in request path) | < 1 ms CPU | Header rewrites, redirects, simple URL transforms |
| Lambda@Edge | Node.js, Python | Tens to hundreds of ms on cold path | 5 s viewer / 30 s origin | Complex logic, async I/O, surrogate-key workarounds |
| Cloudflare Workers | V8 isolate | ~5 ms isolate startup; often ~0 user-visible via TLS-handshake prewarm18 | 10 ms (free) / 30 ms+ (paid) CPU | Full edge applications, KV-backed APIs |
| Fastly Compute | WASM (Rust, Go, AssemblyScript, …) | ~35 µs Wasmtime instance instantiation19 | No hard cap; single-tenant per request | High-performance compute, structured data transforms |
| Platform | Per 1M invocations | Notes |
|---|---|---|
| CloudFront Functions | $0.10 | First 2M invocations/month free. |
| Lambda@Edge | $0.60 + GB-seconds | No free tier; viewer-request and origin-request have different limits. |
| Cloudflare Workers | $0.30 (Standard) | $5/month minimum on the paid plan covers 10M requests; CPU billed separately. |
| Fastly Compute | Bundled with Compute@Edge plan | Pricing model is bundled, not per-invocation; consult your contract. |
Personalisation without origin load
Classic personalisation needs an origin per request. Edge compute lets you cache the variants:
```javascript
// CloudFront Function (viewer-request): copy the segment cookie into a
// request header that participates in the cache key.
function handler(event) {
  const request = event.request;
  const cookies = request.cookies;
  const segment = cookies.user_segment?.value ?? "default";
  request.headers["x-user-segment"] = { value: segment };
  return request;
}
```
Wire `x-user-segment` into the cache key policy. Three segments × 1,000 pages = 3,000 cached variants — all served from the edge, none from origin. The trade-off is variant explosion: if you also vary by language, device, and feature flag, segments multiply quickly. Cap your dimensions at three or four with explicit fallbacks.
A/B testing at the edge
```javascript
// Cloudflare Worker: assign an A/B variant at the edge, before any
// HTML is sent, and persist the assignment in a cookie.
function getCookie(request, name) {
  const cookie = request.headers.get("Cookie") || "";
  const match = cookie.match(new RegExp(`(?:^|;\\s*)${name}=([^;]*)`));
  return match ? match[1] : null;
}

export default {
  async fetch(request) {
    const url = new URL(request.url);
    let variant = getCookie(request, "ab_variant");
    if (!variant) {
      variant = Math.random() < 0.5 ? "a" : "b";
    }
    // Route to the variant's origin path, e.g. /a/products.
    url.pathname = `/${variant}${url.pathname}`;
    const response = await fetch(url.toString(), request);
    const newResponse = new Response(response.body, response);
    if (!getCookie(request, "ab_variant")) {
      newResponse.headers.set(
        "Set-Cookie",
        `ab_variant=${variant}; Path=/; Max-Age=86400`
      );
    }
    return newResponse;
  },
};
```
Why move A/B testing to the edge:
- No layout shift — the variant decision is made before HTML is sent.
- No JavaScript dependency on the client.
- Consistent assignment across page loads via cookie persistence.
- Works for users with JS disabled.
Geo-routing and compliance
Edge compute is the cleanest place to express data-locality rules:
```javascript
// Lambda@Edge (origin-request): route EU viewers to the EU origin so
// their requests never leave the region.
exports.handler = async (event) => {
  const request = event.Records[0].cf.request;
  const country = request.headers["cloudfront-viewer-country"][0].value;
  const euCountries = ["DE", "FR", "IT", "ES", "NL", "BE", "AT", "PL"];
  if (euCountries.includes(country)) {
    request.origin.custom.domainName = "eu-origin.example.com";
  } else {
    request.origin.custom.domainName = "us-origin.example.com";
  }
  return request;
};
```
For compliance-driven routing (GDPR data residency, sectoral regulation) the contract is: the request never reaches an origin outside the allowed region. Verify this with synthetic probes from each region, not just code review.
Practical takeaways
- Cache key controls correctness; TTL controls cost and freshness. Review them on different cadences and treat them as separate failure domains.
- Climb the invalidation ladder. Versioned URLs first; SWR for documents and APIs; surrogate keys for content with relationships; path purges for emergencies; full clears never.
- Default to `Cache-Control: max-age=N, stale-while-revalidate=M, stale-if-error=K` for HTML and JSON APIs. The combination hides revalidation latency, prevents stampedes, and survives origin outages.
- Segment your CHR dashboards. Global numbers hide the next incident.
- Buy request coalescing. Whether through native CDN behaviour, Origin Shield, Tiered Cache, or SWR — make sure a popular key expiring cannot send N requests to your origin.
- Treat purge as eventually consistent. Build for it in deploy pipelines; verify against cold edges with cache-busting query strings.
- Pick the cheapest edge runtime that fits the work. CloudFront Functions for header munging; Workers / Compute when you need real logic; Lambda@Edge when you need AWS APIs and you’ve measured the latency.
Appendix
Prerequisites
- HTTP caching model (request/response headers, freshness, validation).
- CDN concepts (edge PoPs, origin, cache keys).
- Basic familiarity with distributed-systems failure modes.
Glossary
- Cache key. Identifier used to look up a stored response — typically `method + host + path + query` plus selected headers.
- TTL (time to live). Duration content is considered fresh.
- CHR (cache hit ratio). Hits divided by total requests.
- PoP (point of presence). Edge data centre where a CDN serves traffic.
- Origin Shield. Centralised cache layer between edges and origin.
- Surrogate key (cache tag). Tag attached to a response so groups of cached entries can be invalidated together.
- Thundering herd / stampede. Concurrent miss storm at origin after a popular cache entry expires.
- SWR. `stale-while-revalidate` — serve stale content while asynchronously refreshing.
- SIE. `stale-if-error` — serve stale content when origin returns errors.
References
Specifications
- RFC 9111 — HTTP Caching, STD 98 (June 2022).
- RFC 9110 — HTTP Semantics.
- RFC 5861 — HTTP Cache-Control extensions for stale content.
- RFC 8246 — `immutable` Cache-Control extension.
- RFC 9213 — Targeted HTTP Cache Control (`CDN-Cache-Control`), June 2022.
- W3C Edge Architecture Specification 1.0 — origin of `Surrogate-Control` and `Surrogate-Capability`.
CDN provider documentation
- CloudFront — Cache key and origin requests.
- CloudFront — Edge functions: choose between CloudFront Functions and Lambda@Edge.
- CloudFront — Origin Shield.
- CloudFront — Pay for file invalidation.
- Fastly — Working with surrogate keys.
- Fastly — Serving stale content.
- Cloudflare — Instant Purge architecture.
- Cloudflare — Purge cache by cache-tags.
- Cloudflare — Tiered Cache.
- Cloudflare Workers — How Workers works.
- Google Cloud CDN — Cache invalidation overview.
- Akamai — Purge content by cache tag.
Educational and primary-source practitioner
- MDN — HTTP caching.
- MDN — `Cache-Control`.
- web.dev — Keeping things fresh with stale-while-revalidate.
- Varnish — Grace mode.
- Cloudflare — Rethinking cache purge (coreless purge).
- Fastly — Best practices for using the `Vary` header.
- Vattani et al. — Optimal Probabilistic Cache Stampede Prevention (VLDB 2015).
- Philip Walton — Performant A/B testing with Cloudflare Workers.
Footnotes
- RFC 9111 — HTTP Caching, STD 98, June 2022. Obsoletes RFC 7234. ↩
- Edge Architecture Specification 1.0, W3C Note, August 2001. The header set (`Surrogate-Control`, `Surrogate-Capability`) was authored by Mark Nottingham (Akamai) and Xiang Liu (Oracle); it is a W3C Note, not a W3C Recommendation. ↩
- `CDN-Cache-Control` — Cloudflare docs; CDN-Cache-Control: precision control for your CDN(s), Cloudflare blog. ↩
- Best practices for using the `Vary` header, Fastly engineering blog. ↩
- Understanding the `Vary` header in the browser, Fastly engineering blog. ↩
- Managing multi-regional and multilingual sites, Google Search Central. ↩
- Soft purges — Fastly documentation; Introducing Soft Purge, Fastly blog. ↩
- Instant Purge: invalidating cached content in under 150 ms, Cloudflare. ↩
- Pay for file invalidation — Amazon CloudFront; see also Invalidating files. ↩
- `stale-while-revalidate` browser support — Can I use; Keeping things fresh with stale-while-revalidate, web.dev. ↩
- A. Vattani, F. Chierichetti, K. Lowenstein. Optimal Probabilistic Cache Stampede Prevention. VLDB 2015. ↩
- How Workers works; Eliminating Cold Starts 2: shard and conquer. ↩
- Performance matters: why Compute does not yet support JavaScript, Fastly. Note: 35 µs is hot-instance instantiation; cold-path latency is higher in practice. ↩