Design a Flash Sale System

A flash sale must serve millions of buyers competing for a fixed pool of inventory in seconds, with zero tolerance for overselling. Treat it as four layers chained together: a CDN-hosted waiting room absorbs the spike, a token gate meters admission to backend capacity, an atomic inventory store prevents overselling, and an async order queue decouples user-visible latency from durable fulfilment. Each layer’s job is to shield the next.

System overview — CDN waiting room → API gateway → queue, inventory and order services → Redis, message queue and PostgreSQL.

Mental model

Three constraints fight each other in any flash sale and the architecture is the negotiated truce.

Traffic absorption. Millions of users arriving in seconds cannot hit origin. A CDN-hosted waiting room absorbs the spike at the edge; a queue service meters admission to backend capacity.
Inventory accuracy. Overselling destroys trust and forces refunds, returns, or worse — legal exposure for ticketing. Atomic Redis Lua scripts give “check-and-decrement” without races. Pre-allocating tokens equal to inventory turns “did we oversell?” into “did we issue more tokens than items?”, which is trivially false by construction.
Order durability under load. Synchronous payment + write paths cannot scale with the spike. A durable message queue decouples order receipt from order completion: the user gets a fast 202 Accepted; a worker pool drains the queue at the database’s pace, with retries and a dead-letter queue for poison messages.

The mental model is waiting room → token gate → atomic inventory → async order queue. Every section below either implements one of those four boxes or explains how to harden it under load.

Admission funnel: 10M raw arrivals shaped down to 10K confirmed orders by the four layers. Each layer cuts traffic by roughly an order of magnitude, so the inventory tier never sees raw traffic.

Design decision	Trade-off
CDN waiting room	Absorbs traffic cheaply; adds user-facing latency and a polling tax
Token-based admission	Prevents overselling by construction; requires accurate pre-allocation
Redis atomic counters	Sub-millisecond inventory checks; hot-key risk on a single-product surge
Async order processing	Handles 100x spikes; delayed confirmation and harder UX expectations

Requirements

Functional Requirements

Feature	Scope	Notes
Virtual waiting room	Core	Absorbs traffic spike before backend
Queue management	Core	FIFO admission with position tracking
Inventory reservation	Core	Atomic decrement, no overselling
Order placement	Core	Async processing with durability
Bot detection	Core	Multi-layer defense
Payment processing	Core	Idempotent, timeout-aware
Order confirmation	Core	Email/push notification
Purchase limits	Extended	1-2 units per customer
VIP early access	Extended	Tiered queue priority
Real-time inventory display	Extended	Eventually consistent display

Non-Functional Requirements

Requirement	Target	Rationale
Availability	99.99%	Revenue and reputational impact during a public on-sale window; the Alibaba flash-sale playbook is the canonical reference here.
Waiting room latency	< 100ms	Static asset served by CDN; users compare it to opening a website.
Inventory check latency	< 50ms	On the critical path, gates checkout. Sub-ms in Redis, but budget for network and serialisation.
Checkout latency	< 5s p99	User-acceptable; async order processing hides downstream payment + DB time.
Queue position accuracy	Real-time	Trust requires visible progress; stale numbers are worse than slow numbers.
Inventory accuracy	100%	Zero tolerance for overselling — refund cost, regulatory risk for ticketing, brand damage.
Order durability	Zero loss	Queued orders must survive worker, broker, and AZ failures.

Scale Estimation

Traffic Profile:

Metric	Normal	Flash Sale Peak	Multiplier
Concurrent users	100K	10M	100x
Page requests/sec	10K RPS	1M RPS	100x
Inventory checks/sec	1K RPS	500K RPS	500x
Orders/sec	100 TPS	10K TPS	100x

Back-of-envelope (1M users, 10K inventory, 30-min sale window):

Users arriving in first minute:   1,000,000
Waiting-room HTML hits (CDN):     1M × 3 refreshes = 3M req/min ≈ 50K RPS at edge
Queue-status polls (CDN-bypass):  1M × 1 poll / 5s = 200K RPS to queue API
Admission rate (gate-controlled): 10K inventory / 30 min ≈ 6 admits/s typical;
                                  burst-shaped to ~83/s during the first 2 min
Inventory reserves (admitted):    same ~6-83 RPS; bursts shaped by gate, not raw traffic
Orders attempted:                 ~12K (roughly one in six abandon or fail at payment)
Orders confirmed:                 10K (inventory limit, by construction)

Note

The 200K RPS poll figure is the work the queue API must do; the 50K RPS waiting-room hits are absorbed at the edge. The inventory and order tiers see whatever the admission gate lets through, not the raw arrival rate. Sizing the inventory tier off the headline 1M is the most common over-provisioning mistake.
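The arithmetic above can be reproduced with a few lines of TypeScript. This is a minimal sketch under the same assumptions as the estimate; the `SaleProfile` and `estimateLoad` names are illustrative and not part of any service in this design.

```typescript
// Inputs to the back-of-envelope estimate (all names are hypothetical).
interface SaleProfile {
  users: number               // users arriving in the first minute
  refreshesPerUser: number    // waiting-room page loads per user in that minute
  pollIntervalSeconds: number // seconds between queue-status polls
  inventory: number           // units for sale
  saleWindowSeconds: number   // length of the sale window
}

interface LoadEstimate {
  edgeRps: number               // absorbed by the CDN
  pollRps: number               // reaches the queue API
  steadyAdmitsPerSecond: number // what the gate lets through on average
}

function estimateLoad(p: SaleProfile): LoadEstimate {
  return {
    edgeRps: (p.users * p.refreshesPerUser) / 60,
    pollRps: p.users / p.pollIntervalSeconds,
    steadyAdmitsPerSecond: p.inventory / p.saleWindowSeconds,
  }
}

const estimate = estimateLoad({
  users: 1_000_000,
  refreshesPerUser: 3,
  pollIntervalSeconds: 5,
  inventory: 10_000,
  saleWindowSeconds: 30 * 60,
})
console.log(estimate)
```

Only `steadyAdmitsPerSecond` should drive the sizing of the inventory and order tiers; the other two figures size the CDN and the queue API.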

Storage:

Queue state: 1M users × 100 bytes = 100 MB (DynamoDB or Redis)
Order records: 10K orders × 5 KB = 50 MB (PostgreSQL)
Event logs: 10M events × 200 bytes = 2 GB / sale

Design Paths

Path A: Pre-Allocation Model (Token-Based)

Best when:

Fixed, known inventory quantity
Fairness is paramount (ticketing, limited editions)
High-value items where overselling is catastrophic

Architecture:

Key characteristics:

Tokens minted equal to inventory before sale starts.
Each admitted user receives one token.
A token guarantees a checkout opportunity, not a purchase — the user may still abandon.
Tokens expire if unused and return to the pool for the next admittee.
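The token lifecycle described above (mint, issue, expire, reclaim) can be sketched in memory. This is an illustration only, assuming a single process and caller-supplied timestamps; the `TokenPool` class and its method names are hypothetical, and a production version would keep this state in Redis or DynamoDB.

```typescript
interface TokenState {
  userId: string
  expiresAt: number // epoch milliseconds
}

class TokenPool {
  private available: number
  private ttlMs: number
  private issued = new Map<string, TokenState>()
  private nextId = 0

  constructor(inventory: number, ttlMs: number) {
    this.available = inventory // one token per unit of inventory
    this.ttlMs = ttlMs
  }

  // Issue a token to an admitted user, or null when every token is out.
  issue(userId: string, now: number): string | null {
    this.reclaimExpired(now)
    if (this.available === 0) return null
    this.available -= 1
    const token = `ct_${this.nextId++}`
    this.issued.set(token, { userId, expiresAt: now + this.ttlMs })
    return token
  }

  // Redeem on purchase: the token is consumed and never returns to the pool.
  redeem(token: string, userId: string, now: number): boolean {
    const state = this.issued.get(token)
    if (!state || state.userId !== userId || state.expiresAt <= now) return false
    this.issued.delete(token)
    return true
  }

  // Unused tokens go back to the pool for the next admittee.
  private reclaimExpired(now: number): void {
    for (const [token, state] of this.issued) {
      if (state.expiresAt <= now) {
        this.issued.delete(token)
        this.available += 1
      }
    }
  }
}

const pool = new TokenPool(2, 5 * 60 * 1000)
const firstToken = pool.issue("user_123", Date.now())
console.log(firstToken)
```

Because redeemed tokens are never returned and expired tokens are returned exactly once, the number of successful redemptions cannot exceed the inventory passed to the constructor.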

Trade-offs:

✅ Zero overselling by construction.
✅ Predictable admission rate (admit at backend capacity, not request rate).
✅ Fair: pure FIFO, or FIFO with a randomised pre-sale window.
❌ Requires an accurate inventory count before the sale opens.
❌ Token lifecycle management (expiry, reclaim, double-submission) is non-trivial.
❌ Abandoned tokens dent conversion if expiry is too short or the queue is too aggressive.

Note

“Pre-mint one token per unit of inventory” is one specific implementation of Path A and works well for low-stock, high-fairness sales (drops, ticketing). The other common variant — used by SeatGeek and Cloudflare Waiting Room — issues admission tokens decoupled from inventory, sized to backend capacity instead. The actual decrement still happens atomically at checkout. Pick by where you want the failure to land: at admission (“you didn’t get a ticket”) or at checkout (“we sold out while you were typing your card number”).

Real-world example: SeatGeek’s virtual waiting room on AWS uses Lambda + DynamoDB to manage two layers of tokens: a visitor token assigned at entry that captures arrival timestamp for FIFO ordering, and an access token exchanged at the front of the queue that authorises checkout. Tokens expire when the purchase completes or the user’s session ends, returning capacity to the protected zone via a leaky-bucket counter.

Path B: Real-Time Inventory Model (Counter-Based)

Best when:

Dynamic inventory (multiple warehouses, restocking)
E-commerce flash sales with variable stock
Lower-stakes items where occasional overselling is recoverable

Architecture:

Path B — counter-based inventory: rate limiter fronts a Redis counter, atomic decrement on checkout.

Key characteristics:

No pre-allocation — inventory is checked in real time on the checkout path.
Atomic decrement happens at checkout, not at admission.
Rate limiting protects backend capacity; it doesn’t guarantee the user a purchase.
Inventory can be restocked mid-sale (dynamic counters).

Trade-offs:

✅ Native support for dynamic inventory and mid-sale restocks.
✅ Simpler pre-sale setup — no token mint job, no token registry.
✅ Easier integration with multi-warehouse fulfilment systems.
❌ Overselling risk if counter writes desync from order persistence.
❌ Users admitted without a guarantee — visible “sold out at checkout” UX is harsh.
❌ Hot-key risk on a single popular SKU — one shard, one CPU, one tail latency.

Real-world example: Alibaba’s flash-sale playbook on ApsaraDB for Redis (Tair) uses a Lua script over a hash that encodes Total and Booked per SKU; the script performs HMGET to read both and HINCRBY to increment Booked only if Booked + qty <= Total. A master-replica Tair instance is documented to sustain >100K QPS for inventory writes and a read/write-split instance >600K QPS for cached reads. The all-up Tmall platform peaked at 583K orders per second on Singles Day 2020 — that is a fleet-wide order TPS figure, not a single Redis instance, and it is reached by sharding hot SKUs and front-loading admission control.
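The booking rule in that script (increment Booked only if Booked + qty <= Total) is small enough to show in isolation. The sketch below emulates it over an in-memory map; it is not the Tair script itself, and the `tryBook` name is hypothetical. A synchronous function in a single-threaded runtime cannot be interleaved, which stands in for the atomicity Redis gives a Lua script.

```typescript
interface SkuStock {
  total: number
  booked: number
}

const stock = new Map<string, SkuStock>([["sku-001", { total: 3, booked: 0 }]])

// Book qty units only if the result stays within total; otherwise change nothing.
function tryBook(sku: string, qty: number): boolean {
  const entry = stock.get(sku)
  if (!entry || qty <= 0 || entry.booked + qty > entry.total) return false
  entry.booked += qty
  return true
}

console.log(tryBook("sku-001", 2), tryBook("sku-001", 2), tryBook("sku-001", 1))
```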

Path Comparison

Factor	Path A (Token)	Path B (Counter)
Overselling risk	Zero	Low (with proper atomicity)
Setup complexity	Higher	Lower
Dynamic inventory	Difficult	Native
User expectation	Guaranteed opportunity	Best effort
Fairness	Explicit (token order)	Implicit (first to checkout)
Best for	Ticketing, limited releases	E-commerce, restockable goods

This Article’s Focus

This article implements Path A (Token-Based) for the core flow because:

Flash sales typically have fixed, high-value inventory
Fairness is a differentiator (users accept waiting if fair)
Zero overselling is non-negotiable for most use cases

Path B implementation details are covered in the Variations section.

High-Level Design

Component Overview

Component	Responsibility	Technology
Virtual Waiting Room	Absorb traffic spike, display queue position	Static HTML on CDN
Queue Service	Manage admission, assign tokens	Lambda + DynamoDB
Inventory Service	Atomic inventory operations	Redis Cluster
Order Service	Process orders asynchronously	ECS + SQS
Payment Service	Handle payments idempotently	Stripe/Adyen integration
Notification Service	Send confirmations	SES + SNS
Bot Detection	Filter non-human traffic	WAF + Custom rules

Request Flow

API Design

Queue Service APIs

Join Queue

POST /api/v1/queue/join
Authorization: Bearer {user_token}
X-Device-Fingerprint: {fingerprint}

{
  "sale_id": "flash-sale-2024-001",
  "product_ids": ["sku-001", "sku-002"]
}

Response (202 Accepted):

{
  "queue_ticket": "qt_abc123xyz",
  "position": 15234,
  "estimated_wait_seconds": 180,
  "status_url": "/api/v1/queue/status/qt_abc123xyz"
}

Error responses:

400 Bad Request: Invalid sale_id or product not in flash sale
403 Forbidden: Bot detected or user already in queue
429 Too Many Requests: Rate limit exceeded

Check Queue Status

GET /api/v1/queue/status/{queue_ticket}

Response (200 OK):

{
  "queue_ticket": "qt_abc123xyz",
  "status": "waiting",
  "position": 8234,
  "estimated_wait_seconds": 90,
  "poll_interval_seconds": 5
}

Status values:

waiting: In queue, not yet admitted
admitted: Token assigned, can proceed to checkout
expired: Waited too long, removed from queue
completed: Purchased or abandoned checkout

Token Admission (Internal)

When user reaches front of queue:

{
  "queue_ticket": "qt_abc123xyz",
  "status": "admitted",
  "checkout_token": "ct_xyz789abc",
  "checkout_url": "/checkout?token=ct_xyz789abc",
  "token_expires_at": "2024-03-15T10:05:00Z"
}

Checkout Service APIs

Start Checkout Session

POST /api/v1/checkout/start
Authorization: Bearer {user_token}

{
  "checkout_token": "ct_xyz789abc",
  "product_id": "sku-001",
  "quantity": 1
}

Response (201 Created):

{
  "session_id": "cs_def456",
  "reserved_until": "2024-03-15T10:05:00Z",
  "product": {
    "id": "sku-001",
    "name": "Limited Edition Sneaker",
    "price": 299.0,
    "currency": "USD"
  },
  "next_step": "payment"
}

Error responses:

400 Bad Request: Invalid token or product
409 Conflict: Token already used
410 Gone: Token expired
422 Unprocessable: Inventory exhausted (token invalid)

Submit Order

POST /api/v1/orders
Authorization: Bearer {user_token}

{
  "session_id": "cs_def456",
  "shipping_address": {
    "line1": "123 Main St",
    "city": "San Francisco",
    "state": "CA",
    "postal_code": "94102",
    "country": "US"
  },
  "payment_method_id": "pm_card_visa"
}

Response (202 Accepted):

{
  "order_id": "ord_789xyz",
  "status": "processing",
  "estimated_confirmation": "< 60 seconds",
  "tracking_url": "/api/v1/orders/ord_789xyz"
}

Design note: Returns 202 (not 201) because order processing is asynchronous. The order is durably queued but not yet confirmed.

Pagination Strategy

Queue status uses cursor-based polling, not traditional pagination:

{
  "position": 1234,
  "poll_interval_seconds": 5,
  "next_poll_after": "2024-03-15T10:01:05Z"
}

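Rationale: Queue position changes continuously, so the server tells each client when to poll next. The polling interval shortens as position improves: users near the front learn of admission quickly, while the long tail polls less often and keeps total request volume down.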
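One way a server might derive poll_interval_seconds from queue position is sketched below. The policy (roughly ten polls over the remaining wait, clamped to a range) and the `pollIntervalSeconds` function are assumptions for illustration, not something the API above specifies.

```typescript
// Hypothetical policy: poll about ten times over the estimated remaining wait,
// clamped to [minSeconds, maxSeconds] so nobody polls too often or too rarely.
function pollIntervalSeconds(
  position: number,
  admitsPerSecond: number,
  minSeconds = 2,
  maxSeconds = 30,
): number {
  const estimatedWaitSeconds = position / admitsPerSecond
  const interval = Math.round(estimatedWaitSeconds / 10)
  return Math.min(maxSeconds, Math.max(minSeconds, interval))
}

for (const position of [1_000_000, 15_234, 8_234, 100]) {
  console.log(position, pollIntervalSeconds(position, 83))
}
```

The clamp matters at both ends: the lower bound protects the queue API from a burst of polls just before admission, and the upper bound keeps displayed positions from going stale for users deep in the queue.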

Data Modeling

Queue State (DynamoDB)

Table: FlashSaleQueue
Partition Key: sale_id
Sort Key: queue_ticket

Attributes:
- user_id: string
- position: number (GSI for ordering)
- status: enum [waiting, admitted, expired, completed]
- joined_at: ISO8601
- admitted_at: ISO8601 | null
- checkout_token: string | null
- token_expires_at: ISO8601 | null
- device_fingerprint: string
- ip_address: string

GSI: sale_id-position-index for efficient position lookups.

Why DynamoDB: Single-digit millisecond latency at any scale, automatic scaling, TTL for expired entries.

Inventory Counter (Redis)

SET inventory:sku-001 10000

EVAL "
  local count = tonumber(redis.call('GET', KEYS[1]) or 0)
  if count >= tonumber(ARGV[1]) then
    return redis.call('DECRBY', KEYS[1], ARGV[1])
  else
    return -1
  end
" 1 inventory:sku-001 1

Why Lua: Redis runs each EVAL script atomically on a single shard — no other commands interleave. Without that, two concurrent clients can both see count = 1, both decrement, and oversell. Naïve WATCH/MULTI/EXEC works but burns retries under load; a Lua script is the canonical pattern. The pattern is sometimes called single-flight: the cluster serialises contending writes for the same key into one in-flight execution at a time, which is exactly what an inventory counter needs.

Important

One product key is one Redis shard. A flash sale on a single SKU is a hot-key problem disguised as a counter problem. Mitigations include client-side bucketing (split inventory:sku-001 into inventory:sku-001:{0..15}, decrement a random bucket, accept slightly less even depletion) and Tair-style read/write split for warm-up traffic. See the ApsaraDB Redis hot-key guide for the production playbook. The same pattern applies on DynamoDB via write sharding the partition key; adaptive capacity will not save you above the per-partition ceiling of 1,000 WCU / 3,000 RCU, which also caps any single item.

Hot-key sharded counter: one SKU split into N sub-keys to spread writes across Redis shards. Clients pick a bucket by hashing the user; reads scatter-gather to compute remaining stock.
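The bucketing idea can be sketched without Redis. The version below is an in-memory illustration, assuming each bucket decrement would be its own atomic operation in a real deployment; the `ShardedCounter` class, the probing fallback, and the `hash` helper are hypothetical choices rather than part of the design above.

```typescript
// Small deterministic string hash, used only to pick a home bucket.
function hash(s: string): number {
  let h = 0
  for (let i = 0; i < s.length; i++) h = (h * 31 + s.charCodeAt(i)) >>> 0
  return h
}

class ShardedCounter {
  private buckets: number[]

  constructor(total: number, bucketCount: number) {
    // Spread stock as evenly as possible; the first (total % bucketCount)
    // buckets receive one extra unit.
    const base = Math.floor(total / bucketCount)
    const extra = total % bucketCount
    this.buckets = Array.from({ length: bucketCount }, (_, i) => base + (i < extra ? 1 : 0))
  }

  // Try the caller's home bucket first, then probe the others so the last
  // units can still be sold once some buckets are empty.
  reserve(userId: string, qty = 1): boolean {
    const start = hash(userId) % this.buckets.length
    for (let i = 0; i < this.buckets.length; i++) {
      const idx = (start + i) % this.buckets.length
      if (this.buckets[idx] >= qty) {
        this.buckets[idx] -= qty
        return true
      }
    }
    return false
  }

  // Scatter-gather read: sum every bucket.
  remaining(): number {
    return this.buckets.reduce((sum, n) => sum + n, 0)
  }
}

const counter = new ShardedCounter(10_000, 16)
console.log(counter.reserve("user_123"), counter.remaining())
```

Probing trades extra round trips near sell-out for even depletion. Without it, a user whose home bucket is empty would be told "sold out" while other buckets still hold stock.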

Token Registry (Redis)

# Token → user mapping with TTL
SETEX token:ct_xyz789abc 300 "user_123"

# Used tokens (prevent replay)
SADD used_tokens:flash-sale-2024-001 ct_xyz789abc

TTL: 5 minutes for checkout tokens. Expired tokens return to the pool.

Order Schema (PostgreSQL)

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    sale_id VARCHAR(50) NOT NULL,
    checkout_token VARCHAR(100) NOT NULL UNIQUE,
    status VARCHAR(20) DEFAULT 'pending',

    -- Order details
    product_id VARCHAR(50) NOT NULL,
    quantity INT NOT NULL DEFAULT 1,
    unit_price DECIMAL(10,2) NOT NULL,
    total_amount DECIMAL(10,2) NOT NULL,
    currency VARCHAR(3) DEFAULT 'USD',

    -- Shipping
    shipping_address JSONB NOT NULL,

    -- Payment
    payment_intent_id VARCHAR(100),
    payment_status VARCHAR(20),

    -- Timestamps
    created_at TIMESTAMPTZ DEFAULT NOW(),
    confirmed_at TIMESTAMPTZ,

    -- Idempotency
    idempotency_key VARCHAR(100) UNIQUE
);

CREATE INDEX idx_orders_user ON orders(user_id, created_at DESC);
CREATE INDEX idx_orders_sale ON orders(sale_id, status);
CREATE INDEX idx_orders_payment ON orders(payment_intent_id);

Idempotency key: Prevents duplicate orders if user retries during network issues. Typically {user_id}:{checkout_token}.
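The effect of the key can be shown with an in-memory stand-in for the orders table. This is a sketch only: the `submitOnce` function and the map are hypothetical, and in the real schema the UNIQUE constraint on idempotency_key is what enforces the rule under concurrency.

```typescript
interface Order {
  id: string
  userId: string
  checkoutToken: string
}

const ordersByKey = new Map<string, Order>()
let nextOrderNumber = 1

function idempotencyKey(userId: string, checkoutToken: string): string {
  return `${userId}:${checkoutToken}`
}

// A retry with the same key returns the original order instead of creating a second one.
function submitOnce(userId: string, checkoutToken: string): Order {
  const key = idempotencyKey(userId, checkoutToken)
  const existing = ordersByKey.get(key)
  if (existing) return existing
  const order: Order = { id: `ord_${nextOrderNumber++}`, userId, checkoutToken }
  ordersByKey.set(key, order)
  return order
}

const firstAttempt = submitOnce("user_123", "ct_xyz789abc")
const retryAttempt = submitOnce("user_123", "ct_xyz789abc")
console.log(firstAttempt.id === retryAttempt.id)
```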

Database Selection Matrix

Data	Store	Rationale
Queue state	DynamoDB	Single-digit ms latency, auto-scale, TTL
Inventory counters	Redis Cluster	Sub-ms atomic operations
Tokens	Redis	TTL, fast lookup
Orders	PostgreSQL	ACID, complex queries, durability
Event logs	Kinesis → S3	High throughput, analytics
User sessions	Redis	Fast auth checks

Low-Level Design

Virtual Waiting Room

The waiting room is the first line of defense. It must:

Absorb millions of requests without backend load
Provide fair queue positioning
Communicate progress transparently

Architecture:

Static HTML design:

<!DOCTYPE html>
<html>
  <head>
    <title>Flash Sale - Please Wait</title>
    <meta http-equiv="Cache-Control" content="no-cache" />
  </head>
  <body>
    <div id="waiting-room">
      <h1>You're in the queue</h1>

      <!-- Key UI elements -->
      <div id="position">Position: <span id="pos-number">--</span></div>
      <div id="estimate">Estimated wait: <span id="wait-time">--</span></div>
      <div id="progress-bar">
        <div id="progress-fill" style="width: 0%"></div>
      </div>

      <!-- Status messages -->
      <div id="status-message">Please keep this tab open</div>
      <div id="redirect-notice" style="display:none">Redirecting to checkout...</div>
    </div>

    <script src="/queue-client.js"></script>
  </body>
</html>

Queue polling logic:

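// queue-client.ts
// updateUI, showRedirectNotice and showExpiredMessage are UI helpers assumed to be defined elsewhere.
interface QueueStatus {
  status: "waiting" | "admitted" | "expired"
  position?: number
  estimated_wait_seconds?: number
  checkout_url?: string
  poll_interval_seconds: number
}

const RETRY_DELAY_MS = 5000 // fallback delay after a failed poll

async function pollQueueStatus(ticket: string): Promise<void> {
  let status: QueueStatus
  try {
    const response = await fetch(`/api/v1/queue/status/${encodeURIComponent(ticket)}`)
    if (!response.ok) throw new Error(`status ${response.status}`)
    status = await response.json()
  } catch {
    // Transient failure (network error, 429, 5xx): keep the user's place and retry
    setTimeout(() => pollQueueStatus(ticket), RETRY_DELAY_MS)
    return
  }

  switch (status.status) {
    case "waiting": {
      updateUI(status.position, status.estimated_wait_seconds)
      // Server-driven interval: the API shortens it as the user nears the front
      const interval = status.poll_interval_seconds * 1000
      setTimeout(() => pollQueueStatus(ticket), interval)
      break
    }

    case "admitted": {
      const checkoutUrl = status.checkout_url
      if (!checkoutUrl) {
        // Malformed response: poll again rather than redirect to nowhere
        setTimeout(() => pollQueueStatus(ticket), RETRY_DELAY_MS)
        break
      }
      showRedirectNotice()
      // Small delay for user to see the message
      setTimeout(() => {
        window.location.href = checkoutUrl
      }, 1500)
      break
    }

    case "expired":
      showExpiredMessage()
      break
  }
}

// Start polling on page load
const ticket = new URLSearchParams(window.location.search).get("ticket")
if (ticket) {
  pollQueueStatus(ticket)
}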

Design decisions:

Decision	Rationale
Static HTML on CDN	Millions of users hitting origin would saturate it; CDN absorbs at edge
Client-side polling	Push (WebSocket) at this scale requires massive connection management
Adaptive poll interval	Server-set interval: users near the front poll more frequently, the long tail less often; reduces total requests
No refresh needed	Single-page polling prevents users from losing position by refreshing

Queue Service (Token Management)

The queue service manages the FIFO queue and token assignment.

Lambda handler:

// queue-service.ts
import { DynamoDB } from "@aws-sdk/client-dynamodb"
import { DynamoDBDocument } from "@aws-sdk/lib-dynamodb"
import Redis from "ioredis"

const ddb = DynamoDBDocument.from(new DynamoDB({}))
// Redis client for the token registry; REDIS_URL is an assumed environment variable
const redis = new Redis(process.env.REDIS_URL!)

// findUserInQueue, getNextPosition, generateTicket and generateCheckoutToken
// are helpers assumed to be defined elsewhere in the service.

interface QueueEntry {
  sale_id: string
  queue_ticket: string
  user_id: string
  position: number
  status: "waiting" | "admitted" | "expired" | "completed"
  checkout_token?: string
}

export async function joinQueue(
  saleId: string,
  userId: string,
  deviceFingerprint: string,
): Promise<{ ticket: string; position: number }> {
  // Check if user already in queue
  const existing = await findUserInQueue(saleId, userId)
  if (existing) {
    return { ticket: existing.queue_ticket, position: existing.position }
  }

  // Get current queue length (approximate, for position)
  const position = await getNextPosition(saleId)

  const ticket = generateTicket()

  await ddb.put({
    TableName: "FlashSaleQueue",
    Item: {
      sale_id: saleId,
      queue_ticket: ticket,
      user_id: userId,
      position: position,
      status: "waiting",
      joined_at: new Date().toISOString(),
      device_fingerprint: deviceFingerprint,
      ttl: Math.floor(Date.now() / 1000) + 3600, // 1 hour TTL
    },
    ConditionExpression: "attribute_not_exists(queue_ticket)",
  })

  return { ticket, position }
}

export async function admitNextUsers(saleId: string, count: number): Promise<void> {
  // Invoked by a scheduler at a fixed rate (e.g., every second)
  // Admits up to 'count' users from front of queue
  const toAdmit: QueueEntry[] = []
  let startKey: Record<string, any> | undefined

  // DynamoDB applies Limit before FilterExpression, so one page can hold fewer
  // than 'count' waiting entries; keep paging until enough are found.
  // A sparse index holding only waiting entries would avoid re-reading admitted ones.
  do {
    const page = await ddb.query({
      TableName: "FlashSaleQueue",
      IndexName: "sale_id-position-index",
      KeyConditionExpression: "sale_id = :sid",
      FilterExpression: "#status = :waiting",
      ExpressionAttributeNames: { "#status": "status" },
      ExpressionAttributeValues: {
        ":sid": saleId,
        ":waiting": "waiting",
      },
      Limit: count,
      ScanIndexForward: true, // Ascending by position (FIFO)
      ExclusiveStartKey: startKey,
    })
    toAdmit.push(...((page.Items ?? []) as QueueEntry[]))
    startKey = page.LastEvaluatedKey
  } while (toAdmit.length < count && startKey)

  for (const entry of toAdmit.slice(0, count)) {
    await admitUser(entry)
  }
}

async function admitUser(entry: QueueEntry): Promise<void> {
  const token = generateCheckoutToken()
  const expiresAt = new Date(Date.now() + 5 * 60 * 1000) // 5 min

  try {
    await ddb.update({
      TableName: "FlashSaleQueue",
      Key: { sale_id: entry.sale_id, queue_ticket: entry.queue_ticket },
      UpdateExpression: "SET #status = :admitted, checkout_token = :token, token_expires_at = :exp",
      // Only admit entries that are still waiting, so overlapping ticks cannot issue two tokens
      ConditionExpression: "#status = :waiting",
      ExpressionAttributeNames: { "#status": "status" },
      ExpressionAttributeValues: {
        ":admitted": "admitted",
        ":waiting": "waiting",
        ":token": token,
        ":exp": expiresAt.toISOString(),
      },
    })
  } catch (err) {
    // Already admitted by another tick: skip without storing a second token
    if ((err as Error).name === "ConditionalCheckFailedException") return
    throw err
  }

  // Also store token in Redis for fast lookup during checkout
  await redis.setex(`token:${token}`, 300, entry.user_id)
}

Admission rate control:

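The admission rate must match backend capacity. A scheduler triggers admitNextUsers on a fixed tick (EventBridge schedules are limited to one-minute granularity, so a per-second tick needs a loop inside the invocation or a long-running worker), and each tick admits users at the following rate: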

Admission rate = min(backend_capacity, remaining_inventory / expected_checkout_time)

Example:
- Backend can handle 1000 checkouts/sec
- Remaining inventory: 5000
- Average checkout time: 60 seconds
- Admission rate: min(1000, 5000/60) = min(1000, 83) = 83 users/sec
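The formula translates directly into a small function. This is a sketch with hypothetical names; the one addition beyond the formula is a floor of one admit per second while stock remains, an assumption made here so that integer rounding cannot stall the tail of the sale.

```typescript
// Users to admit per second, given backend capacity and remaining stock.
function admissionRate(
  backendCapacityPerSec: number,
  remainingInventory: number,
  avgCheckoutSeconds: number,
): number {
  if (avgCheckoutSeconds <= 0) throw new RangeError("avgCheckoutSeconds must be positive")
  if (remainingInventory <= 0 || backendCapacityPerSec < 1) return 0

  const inventoryBound = remainingInventory / avgCheckoutSeconds
  const rate = Math.floor(Math.min(backendCapacityPerSec, inventoryBound))

  // Assumption: keep admitting at least one user per second while stock remains.
  return Math.max(1, rate)
}

console.log(admissionRate(1000, 5000, 60))
```

Recomputing the rate on every tick lets it fall as inventory drains, so the gate closes gradually instead of admitting a final wave of users who can no longer buy.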

Design decisions:

Decision	Rationale
DynamoDB for queue	Handles millions of entries with single-digit ms latency
Position as GSI	Enables efficient “next N users” query
EventBridge for admission	Decouples admission rate from user requests
Token in Redis + DynamoDB	Redis for fast checkout validation; DynamoDB for durability

Inventory Service (Atomic Counters)

The inventory service prevents overselling through atomic operations.

Redis Lua script for atomic reservation:

-- reserve_inventory.lua
-- KEYS[1] = inventory key (e.g., "inventory:{sku-001}")
-- KEYS[2] = reserved hash key (e.g., "reserved:{sku-001}")
--   In Redis Cluster both keys need the same hash tag so they map to one slot.
-- ARGV[1] = user_id
-- ARGV[2] = quantity
-- ARGV[3] = reservation_id
-- ARGV[4] = ttl_seconds
-- Returns a JSON string. Redis drops string-keyed fields when it converts a Lua
-- table to a reply, so the result is encoded explicitly.

local inventory_key = KEYS[1]
local reserved_key = KEYS[2]
local user_id = ARGV[1]
local quantity = tonumber(ARGV[2])
local reservation_id = ARGV[3]
local ttl = tonumber(ARGV[4])

-- Check current inventory
local available = tonumber(redis.call('GET', inventory_key) or 0)

if available < quantity then
    return cjson.encode({ ok = false, err = 'insufficient_inventory', available = available })
end

-- Atomic decrement
local new_count = redis.call('DECRBY', inventory_key, quantity)

if new_count < 0 then
    -- Defensive guard: unreachable while the script runs atomically
    redis.call('INCRBY', inventory_key, quantity)
    return cjson.encode({ ok = false, err = 'race_condition' })
end

-- Track reservation for expiration
local now = tonumber(redis.call('TIME')[1])
redis.call('HSET', reserved_key, reservation_id,
    cjson.encode({ user_id = user_id, quantity = quantity, created_at = now, expires_at = now + ttl }))
-- EXPIRE covers the whole hash and is refreshed by every reservation, so it is
-- housekeeping only. Stock held by an abandoned reservation returns to the pool
-- only when something calls release for it (for example a sweeper reading expires_at).
redis.call('EXPIRE', reserved_key, ttl)

return cjson.encode({ ok = true, remaining = new_count, reservation_id = reservation_id })

Inventory service implementation:

// inventory-service.ts
import Redis from "ioredis"
import { readFileSync } from "fs"

const redis = new Redis.Cluster([
  { host: "redis-1.example.com", port: 6379 },
  { host: "redis-2.example.com", port: 6379 },
  { host: "redis-3.example.com", port: 6379 },
])

const reserveScript = readFileSync("./reserve_inventory.lua", "utf-8")

// Removes the reservation and restores its stock in one atomic step,
// so a repeated release (worker retry, DLQ handler) is a no-op.
const releaseScript = `
local raw = redis.call('HGET', KEYS[2], ARGV[1])
if not raw then return 0 end
redis.call('HDEL', KEYS[2], ARGV[1])
redis.call('INCRBY', KEYS[1], cjson.decode(raw).quantity)
return 1
`

// Hash tags ({...}) keep both keys of a SKU in the same cluster slot.
// The counter must be seeded under the same tagged key.
const inventoryKey = (productId: string) => `inventory:{${productId}}`
const reservedKey = (productId: string) => `reserved:{${productId}}`

interface ReservationResult {
  success: boolean
  reservation_id?: string
  remaining?: number
  error?: string
}

export async function reserveInventory(
  productId: string,
  userId: string,
  quantity: number,
  ttlSeconds: number = 300,
): Promise<ReservationResult> {
  const reservationId = `res_${Date.now()}_${userId}`

  // The script returns a JSON string (see reserve_inventory.lua)
  const raw = (await redis.eval(
    reserveScript,
    2, // number of keys
    inventoryKey(productId),
    reservedKey(productId),
    userId,
    quantity.toString(),
    reservationId,
    ttlSeconds.toString(),
  )) as string
  const result = JSON.parse(raw) as { ok: boolean; err?: string; remaining?: number }

  if (!result.ok) {
    return { success: false, error: result.err }
  }

  return {
    success: true,
    reservation_id: reservationId,
    remaining: result.remaining,
  }
}

export async function releaseReservation(productId: string, reservationId: string): Promise<void> {
  // Called when checkout times out or user abandons
  await redis.eval(releaseScript, 2, inventoryKey(productId), reservedKey(productId), reservationId)
}

export async function confirmReservation(productId: string, reservationId: string): Promise<void> {
  // Called after successful payment - just remove from reserved set
  await redis.hdel(reservedKey(productId), reservationId)
}

Reservation lifecycle:

Reservation state machine — Available → Reserved → Confirmed, with timeout returning stock to Available.
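The state machine can be exercised in memory to make the transitions concrete. The sketch below is an assumption-laden illustration (single process, caller-supplied clock); `ReservationBook` and its methods are hypothetical names, and the `sweep` method stands in for whatever job releases timed-out reservations.

```typescript
interface Reservation {
  quantity: number
  expiresAt: number // epoch milliseconds
}

class ReservationBook {
  private available: number
  private reserved = new Map<string, Reservation>()

  constructor(stock: number) {
    this.available = stock
  }

  // Available -> Reserved
  reserve(id: string, quantity: number, now: number, ttlMs: number): boolean {
    if (this.reserved.has(id) || quantity > this.available) return false
    this.available -= quantity
    this.reserved.set(id, { quantity, expiresAt: now + ttlMs })
    return true
  }

  // Reserved -> Confirmed: the stock is sold and never returns.
  confirm(id: string): boolean {
    return this.reserved.delete(id)
  }

  // Reserved -> Available (abandon or payment failure); a second call is a no-op.
  release(id: string): boolean {
    const reservation = this.reserved.get(id)
    if (!reservation) return false
    this.reserved.delete(id)
    this.available += reservation.quantity
    return true
  }

  // Timeout path: release every reservation whose TTL has passed.
  sweep(now: number): number {
    let released = 0
    for (const [id, reservation] of this.reserved) {
      if (reservation.expiresAt <= now && this.release(id)) released += 1
    }
    return released
  }

  get stock(): number {
    return this.available
  }
}

const book = new ReservationBook(100)
book.reserve("res_1", 1, Date.now(), 5 * 60 * 1000)
console.log(book.stock)
```

Making release idempotent is the property that matters most: the worker, the timeout sweeper, and a dead-letter handler may all try to release the same reservation, and only the first attempt should return stock.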

End-to-end reservation flow:

Inventory reservation flow: admitted user → Lua single-flight reserve → idempotent enqueue → worker confirms or releases. The Lua reserve is single-flight; commit/release happens out-of-band on the order worker, with TTL as a safety net.

Design decisions:

Decision	Rationale
Lua script	Atomic read-check-decrement prevents race conditions
Redis Cluster	Horizontal scaling for high throughput
Reservation with TTL	Prevents inventory lock-up from abandoned checkouts
Hash for reservations	O(1) lookup/delete by reservation ID

Order Processing (Async Queue)

Orders are placed on a durable queue for async processing. This decouples order receipt from processing, so the spike never reaches the database directly.

Order submission flow:

// order-service.ts
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs"
import { v4 as uuid } from "uuid"
import { db } from "./db" // assumed data-access wrapper around PostgreSQL

const sqs = new SQSClient({})
const ORDER_QUEUE_URL = process.env.ORDER_QUEUE_URL!

interface Address {
  line1: string
  city: string
  state: string
  postal_code: string
  country: string
}

interface OrderRequest {
  session_id: string
  user_id: string
  product_id: string
  quantity: number
  shipping_address: Address
  payment_method_id: string
}

export async function submitOrder(request: OrderRequest): Promise<{ order_id: string }> {
  const orderId = uuid()
  const idempotencyKey = `${request.user_id}:${request.session_id}`

  // Check for duplicate submission. This is a fast path only: the UNIQUE
  // constraint on idempotency_key is the real guard when two submissions race.
  const existing = await db.orders.findOne({ idempotency_key: idempotencyKey })
  if (existing) {
    return { order_id: existing.id }
  }

  // Create order record in pending state
  // (other NOT NULL columns from the schema are omitted here for brevity)
  await db.orders.insert({
    id: orderId,
    user_id: request.user_id,
    product_id: request.product_id,
    quantity: request.quantity,
    shipping_address: request.shipping_address,
    status: "pending",
    idempotency_key: idempotencyKey,
    created_at: new Date(),
  })

  // Queue for async processing. If this send fails after the insert, the row
  // stays pending with no message behind it; a transactional outbox or a
  // re-enqueue job for stale pending rows closes that gap.
  await sqs.send(
    new SendMessageCommand({
      QueueUrl: ORDER_QUEUE_URL,
      MessageBody: JSON.stringify({
        order_id: orderId,
        ...request,
      }),
      MessageDeduplicationId: idempotencyKey,
      MessageGroupId: request.user_id, // Ensures per-user ordering
    }),
  )

  return { order_id: orderId }
}

Order processor (worker):

// order-processor.ts
import { SQSEvent, SQSRecord } from "aws-lambda"
import Stripe from "stripe"
import { confirmReservation, releaseReservation } from "./inventory-service"

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!)

// db, Address, getReservation, calculateTotal and the notification helpers
// are assumed to be defined elsewhere in the service.

interface OrderMessage {
  order_id: string
  user_id: string
  product_id: string
  quantity: number
  shipping_address: Address
  payment_method_id: string
  session_id: string
}

export async function handler(event: SQSEvent): Promise<void> {
  for (const record of event.Records) {
    await processOrder(record)
  }
}

async function processOrder(record: SQSRecord): Promise<void> {
  const message: OrderMessage = JSON.parse(record.body)

  try {
    // 1. Verify reservation still valid
    // (assumes the checkout session ID doubles as the reservation ID)
    const reservation = await getReservation(message.product_id, message.session_id)
    if (!reservation) {
      await markOrderFailed(message.order_id, "reservation_expired")
      return
    }

    // 2. Process payment
    const paymentIntent = await stripe.paymentIntents.create(
      {
        amount: calculateTotal(message.product_id, message.quantity),
        currency: "usd",
        payment_method: message.payment_method_id,
        confirm: true,
      },
      // The idempotency key is a request option, not a PaymentIntent field
      { idempotencyKey: `payment_${message.order_id}` },
    )

    if (paymentIntent.status !== "succeeded") {
      await releaseReservation(message.product_id, message.session_id)
      await markOrderFailed(message.order_id, "payment_failed")
      return
    }

    // 3. Confirm inventory (remove from reserved set)
    await confirmReservation(message.product_id, message.session_id)

    // 4. Update order status
    await db.orders.update(message.order_id, {
      status: "confirmed",
      payment_intent_id: paymentIntent.id,
      confirmed_at: new Date(),
    })

    // 5. Send confirmation
    await sendOrderConfirmation(message.order_id)
  } catch (error) {
    if (error instanceof Stripe.errors.StripeCardError) {
      // A declined card is not retryable: release stock and fail the order
      await releaseReservation(message.product_id, message.session_id)
      await markOrderFailed(message.order_id, "payment_failed")
      return
    }
    // Rethrow so SQS redelivers after the visibility timeout; once the
    // queue's maxReceiveCount is exceeded the message moves to the DLQ
    throw error
  }
}

async function markOrderFailed(orderId: string, reason: string): Promise<void> {
  await db.orders.update(orderId, {
    status: "failed",
    failure_reason: reason,
  })

  // Notify user
  await sendOrderFailureNotification(orderId, reason)
}

Dead letter queue handling:

Orders that fail after max retries go to a Dead Letter Queue (DLQ) for manual review:

// dlq-processor.ts
import { SQSRecord } from "aws-lambda"
import { releaseReservation } from "./inventory-service"

// pagerduty is an alerting client assumed to be defined elsewhere.

export async function handleDeadLetter(record: SQSRecord): Promise<void> {
  const message = JSON.parse(record.body)

  // Log for investigation
  console.error("Order failed permanently", {
    order_id: message.order_id,
    attempts: record.attributes.ApproximateReceiveCount,
    // ARN of the queue the message was moved from (not an error message)
    source_queue: record.attributes.DeadLetterQueueSourceArn,
  })

  // Alert ops team
  await pagerduty.createIncident({
    title: `Flash sale order failed: ${message.order_id}`,
    severity: "high",
  })

  // Release inventory back to pool
  await releaseReservation(message.product_id, message.session_id)
}

Design decisions:

Decision	Rationale
SQS FIFO + high-throughput	Exactly-once dedup (5-min window) and per-`MessageGroupId` ordering; high-throughput mode is required above ~3K msg/sec.
Idempotency key	Prevents duplicate orders on retry; mirrors the Stripe `Idempotency-Key` contract (24h replay window of the original response).
Payment before confirmation	Never confirm inventory without successful payment.
DLQ for failures	Ensures no order is silently lost; the DLQ handler must release inventory, not just log.
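The retry-then-dead-letter behaviour in the table can be simulated without SQS. This sketch assumes a synchronous handler that throws on failure and an in-memory array as the queue; `drain` and `QueueMessage` are hypothetical names, and real SQS redelivery is driven by the visibility timeout and the queue's redrive policy rather than an explicit loop.

```typescript
interface QueueMessage {
  body: string
  receiveCount: number
}

type Handler = (body: string) => void // throws to signal failure

// Process until the queue is empty; messages that fail maxReceiveCount
// times are returned as dead letters instead of being retried again.
function drain(queue: QueueMessage[], handler: Handler, maxReceiveCount: number): QueueMessage[] {
  const deadLetters: QueueMessage[] = []
  while (queue.length > 0) {
    const message = queue.shift()!
    message.receiveCount += 1
    try {
      handler(message.body)
    } catch {
      if (message.receiveCount >= maxReceiveCount) {
        deadLetters.push(message)
      } else {
        queue.push(message) // becomes visible again for another attempt
      }
    }
  }
  return deadLetters
}

const demoDead = drain(
  [{ body: "order-1", receiveCount: 0 }],
  () => {
    throw new Error("always fails")
  },
  3,
)
console.log(demoDead.length)
```

A transient failure succeeds on a later attempt and never reaches the dead-letter list, while a poison message is isolated after a bounded number of tries so it cannot block the messages behind it.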

Bot Detection and Fairness

Multi-Layer Bot Defense

Bot defence — three layers: WAF / app fingerprint / queue-level duplicate and velocity checks.

Layer 1: Edge defense (WAF)

# AWS WAF rules for flash sale
Rules:
  - Name: RateLimitPerIP
    Action: Block
    Statement:
      RateBasedStatement:
        Limit: 100 # requests per 5 minutes per IP
        AggregateKeyType: IP

  - Name: BlockKnownBots
    Action: Block
    Statement:
      IPSetReferenceStatement:
        ARN: arn:aws:wafv2:....:ipset/known-bots

  - Name: GeoRestriction
    Action: Block
    Statement:
      NotStatement:
        Statement:
          GeoMatchStatement:
            CountryCodes: [US, CA, GB, DE] # Allowed countries

Layer 2: Application-level detection

// bot-detection.ts
interface BotSignals {
  score: number
  signals: string[]
}

// Async because the residential-proxy lookup is awaited below
export async function detectBot(request: Request): Promise<BotSignals> {
  const signals: string[] = []
  let score = 0

  // Device fingerprint consistency
  const fp = request.headers.get("x-device-fingerprint")
  if (!fp || fp.length < 32) {
    signals.push("missing_fingerprint")
    score += 30
  }

  // Behavioral signals
  const timing = parseTimingHeader(request)
  if (timing.pageLoadToAction < 500) {
    // < 500ms is suspicious
    signals.push("fast_interaction")
    score += 25
  }

  // Browser consistency (a missing User-Agent is treated like a headless client)
  const ua = request.headers.get("user-agent")
  const acceptLang = request.headers.get("accept-language")
  if (!ua || isHeadlessBrowser(ua) || !acceptLang) {
    signals.push("headless_indicators")
    score += 40
  }

  // Known residential proxy detection
  const ip = getClientIP(request)
  if (await isResidentialProxy(ip)) {
    signals.push("residential_proxy")
    score += 20
  }

  return { score, signals }
}

export function shouldChallenge(signals: BotSignals): boolean {
  return signals.score >= 50
}

export function shouldBlock(signals: BotSignals): boolean {
  return signals.score >= 80
}

Layer 3: Queue-level protection

// queue-protection.ts
export async function validateQueueJoin(
  userId: string,
  deviceFingerprint: string,
  saleId: string,
): Promise<{ allowed: boolean; reason?: string }> {
  // Check for duplicate user
  const existingEntry = await findUserInQueue(saleId, userId)
  if (existingEntry) {
    return { allowed: false, reason: "already_in_queue" }
  }

  // Check for fingerprint reuse (same device, different accounts)
  const fpCount = await countFingerprintInQueue(saleId, deviceFingerprint)
  if (fpCount >= 2) {
    return { allowed: false, reason: "device_limit_exceeded" }
  }

  // Velocity check: how many queues has this user joined recently?
  const recentJoins = await countRecentQueueJoins(userId, 3600) // last hour
  if (recentJoins >= 5) {
    return { allowed: false, reason: "velocity_exceeded" }
  }

  return { allowed: true }
}

Fairness Mechanisms

1. FIFO queue with randomized entry window

Users who arrive before sale start are randomized when the sale begins (prevents “refresh at exactly 10:00:00” advantage):

export async function openSaleQueue(saleId: string): Promise<void> {
  // Get all users who arrived in pre-sale window (e.g., last 15 minutes)
  const earlyArrivals = await getEarlyArrivals(saleId)

  // Shuffle positions randomly
  const shuffled = shuffleArray(earlyArrivals)

  // Assign positions 1, 2, 3, ...
  for (let i = 0; i < shuffled.length; i++) {
    await updatePosition(shuffled[i].queue_ticket, i + 1)
  }

  // Users arriving after sale start get position = current_max + 1 (true FIFO)
}
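
The fairness of the randomized window depends on `shuffleArray` being unbiased, and the snippet above does not define it. One possible implementation is a Fisher-Yates shuffle driven by Node's `crypto.randomInt`. The `RandomIndex` type and the injectable `pick` parameter are assumptions added here so the function can be exercised deterministically.

```typescript
import { randomInt } from "node:crypto"

// Returns a uniformly random integer in [0, maxExclusive).
type RandomIndex = (maxExclusive: number) => number

export function shuffleArray<T>(
  items: readonly T[],
  pick: RandomIndex = (n) => randomInt(n),
): T[] {
  const result = Array.from(items) // copy so the caller's array is not mutated

  // Walk from the last slot down; swap each slot with a uniformly chosen
  // slot at or before it. Every permutation is then equally likely.
  for (let i = result.length - 1; i > 0; i--) {
    const j = pick(i + 1)
    const tmp = result[i] as T
    result[i] = result[j] as T
    result[j] = tmp
  }
  return result
}
```

Sorting by a random comparator (`sort(() => Math.random() - 0.5)`) is a common shortcut that does not produce a uniform distribution, which is why the explicit swap loop is used here.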

2. Per-customer purchase limits

export async function validatePurchaseLimit(userId: string, productId: string, quantity: number): Promise<boolean> {
  // Sum units, not orders: one earlier order may already hold several items.
  // `db.orders.sum` is a hypothetical helper on the same data layer.
  const existingUnits = await db.orders.sum("quantity", {
    user_id: userId,
    product_id: productId,
    status: { $in: ["confirmed", "pending"] },
  })

  const LIMIT_PER_USER = 2
  return existingUnits + quantity <= LIMIT_PER_USER
}

Frontend Considerations

Waiting Room UX

Critical UX decisions:

Decision	Implementation	Rationale
Progress indicator	Position + estimated time + progress bar	Reduces anxiety; users know they’re progressing
No refresh needed	SPA with polling	Prevents users from losing position
Transparent communication	Show exact position	Trust requires honesty
Graceful degradation	Static HTML	Must work even if JS fails

Optimistic UI for checkout:

// checkout-ui.ts
async function submitOrder(orderData: OrderData): Promise<void> {
  // Optimistic: show "Processing..." immediately
  setOrderStatus("processing")
  showConfirmationPreview(orderData)

  try {
    const { order_id } = await api.submitOrder(orderData)

    // Poll for confirmation (async processing)
    pollOrderStatus(order_id, (status) => {
      if (status === "confirmed") {
        setOrderStatus("confirmed")
        showSuccessAnimation()
      } else if (status === "failed") {
        setOrderStatus("failed")
        showRetryOption()
      }
    })
  } catch (error) {
    // Revert optimistic UI
    setOrderStatus("error")
    showErrorMessage(error)
  }
}

Real-Time Queue Updates

Polling vs WebSocket decision:

Factor	Polling	WebSocket
Scale	Easy (stateless)	Hard (connection management)
Latency	5-10s	Sub-second
Infrastructure	Simple	Complex
Battery impact	Higher	Lower

Chosen: Adaptive polling — Poll every 5s when far from front; every 1s when close.

function calculatePollInterval(position: number, totalInQueue: number): number {
  // Queue size unknown or empty: fall back to the slowest interval
  if (totalInQueue <= 0) return 5000

  // 0 at the back of the queue, approaching 1 at the front
  const progressPercent = 1 - position / totalInQueue

  if (progressPercent > 0.9) return 1000 // Top 10%: 1s
  if (progressPercent > 0.7) return 2000 // Top 30%: 2s
  if (progressPercent > 0.5) return 3000 // Top 50%: 3s
  return 5000 // Back 50%: 5s
}
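
The adaptive schedule also determines the size of the polling tax. As a rough estimate, assuming users are spread evenly through the queue and every client follows the thresholds in `calculatePollInterval`, the aggregate request rate can be computed per tier. `POLL_TIERS` and `estimatePollRps` are hypothetical names for this sketch.

```typescript
// Share of the queue in each tier and that tier's poll interval,
// mirroring the thresholds in calculatePollInterval.
const POLL_TIERS = [
  { share: 0.1, intervalMs: 1000 }, // front 10% of the queue
  { share: 0.2, intervalMs: 2000 }, // next 20%
  { share: 0.2, intervalMs: 3000 }, // next 20%
  { share: 0.5, intervalMs: 5000 }, // back 50%
]

// Estimated status-poll requests per second for a queue of the given size.
export function estimatePollRps(queueSize: number): number {
  return POLL_TIERS.reduce(
    (rps, tier) => rps + (queueSize * tier.share) / (tier.intervalMs / 1000),
    0,
  )
}
```

For one million queued users this works out to about 367K requests per second (100K + 100K + 67K + 100K), against 200K for a flat 5s interval. Under these assumptions the faster front-of-queue polling nearly doubles the load, which is an argument for serving the status endpoint from the edge rather than from origin.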

Client State Management

// flash-sale-state.ts
interface FlashSaleState {
  // Queue state
  queueTicket: string | null
  position: number | null
  status: "idle" | "queued" | "admitted" | "checkout" | "completed" | "expired"

  // Checkout state
  checkoutToken: string | null
  checkoutExpiresAt: Date | null
  reservationId: string | null

  // Order state
  orderId: string | null
  orderStatus: "pending" | "processing" | "confirmed" | "failed" | null
}

// State persisted to localStorage for tab recovery
function persistState(state: FlashSaleState): void {
  localStorage.setItem("flash-sale-state", JSON.stringify(state))
}

// Restore on page load (handles accidental tab close)
function restoreState(): FlashSaleState | null {
  const saved = localStorage.getItem("flash-sale-state")
  if (!saved) return null

  let state: FlashSaleState
  try {
    state = JSON.parse(saved)
  } catch {
    return null // Corrupted entry, start fresh
  }

  // JSON.stringify turns a Date into an ISO string, so revive it before comparing
  if (state.checkoutExpiresAt) {
    state.checkoutExpiresAt = new Date(state.checkoutExpiresAt)
    if (state.checkoutExpiresAt < new Date()) {
      return null // Expired, start fresh
    }
  }

  return state
}

Infrastructure Design

Cloud-Agnostic Components

Component	Purpose	Requirements
CDN	Waiting room, static assets	Edge caching, high throughput
Serverless compute	Queue service, APIs	Auto-scale, pay-per-use
Key-value store	Inventory counters, tokens	Sub-ms latency, atomic operations
Document store	Queue state	Single-digit ms, auto-scale
Message queue	Order processing	Durability, deduplication (effectively-once when consumers are idempotent)
Relational DB	Orders, users	ACID, complex queries

AWS Reference Architecture

Service configuration:

Service	Configuration	Rationale
CloudFront	Origin: S3 (static), Cache: 1 year	Waiting room must survive origin failure
API Gateway	Throttling: 10K RPS, Burst: 5K	Protects backend during spike
Lambda	Memory: 1024MB, Timeout: 30s, Reserved concurrency: 1000	Guaranteed capacity under load; add provisioned concurrency where cold starts matter
ElastiCache	Redis Cluster, 3 nodes, r6g.large	Sub-ms latency, failover
DynamoDB	On-demand capacity mode	Handles unpredictable load without capacity planning (auto-scaling applies only to provisioned mode)
SQS FIFO	High-throughput mode, 14-day retention	Order durability + per-user ordering
RDS	Multi-AZ, db.r6g.xlarge, Read replicas	ACID + read scaling

Self-Hosted Alternatives

Managed Service	Self-Hosted Option	Trade-off
ElastiCache	Redis Cluster on EC2	More control, operational burden
DynamoDB	Cassandra/ScyllaDB	Cost at scale, complexity
SQS FIFO	Kafka	Higher throughput, operational complexity
Lambda	Kubernetes + KEDA	Fine-grained control, cold starts

Note

The first-party AWS Virtual Waiting Room solution was retired in November 2025. New deployments should follow the SeatGeek-style reference architecture (Lambda + DynamoDB + ElastiCache) rather than the retired CloudFormation stack.

Production analogues

The architecture above is a synthesis of several published systems. Use these as concrete anchors when a design choice feels arbitrary.

System	Architecture shape	What to copy
SeatGeek	Edge gatekeeper + DynamoDB token tables + leaky-bucket admission counter; visitor token captures arrival timestamp, access token authorises the protected zone.	Two-stage tokens, leaky-bucket admission, DynamoDB Streams → Timestream for live ops dashboards.
Shopify	“Sorting Hat” Lua module in Nginx/OpenResty routes traffic to per-shop pods; checkout throttle is a leaky bucket implemented at the edge; signed cookie exempts admitted users from re-throttling for the rest of the session.	Edge-scriptable throttle, signed-cookie skip-the-queue, pod isolation so one viral shop can’t tank neighbours.
Shopify BFCM 2024-25	Peaked at 284M req/min on the edge and 80M req/min on app servers in BFCM 2024; BFCM 2025 reached 489M req/min on the edge per Shopify’s internal recap.	Plan for ≥3× your headline forecast; chaos-test with Toxiproxy and Game Days before the on-sale.
Ticketmaster Smart Queue	Public waiting room opens 15-30 min before sale; queue position is randomised when the sale opens; ~10 min checkout hold once at front; aggressive bot mitigation including Verified Fan pre-registration codes.	Pre-sale randomisation window, hard checkout hold, presale invite codes for the highest-demand events.
Alibaba Tmall Singles Day	Tair (Redis-compatible) for atomic inventory deduction with sharded SKUs, cell-based isolation, and traffic queuing for the hottest SKUs. Platform peaked at 583K orders/sec in 2020; Alibaba stopped publishing peak TPS after 2022.	Hot-SKU sharding, traffic queuing per product, cell-based capacity planning.
Cloudflare Waiting Room	Workers + Durable Objects at the edge; per-data-centre DOs aggregate to a single global DO every few seconds; admission state lives in an encrypted cookie carrying `bucketId` + `lastCheckInTime`.	Edge-only admission with no origin polling, eventually-consistent global counters, cookie-as-token.
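
Leaky-bucket admission appears in both the SeatGeek and Shopify rows. The sketch below shows the general shape of such a gate as a single in-memory counter. It is not either company's implementation: `LeakyBucketGate` and its parameters are hypothetical names, and a real deployment would hold the counter in a shared store rather than in process memory.

```typescript
export class LeakyBucketGate {
  private level = 0 // admissions currently "in the bucket"
  private lastLeakMs: number
  private readonly capacity: number
  private readonly leakPerSecond: number

  // capacity: burst size the backend tolerates.
  // leakPerSecond: sustained admission rate the backend can absorb.
  constructor(capacity: number, leakPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity
    this.leakPerSecond = leakPerSecond
    this.lastLeakMs = nowMs
  }

  // Returns true if one more visitor may leave the queue right now.
  tryAdmit(nowMs: number = Date.now()): boolean {
    // Drain the bucket for the time elapsed since the previous call.
    const elapsedSec = Math.max(0, nowMs - this.lastLeakMs) / 1000
    this.level = Math.max(0, this.level - elapsedSec * this.leakPerSecond)
    this.lastLeakMs = Math.max(this.lastLeakMs, nowMs)

    if (this.level + 1 > this.capacity) return false // full: visitor stays queued
    this.level += 1
    return true
  }
}
```

A rejected visitor keeps their queue position and simply polls again, so a full bucket shows up as a longer wait rather than as an error.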

Failure modes and operational implications

Flash sales fail in predictable shapes. Plan and rehearse for each.

Sale-day timeline — pre-warm, T0 admission, peak depletion, drain, and post-sale postmortem. — Sale-day timeline — most failures concentrate in the first five minutes after T0; the drain window is when DLQ work and reservation-release math show up.

Failure mode	Symptom	Containment
Hot key on a single SKU	Redis shard CPU saturates; p99 inventory check explodes	Bucket the SKU into N sub-keys, distribute decrements; pre-cache product detail pages at the edge; fall back to “queueing for stock” UI rather than 5xx.
Token gate misconfigured (admit too fast)	Backend overload behind the gate; cascading 5xx	Tie admission to live healthy-worker count, not a static rate. Shed traffic via a circuit breaker that returns “still in queue” rather than failed checkouts.
Token gate misconfigured (admit too slow)	Conversion drops; customer support floods with “stuck in queue”	Watch admission rate vs remaining-inventory ratio; alert when admission falls below `inventory / target_sale_duration`.
Reservation TTL too short	Users lose their seat mid-payment; refund and complaint volume spikes	Make reservation TTL > p99 of payment latency observed in dress rehearsal; extend TTL on user activity pings.
Reservation TTL too long	Inventory looks sold out while real users abandoned silently	Aggressive client heartbeat to release on tab close; shorter TTL with explicit “extend” call when user enters payment details.
SQS DLQ accumulating	Orders silently failing; no user-facing error	Alert on DLQ depth > 0 during a sale; auto-page; the DLQ handler must release inventory and notify the user, not just log.
Payment provider degraded	Checkout latency spikes; payment confirmations time out	Circuit-break payment calls; queue the order with a “payment retry” status; communicate honestly in the UI (“your spot is held; we’ll retry payment”).
Bot wave overwhelms WAF	Legitimate users see 429 or CAPTCHA storms	Pre-warm WAF rules; rate-limit per device fingerprint not just per IP; have a “raise the difficulty” lever (require 2FA, presale code, or Verified Fan) ready to flip.
CDN origin shield miss	Spike at origin for the waiting room HTML	Pre-warm CDN with the exact waiting-room asset; pin a long edge TTL; serve a stale-while-revalidate fallback if origin dies.
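
The hot-key containment in the first row (bucket the SKU into N sub-keys) can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `ShardedInventory` is a hypothetical class, the array stands in for N Redis keys, and in production each sub-key would be decremented by an atomic script on whichever shard owns it.

```typescript
export class ShardedInventory {
  private readonly buckets: number[]

  constructor(totalStock: number, bucketCount: number) {
    // Spread stock as evenly as possible; the first `remainder` buckets get one extra unit.
    const base = Math.floor(totalStock / bucketCount)
    const remainder = totalStock % bucketCount
    this.buckets = Array.from({ length: bucketCount }, (_, i) => base + (i < remainder ? 1 : 0))
  }

  // Reserve one unit. `startHint` (for example a hash of the user ID) selects
  // the first bucket to try, which spreads writes across sub-keys.
  // Returns the bucket index used, or null when every bucket is empty.
  reserve(startHint: number): number | null {
    const n = this.buckets.length
    const start = ((startHint % n) + n) % n // normalise negative hints
    for (let step = 0; step < n; step++) {
      const i = (start + step) % n
      const stock = this.buckets[i] ?? 0
      if (stock > 0) {
        this.buckets[i] = stock - 1
        return i
      }
    }
    return null // sold out
  }

  remaining(): number {
    return this.buckets.reduce((sum, b) => sum + b, 0)
  }
}
```

The trade-off is at the tail of the sale: once buckets start emptying, a reservation may probe several sub-keys, and confirming "sold out" costs N reads. Keeping N small, on the order of the shard count, bounds that cost.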

Caution

Pre-rehearse the on-sale with a realistic load test. Both Shopify Game Days and Alibaba’s PTS (Performance Testing Service) are public examples. A flash sale is the worst time to discover that your queue depth metric is wrong.

Observability under spike

The metric set you actually need on the war-room dashboard is small and ratio-based. Counters lie when traffic doubles; ratios survive.

Signal	What you watch	What you do
`admission_rate / target_admission_rate`	Drift below 0.7 for >1 min	Open the gate further, or check downstream worker health (cause is almost never the queue).
`reserved_inventory / total_inventory`	Should monotonically rise; flat = workers stalled	Page the order-worker oncall before the user-facing dashboard catches up.
`redis.hot_key_ops / redis.cluster_ops`	A single key crossing ~30% of cluster ops is hot	Flip to the sharded-key path; raise the WAF challenge difficulty.
`queue.dlq_depth`	Any non-zero during a sale = user money silently failing	Auto-page; DLQ handler must release inventory and notify the user.
`payment.p99` and `payment.error_rate`	Compare against the dress-rehearsal baseline, not absolute SLOs	Trip the payment circuit breaker; shift to “your spot is held” UI rather than failing checkouts.
`oversell_count` (= `max(0, confirmed_orders − inventory)`)	Must be 0 by construction; alert on >0 immediately	This is the contract. If it ever fires, freeze the sale and reconcile manually.
`cdn.origin_shield_miss_rate` for waiting-room HTML	Should be ~0 once warmed	Re-pin the asset, raise edge TTL, serve stale-while-revalidate.
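
Several of these signals reduce to a ratio and a threshold, so they can be evaluated from a single snapshot. The sketch below covers four of them. `SaleSnapshot`, `evaluateSignals`, and the alert names are hypothetical, and the "for >1 min" duration condition in the table is left out for brevity.

```typescript
interface SaleSnapshot {
  admissionRate: number // measured admissions per second
  targetAdmissionRate: number // planned admissions per second
  hotKeyOps: number // ops/sec on the busiest Redis key
  clusterOps: number // ops/sec across the whole cluster
  dlqDepth: number
  confirmedOrders: number
  totalInventory: number
}

export function evaluateSignals(s: SaleSnapshot): string[] {
  const alerts: string[] = []

  // Admission drifting below 70% of target
  if (s.targetAdmissionRate > 0 && s.admissionRate / s.targetAdmissionRate < 0.7) {
    alerts.push("admission_slow")
  }
  // A single key taking more than ~30% of cluster ops
  if (s.clusterOps > 0 && s.hotKeyOps / s.clusterOps > 0.3) {
    alerts.push("hot_key")
  }
  // Any dead-lettered order during a sale
  if (s.dlqDepth > 0) {
    alerts.push("dlq_non_empty")
  }
  // More confirmed orders than inventory should be impossible by construction
  if (Math.max(0, s.confirmedOrders - s.totalInventory) > 0) {
    alerts.push("oversell")
  }
  return alerts
}
```

The guards on the denominators matter before T0, when both rates are zero and a naive division would raise a false alarm.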

Graceful degradation, in priority order:

1. Shed the cheapest thing first. Increase poll interval, drop ornamental queue-position estimates, hide product imagery, but keep the admission decision honest.
2. Trade UX for correctness. Show “your spot is held; we are retrying payment” rather than failing the checkout when the payment provider is degraded.
3. Raise the friction lever. Force CAPTCHA or re-auth when bot scores climb; require a presale code or Verified-Fan style gate when the WAF is at capacity. Communicate that you are doing this: silence is what destroys trust, not the friction.
4. Never undo a successful inventory reserve to make the UI nicer. Release on TTL or explicit cancel, never on a UI timeout.
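
The payment circuit breaker referred to above can be modelled as a small state machine. This is a minimal sketch, not a production breaker: `PaymentBreaker`, its thresholds, and the explicit `nowMs` arguments are assumptions made so the behaviour is easy to follow and test.

```typescript
export class PaymentBreaker {
  private consecutiveFailures = 0
  private openedAtMs: number | null = null
  private readonly failureThreshold: number
  private readonly cooldownMs: number

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold
    this.cooldownMs = cooldownMs
  }

  // True when a call to the payment provider may be attempted.
  // While the breaker is open, the caller keeps the reservation and shows
  // "your spot is held" instead of failing the checkout.
  allowRequest(nowMs: number): boolean {
    if (this.openedAtMs === null) return true
    // After the cooldown, calls are let through again; one more failure reopens the breaker.
    return nowMs - this.openedAtMs >= this.cooldownMs
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0
    this.openedAtMs = null
  }

  recordFailure(nowMs: number): void {
    this.consecutiveFailures += 1
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openedAtMs = nowMs
    }
  }
}
```

Note that the breaker only decides whether to call the provider. It never touches the reservation, which is still released only on TTL or explicit cancel.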

Variations

Path B Implementation: Real-Time Counter Model

For e-commerce with dynamic inventory, replace token-based admission with real-time inventory checks:

// real-time-inventory.ts
export async function attemptPurchase(
  productId: string,
  userId: string,
  quantity: number,
): Promise<{ success: boolean; orderId?: string }> {
  // Rate limit first (protect backend)
  const allowed = await rateLimiter.check(userId, "purchase")
  if (!allowed) {
    return { success: false }
  }

  // Atomic inventory check + decrement; a missing key counts as zero stock
  const result = (await redis.eval(
    `
    local count = tonumber(redis.call('GET', KEYS[1]) or '0')
    local wanted = tonumber(ARGV[1])
    if count >= wanted then
      return redis.call('DECRBY', KEYS[1], wanted)
    else
      return -1
    end
  `,
    1,
    `inventory:${productId}`,
    quantity,
  )) as number

  if (result < 0) {
    return { success: false } // Sold out
  }

  // Inventory is already decremented, so give it back if order creation fails
  try {
    const orderId = await createOrder(productId, userId, quantity)
    return { success: true, orderId }
  } catch (error) {
    await redis.incrby(`inventory:${productId}`, quantity)
    throw error
  }
}

Key difference: Inventory decremented at purchase attempt, not at queue admission. Higher risk of “sold out after waiting” but supports dynamic restocking.

VIP Early Access

Add priority tiers to queue service:

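// vip-queue.ts
type Tier = "vip" | "member" | "standard"

interface QueueEntry {
  // ... existing fields
  tier: Tier
  tierJoinedAt: Date
}

export async function getNextPosition(saleId: string, tier: Tier): Promise<number> {
  // VIPs get positions 1-1000, members 1001-10000, standard 10001+
  const tierOffsets: Record<Tier, number> = { vip: 0, member: 1000, standard: 10000 }
  const tierCapacity: Record<Tier, number> = { vip: 1000, member: 9000, standard: Infinity }

  // Note: one Query page reads at most 1 MB before filtering, so a large queue
  // needs pagination (or a per-tier atomic counter) for an exact count.
  const countInTier = await ddb.query({
    TableName: "FlashSaleQueue",
    KeyConditionExpression: "sale_id = :sid",
    FilterExpression: "tier = :tier",
    ExpressionAttributeValues: { ":sid": saleId, ":tier": tier },
    Select: "COUNT",
  })

  const indexInTier = (countInTier.Count || 0) + 1
  if (indexInTier > tierCapacity[tier]) {
    // Would otherwise collide with the next tier's position range
    throw new Error(`Tier ${tier} is full`)
  }
  return tierOffsets[tier] + indexInTier
}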
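
Fixed offsets such as 0, 1000 and 10000 stay disjoint only while each tier remains within its range. One alternative, sketched here as an assumption rather than as part of the original design, is to order the queue by a composite key of tier rank and arrival sequence, so no tier needs a reserved block of positions. `TIER_RANK` and `queueSortKey` are hypothetical names.

```typescript
type Tier = "vip" | "member" | "standard"

const TIER_RANK: Record<Tier, number> = { vip: 0, member: 1, standard: 2 }

// Builds a string that sorts lexicographically by tier first, then by arrival order.
// `arrivalSeq` is assumed to be a non-negative integer below 10^12.
export function queueSortKey(tier: Tier, arrivalSeq: number): string {
  return `${TIER_RANK[tier]}#${String(arrivalSeq).padStart(12, "0")}`
}
```

Used as a sort key, this places every VIP ahead of every member regardless of how many VIPs join, at the cost of positions no longer being fixed numbers: a standard user's displayed position can move backwards when a late VIP arrives.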

Raffle-Based Allocation

For extremely limited inventory (e.g., 100 items, 1M users), replace queue with raffle:

// raffle-mode.ts
export async function enterRaffle(saleId: string, userId: string): Promise<void> {
  // Entry window: 1 hour before draw
  await ddb.put({
    TableName: "FlashSaleRaffle",
    Item: {
      sale_id: saleId,
      user_id: userId,
      entry_id: uuid(),
      entered_at: new Date().toISOString(),
    },
  })
}

export async function drawWinners(saleId: string, count: number): Promise<string[]> {
  // Get all entries
  const entries = await getAllEntries(saleId)

  // Cryptographically random selection
  const shuffled = cryptoShuffle(entries)
  const winners = shuffled.slice(0, count)

  // Grant checkout tokens to winners
  for (const winner of winners) {
    await grantCheckoutToken(winner.user_id, saleId)
  }

  return winners.map((w) => w.user_id)
}
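
The draw itself does not need a full shuffle when only a few winners are picked from a large pool. A possible approach, assuming one chance per user, is to collapse duplicate entries and then run a partial Fisher-Yates pass that stops after `count` swaps. `pickWinners`, `RaffleEntry`, and the injectable `pick` parameter are hypothetical names for this sketch.

```typescript
import { randomInt } from "node:crypto"

interface RaffleEntry {
  user_id: string
  entry_id: string
}

// Returns a uniformly random integer in [0, maxExclusive).
type RandomIndex = (maxExclusive: number) => number

export function pickWinners(
  entries: readonly RaffleEntry[],
  count: number,
  pick: RandomIndex = (n) => randomInt(n),
): string[] {
  // One chance per user, however many entries they submitted.
  const users = Array.from(new Set(entries.map((e) => e.user_id)))
  const k = Math.min(count, users.length)

  // Partial Fisher-Yates: after i steps the first i slots hold a uniform
  // random sample without replacement.
  for (let i = 0; i < k; i++) {
    const j = i + pick(users.length - i)
    const tmp = users[i] as string
    users[i] = users[j] as string
    users[j] = tmp
  }
  return users.slice(0, k)
}
```

For 100 winners out of a million entrants this performs 100 swaps instead of a million, although loading and de-duplicating the entries is still linear in the pool size.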

Conclusion

Flash sale systems require coordinated defense at every layer:

Traffic absorption: CDN-hosted waiting room prevents backend overwhelm. Static HTML + client-side polling scales with the CDN's edge capacity rather than with origin capacity.
Fair admission: Token-based queue management (Path A) guarantees purchase opportunity. FIFO with randomized early arrival prevents “refresh race.”
Inventory accuracy: Redis Lua scripts provide atomic check-and-decrement. Zero overselling through construction, not hope.
Order durability: Async processing via SQS decouples order receipt from processing. DLQ ensures no order is silently lost.
Bot defense: Multi-layer detection (WAF → behavioral → queue-level) raises the bar for attackers without blocking legitimate users.

What this design optimizes for:

Zero overselling (100% inventory accuracy)
Fairness (transparent queue position)
Durability (no lost orders)
Scalability (1M+ concurrent users)

What it sacrifices:

Latency (queue wait time)
Simplicity (multiple coordinated services)
Dynamic inventory (pre-allocation model)

Known limitations:

Token expiration requires careful tuning (too short: frustrated users; too long: wasted inventory)
Sophisticated bots with residential proxies remain challenging
VIP tiers can feel unfair to standard users

Appendix

Prerequisites

Distributed systems fundamentals (CAP theorem, consistency models)
Queueing theory basics (FIFO, rate limiting)
Redis data structures and Lua scripting
Message queue patterns (at-least-once, exactly-once)
Payment processing (idempotency, webhooks)

Summary

Flash sales require a waiting room → token gate → atomic inventory → async order queue architecture
CDN-hosted waiting room absorbs traffic spikes cheaply and reliably
Token-based admission (Path A) guarantees purchase opportunity and prevents overselling by construction
Redis Lua scripts provide atomic inventory operations at 500K+ ops/second
Async order processing via message queues decouples order receipt from fulfillment
Multi-layer bot defense (WAF + behavioral + queue-level) raises attack cost without blocking legitimate users

References

Alibaba Cloud: system stability for large-scale flash sales — Tmall Singles Day architecture and the 583K orders/sec peak.
Alibaba Cloud: build a flash-sale system on Tair (Redis) — HMGET + HINCRBY Lua pattern, instance-level QPS numbers.
Alibaba Cloud: identify and handle hot keys — bucketing strategies for single-shard contention.
SeatGeek virtual waiting room on AWS — visitor + access tokens, leaky-bucket admission, DynamoDB Streams analytics.
Shopify: surviving high-write flash sales with scriptable load balancers — edge-level checkout throttle, signed-cookie skip-the-queue.
Shopify: how we prepare for BFCM 2025 — capacity planning, Toxiproxy chaos, BFCM 2024 peak metrics.
Ticketmaster: how the Smart Queue works — randomised entry, checkout hold window, bot mitigation tiers.
AWS Prime Day 2024 metrics — DynamoDB at 146M req/sec, CloudFront ≥500M req/min.
Cloudflare: building Waiting Room on Workers and Durable Objects — per-PoP DO aggregation to a global DO, cookie-bound admission, eventually-consistent counters.
Stripe: designing robust APIs with idempotency — Idempotency-Key semantics, response replay, 24h window.
Stripe: scaling APIs with rate limiters — token bucket on Redis; per-user vs per-API quotas.
AWS: SQS FIFO exactly-once processing — MessageDeduplicationId semantics and the 5-minute dedup window.
AWS: DynamoDB write sharding — partition-key sharding strategies for hot items.
Redis: programmability — EVAL atomicity — script atomicity guarantees.
Redis: distributed lock patterns — when a Lua script is not enough.
AWS Virtual Waiting Room solution (retired Nov 2025) — the legacy first-party reference; do not deploy new stacks of it.
Martin Kleppmann: Designing Data-Intensive Applications — distributed-systems fundamentals.