Design Amazon Shopping Cart

A system design for an e-commerce shopping cart that survives flash sales: millions of concurrent shoppers, real-time inventory, dynamic pricing, and a distributed checkout that must be atomic without distributed locks. The thesis: the cart is the canonical “always writeable” workload from the Dynamo paper¹ — solve write availability with replication and semantic conflict resolution at read time, solve inventory contention with soft reservations, and solve checkout atomicity with an idempotent saga and per-step compensations. Accept eventual consistency everywhere except checkout.

High-level architecture: client apps, edge/gateway, application services (cart, pricing, inventory, checkout, payment, notifications), data layer (Redis cluster + cart/inventory/order DBs), and async cart-expiry / abandoned-cart workers behind a queue.

Abstract

Shopping cart systems sit between three loosely-coupled but easily-conflicting concerns: cart state persistence (guest vs authenticated, cross-device, merge on login), inventory accuracy (avoid overselling without tanking availability), and checkout atomicity (payment + inventory deduction + order creation across independent services).

Five architectural decisions drive the rest of the design:

Always-writeable cart — Dynamo’s foundational design choice (SOSP 2007 §4.4–§6.1) was to never reject a cart write; conflicts are reconciled by the application at read time via semantic merge¹. We adopt the same primitive even on a relational backing store: idempotent line-level (cart_id, sku, variant) upserts, monotonic line versions, and a deterministic merge function.
Two-tier cart storage — Redis for sub-millisecond reads on the hot browse/edit path; a relational store (PostgreSQL) for durability, merge queries, and multi-device sync. A pure Dynamo-style KV (DynamoDB, Cassandra, Riak, ScyllaDB) is the alternative — see Storage Model for when to flip the default.
Soft reservations with TTL — reserve units on add-to-cart with a short expiry (typically 5–15 min, per Microsoft Dynamics 365 inventory guidance); convert to a hard reservation only after payment is authorized. Reservation is the seam between the loosely-consistent cart and the strongly-consistent ledger.
Saga for checkout — orchestrated sequence of validate → authorize → reserve → create order → capture, with explicit compensating transactions on each failure. The saga concept is from Garcia-Molina & Salem’s 1987 SIGMOD paper on long-lived transactions².
Eventually consistent display, strongly consistent checkout — accept stale stock counts on product pages and recover at checkout. This is exactly Vogels’ “eventually consistent” trade-off applied per operation type³.

Dimension	Optimizes for	Sacrifices
Redis cart cache	Read latency (<1 ms)	Durability (requires DB persistence)
Soft reservations	Inventory turnover	Checkout may fail if reservation expired
Saga orchestration	Reliability, debuggability	Latency (sequential steps + compensation)
Hash-shard by `cart_id`	Even load distribution	Cross-user analytics queries

Requirements

Functional Requirements

Feature	Scope	Notes
Add/remove/update cart items	Core	Real-time quantity validation against inventory
Guest cart with persistence	Core	Survives browser close, 30-day expiry
Cart merge on login	Core	Combine guest + user cart, resolve quantity conflicts
Real-time price updates	Core	Price at checkout reflects current price, not add-time
Inventory soft reservation	Core	Prevent checkout failures from out-of-stock
Coupon/promotion application	Core	Stackable rules, exclusivity handling
Multi-step checkout	Core	Address → Payment → Review → Confirm
Abandoned cart recovery	Extended	Email/push notifications with cart link
Wishlist / Save for Later	Extended	Move items between cart and wishlist

Non-Functional Requirements

Requirement	Target	Rationale
Availability	99.99% (4 nines)	Revenue-critical — every minute of downtime is direct loss
Cart read latency	p99 < 50 ms	Instant feedback on cart interactions
Cart write latency	p99 < 200 ms	Acceptable for add/remove operations
Checkout latency	p99 < 3 s	End-to-end including payment authorization
Peak throughput	100K cart ops/sec	Flash sale scenarios
Data durability	99.999999999% (11 nines)	Cart loss is a customer-trust event
Consistency	Eventual (display), Strong (checkout)	Hybrid model per operation type

Scale Estimation

These are illustrative back-of-envelope numbers calibrated against published Shopify and Amazon scale points; they are not Amazon’s real numbers (which Amazon does not publish for cart subsystems). For anchoring: Shopify reported peak BFCM 2024 sales of ~$4.6M/min and ~80M app-server requests/min, with internal scale tests reaching 80,000+ checkouts/min⁴. AWS reports DynamoDB peaked at 126M req/s during Prime Day 2023 and 146M req/s during Prime Day 2024 across Amazon retail’s many DynamoDB tables, while sustaining single-digit millisecond reads⁵. Cart is one of many tables that contributes to that envelope, not the whole of it.

Users

DAU: 50M
Peak concurrent users: 5M (10% of DAU during a flash sale)
Active carts per user: 1

Traffic

Cart views: 50M DAU × 10 views/day = 500M/day ≈ 6K RPS average
Cart modifications: 50M DAU × 3 ops/day = 150M/day ≈ 1.7K RPS average
Peak multiplier (flash sale): 50× → 300K cart views/sec, 85K modifications/sec
Checkouts: 5M/day ≈ 60/sec average, 3K/sec peak

Storage

Cart record: ~2 KB (metadata + 10 items average)
Active carts: 50M × 2 KB ≈ 100 GB
Historical (90-day retention): ~300 GB
With 3× replication: ~1 TB total
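The traffic and storage arithmetic above can be reproduced with a few lines of code. This is only a sketch: the function and field names are made up for illustration, and the inputs are the same illustrative assumptions used in the lists above. The text rounds the averages to ~6K and ~1.7K RPS before applying the 50× multiplier, which is why it quotes 300K and 85K where the unrounded figures come out slightly lower.

```typescript
// Back-of-envelope calculator; every input is an illustrative assumption from the text.
const SECONDS_PER_DAY = 86400;

interface ScaleInputs {
  dailyActiveUsers: number;
  viewsPerUserPerDay: number;
  editsPerUserPerDay: number;
  peakMultiplier: number;
  cartRecordBytes: number;
}

interface ScaleEstimate {
  avgViewRps: number;
  avgEditRps: number;
  peakViewRps: number;
  peakEditRps: number;
  activeCartGb: number;
}

function estimateScale(input: ScaleInputs): ScaleEstimate {
  const avgViewRps = (input.dailyActiveUsers * input.viewsPerUserPerDay) / SECONDS_PER_DAY;
  const avgEditRps = (input.dailyActiveUsers * input.editsPerUserPerDay) / SECONDS_PER_DAY;
  return {
    avgViewRps,
    avgEditRps,
    peakViewRps: avgViewRps * input.peakMultiplier,
    peakEditRps: avgEditRps * input.peakMultiplier,
    // One active cart per user, in decimal gigabytes.
    activeCartGb: (input.dailyActiveUsers * input.cartRecordBytes) / 1e9,
  };
}

const cartScale = estimateScale({
  dailyActiveUsers: 50e6,
  viewsPerUserPerDay: 10,
  editsPerUserPerDay: 3,
  peakMultiplier: 50,
  cartRecordBytes: 2000,
});
// Math.round(cartScale.avgViewRps) is 5787 (quoted as ~6K in the text)
// Math.round(cartScale.avgEditRps) is 1736 (quoted as ~1.7K)
// cartScale.activeCartGb is 100
```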

Inventory

SKUs: 100M products
Inventory checks: 500M/day (1 per cart view)
Reservation writes: 150M/day

Design Paths

A “shopping cart” can mean very different systems depending on inventory criticality. Two archetypes bracket the space.

Path A: Consistency-First (financial / scarce inventory)

Best when:

High-value items where overselling has material cost (electronics, luxury goods).
Strictly limited inventory (concert tickets, limited drops).
Regulatory or contractual inventory accuracy requirements.

Architecture:

Strong consistency for every inventory operation (single-leader writes, synchronous replication).
Synchronous inventory checks before cart-add succeeds.
Pessimistic locking (or serializable isolation) during checkout.

Trade-offs:

✅ Zero overselling.
✅ Inventory counts are accurate everywhere.
❌ Higher latency from lock contention.
❌ Lower throughput under burst load.
❌ Checkout-failure rate climbs during traffic spikes (lock timeouts).

Real-world example: Ticketing platforms commonly serialize seat selection through a virtual waiting room when demand exceeds inventory, precisely to enforce strong consistency on a tiny pool of reserved seats — see Queue-it: overselling prevention for the operational pattern.

Path B: Availability-First (high-volume retail)

Best when:

Inventory buffers are large enough that overselling is rare.
Customer-perceived speed matters more than perfect accuracy.
The business can afford to compensate the rare oversold customer (refund + apology + alternative).

Architecture:

Eventually consistent inventory reads (replicas, caches).
Optimistic updates; conflict resolution at checkout.
Tolerate occasional overselling, recover via backorder or compensation flow.

Trade-offs:

✅ Sub-millisecond cart operations.
✅ Graceful behavior under extreme bursts.
✅ Fewer abandoned carts from latency stalls.
❌ Occasional overselling (low-single-digit-percent worst case on flash sales).
❌ Requires a robust customer-recovery workflow (refund, apology, restock alert).

Real-world example: Amazon’s foundational Dynamo paper (SOSP 2007) explicitly chose this side of the trade-off: shopping cart was the motivating use case, the system was designed to be “always writeable”, and conflicts (e.g. concurrent “add to cart” from different replicas) were resolved later via semantic merging — never by rejecting the write¹.

Path Comparison

Factor	Path A: Consistency-First	Path B: Availability-First
Inventory accuracy	100%	99.9%+
Cart-add latency	50–200 ms	< 10 ms
Peak throughput	~10K ops/sec	100K+ ops/sec
Checkout failure rate	Higher (lock timeouts)	Lower (optimistic)
Operational complexity	Lower	Higher (compensation flows)
Best for	Tickets, luxury, limited	General retail, commodities

This article’s focus

The rest of this article designs Path B (Availability-First) because that’s the regime Amazon, Shopify, Walmart, and most large retailers actually run. Path-A patterns are called out where the inventory criticality flips.

Storage Model: Dynamo-Style KV vs Two-Tier RDBMS

The cart write path is the design’s center of gravity. Two storage philosophies bracket what’s reasonable, and the choice changes how every other component (merge, multi-device sync, idempotency, failure modes) is implemented.

The Dynamo thesis: never reject a cart write

The original Dynamo: Amazon’s Highly Available Key-value Store paper (SOSP 2007) was motivated explicitly by the shopping cart¹. The hard requirement was that an add to cart must succeed under network partitions, replica failures, and even data-center outages — “always writeable” is the literal phrase used in §1¹. To get that, Dynamo gives up linearizability:

Replication. Every key is replicated to N nodes on a consistent-hash ring, with tunable read/write quorums (R, W). The cart used N=3, R=2, W=2 historically, but W=1 with sloppy quorum and hinted handoff is the regime that makes the system “always writeable” during partitions (§4.5–§4.6)¹.
Object versioning. Every write produces a new immutable version stamped with a vector clock of (node, counter) pairs (§4.4). Concurrent versions are not collapsed.
Reconciliation on read. When a read sees versions whose vector clocks are concurrent (neither descends from the other), Dynamo returns all of them as siblings and lets the application merge them (§4.4 + §6.1). The merged version is written back with a vector clock that descends from all siblings.
Cart-specific merge. The shopping cart’s reconciliation function is a set union of items, with max over per-SKU quantity (§6.1). The paper is explicit that this can resurrect items the user just removed — that is the trade Amazon accepted to never lose an add (§6.1, “the only side effect”).
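To make the versioning and reconciliation steps concrete, here is a minimal in-memory sketch of vector-clock comparison and the union-plus-max sibling merge described above. It is not Dynamo's implementation: the type and function names are invented for illustration, clocks are plain objects, and the merged clock is taken as the element-wise maximum of the sibling clocks (a real coordinator would also advance its own counter on write-back).

```typescript
type VectorClock = Record<string, number>; // node id -> counter
type ClockOrder = "before" | "after" | "equal" | "concurrent";

// Compare two clocks; "concurrent" means neither descends from the other.
function compareClocks(a: VectorClock, b: VectorClock): ClockOrder {
  let aAhead = false;
  let bAhead = false;
  for (const node of Object.keys(a).concat(Object.keys(b))) {
    const ca = a[node] ?? 0;
    const cb = b[node] ?? 0;
    if (ca > cb) aAhead = true;
    if (cb > ca) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

interface CartVersion {
  clock: VectorClock;
  lines: Record<string, number>; // sku -> quantity
}

// Semantic merge: union of SKUs, max quantity per SKU, clock that descends from all siblings.
function mergeSiblings(siblings: CartVersion[]): CartVersion {
  const merged: CartVersion = { clock: {}, lines: {} };
  for (const sibling of siblings) {
    for (const sku of Object.keys(sibling.lines)) {
      merged.lines[sku] = Math.max(merged.lines[sku] ?? 0, sibling.lines[sku] ?? 0);
    }
    for (const node of Object.keys(sibling.clock)) {
      merged.clock[node] = Math.max(merged.clock[node] ?? 0, sibling.clock[node] ?? 0);
    }
  }
  return merged;
}

// Both replicas started from { book: 1 } at clock { A: 1 }.
const onPhone: CartVersion = { clock: { A: 2 }, lines: { book: 1, pen: 2 } }; // added pen via node A
const onLaptop: CartVersion = { clock: { A: 1, B: 1 }, lines: {} };           // removed book via node B

const relation = compareClocks(onPhone.clock, onLaptop.clock); // "concurrent"
const reconciled = mergeSiblings([onPhone, onLaptop]);
// reconciled.lines is { book: 1, pen: 2 }: the add is kept and the removed book resurfaces.
```

The resurfacing `book` line is the deletion-resurrection side effect the paper accepts in exchange for never losing an add.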

Important

Vogels’ 2008/2009 Eventually Consistent article makes the same point in plain English: Amazon’s cart is built so that adds keep succeeding through a network partition even when the original cart sits on an unreachable partition; the application reconciles divergent versions once the partition heals³. Reach for this article whenever you need to convince a reviewer that consistency is an SLA per operation, not a property of the system.

Dynamo write path under a partition — concurrent vector clocks become siblings; the application reconciles with a union+max merge.

Why Cassandra LWW is the wrong shape for a cart

Cassandra is the most common Dynamo-inspired store engineers reach for, but its conflict-resolution model differs in a way that is fatal for cart correctness if you ignore it:

Property	Dynamo / Riak / DynamoDB	Cassandra
Causality tracking	Vector clocks (per object)	None at the value level
Conflict resolution	Application-level sibling merge	Last-Write-Wins by client/coordinator timestamp
Granularity	Whole object	Per column
Behavior on concurrent updates	Both versions preserved as siblings	One version silently overwrites the other (highest µs wins)
Clock dependency	Tolerant (causality, not time)	Hard — NTP skew can cause silent data loss

Cassandra’s per-column LWW is well documented in its data-model docs and is the dominant cause of “we lost an item from the cart” bugs in cart implementations on Cassandra-family stores⁶. The mitigations are real but unergonomic: encode the cart as a set<frozen<line>> so each item is its own column (LWW resolves per line, not per cart), or carry a CRDT (e.g. an OR-Set) on top. Either way you are reinventing siblings.
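The difference between whole-cart and per-line last-write-wins is easy to see in a toy model. The sketch below does not use any Cassandra API; `Cell`, `lastWriteWins`, and `mergeLineCells` are hypothetical names, and timestamps are plain integers standing in for microsecond write times.

```typescript
interface Cell<T> {
  value: T;
  writtenAtMicros: number; // client or coordinator timestamp
}

// Last-write-wins: keep whichever cell carries the larger timestamp.
function lastWriteWins<T>(a: Cell<T>, b: Cell<T>): Cell<T> {
  return b.writtenAtMicros > a.writtenAtMicros ? b : a;
}

// Modelling 1: the whole cart is a single cell.
const cartOnReplicaA: Cell<string[]> = { value: ["book", "pen"], writtenAtMicros: 1000 };
const cartOnReplicaB: Cell<string[]> = { value: ["book", "lamp"], writtenAtMicros: 1001 };
const wholeCartWinner = lastWriteWins(cartOnReplicaA, cartOnReplicaB).value;
// wholeCartWinner is ["book", "lamp"]: the concurrent "pen" add is silently gone.

// Modelling 2: one cell per cart line, each resolved independently.
type LineCells = Record<string, Cell<number>>;

function mergeLineCells(a: LineCells, b: LineCells): LineCells {
  const merged: LineCells = { ...a };
  for (const sku of Object.keys(b)) {
    const theirs = b[sku];
    if (theirs === undefined) continue;
    const mine = merged[sku];
    merged[sku] = mine === undefined ? theirs : lastWriteWins(mine, theirs);
  }
  return merged;
}

const linesOnReplicaA: LineCells = {
  book: { value: 1, writtenAtMicros: 900 },
  pen: { value: 1, writtenAtMicros: 1000 },
};
const linesOnReplicaB: LineCells = {
  book: { value: 1, writtenAtMicros: 900 },
  lamp: { value: 1, writtenAtMicros: 1001 },
};
const perLineSkus = Object.keys(mergeLineCells(linesOnReplicaA, linesOnReplicaB)).sort();
// perLineSkus is ["book", "lamp", "pen"]: both concurrent adds survive.
```

Per-line resolution only protects concurrent edits to different lines. Two writers touching the same line still race on timestamps, and removals need tombstones, which this sketch leaves out.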

DynamoDB by default also resolves at the item level with LWW on the write — it doesn’t expose vector clocks. To recover Dynamo’s sibling semantics on DynamoDB you do one of:

Per-line items. Model each cart line as its own item (PK = cart_id, SK = sku#variant), use idempotent UpdateItem with conditional expressions on a monotonic version, and run an application-side merge over the full collection. This is what most production carts on DynamoDB look like in 2026.
CRDT layer. Store a serialized OR-Set (observed-remove set) per cart and reconcile in the client. CRDTs avoid the “deletion resurrection” problem of the original Dynamo cart at the cost of metadata size.
Multi-region strong consistency. DynamoDB Global Tables shipped a multi-region strong-consistency mode that gives linearizable cross-region reads at the cost of write latency — useful for the checkout ledger, not the cart hot path⁷.
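As an illustration of the CRDT option, the following is a minimal state-based OR-Set sketch. It assumes unique add tags are supplied by the caller (a real client would generate them, for example as UUIDs) and it never garbage-collects tombstones, which is where the metadata cost mentioned above comes from. All names are hypothetical.

```typescript
interface OrSet {
  adds: Record<string, string>;     // unique add tag -> sku
  tombstones: Record<string, true>; // add tags that were observed and then removed
}

function emptyOrSet(): OrSet {
  return { adds: {}, tombstones: {} };
}

function addElement(set: OrSet, sku: string, tag: string): OrSet {
  return { adds: { ...set.adds, [tag]: sku }, tombstones: { ...set.tombstones } };
}

// Remove only the add tags this replica has observed for the sku.
function removeElement(set: OrSet, sku: string): OrSet {
  const tombstones: Record<string, true> = { ...set.tombstones };
  for (const tag of Object.keys(set.adds)) {
    if (set.adds[tag] === sku) tombstones[tag] = true;
  }
  return { adds: { ...set.adds }, tombstones };
}

// Merge is a plain union of both maps, so it is commutative, associative, and idempotent.
function mergeSets(a: OrSet, b: OrSet): OrSet {
  return {
    adds: { ...a.adds, ...b.adds },
    tombstones: { ...a.tombstones, ...b.tombstones },
  };
}

function liveElements(set: OrSet): string[] {
  const live: Record<string, true> = {};
  for (const tag of Object.keys(set.adds)) {
    const sku = set.adds[tag];
    if (sku !== undefined && !set.tombstones[tag]) live[sku] = true;
  }
  return Object.keys(live).sort();
}

const sharedState = addElement(addElement(emptyOrSet(), "book", "t1"), "pen", "t2");
const phoneReplica = removeElement(sharedState, "book");      // user removes book on the phone
const laptopReplica = addElement(sharedState, "lamp", "t3");  // concurrently adds lamp on the laptop

const afterMerge = liveElements(mergeSets(phoneReplica, laptopReplica));
// afterMerge is ["lamp", "pen"]: the removed book does not resurface.

const laptopReAdd = addElement(sharedState, "book", "t4");    // concurrent re-add with a fresh tag
const afterReAdd = liveElements(mergeSets(phoneReplica, laptopReAdd));
// afterReAdd is ["book", "pen"]: the unobserved add wins over the remove.
```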

Why this article picks two-tier RDBMS

The reference design here uses PostgreSQL + Redis rather than a Dynamo-family KV. The trade-offs:

Criterion	Dynamo-style KV (DynamoDB / Cassandra)	Two-tier RDBMS (PostgreSQL + Redis)
Always-writeable under partition	Native (sloppy quorum, hinted handoff)	Achieved via async DB write + Redis primary
Multi-region multi-active	Native (Global Tables)	Bolt-on (logical replication, conflict CDC)
Cart merge implementation	App-level sibling reconciliation	Idempotent line upserts on `(cart_id, sku, variant)`
Coupon / promotion joins	Awkward (denormalize or move out of KV)	Native SQL joins
Operational familiarity	Specialized	Mainstream
Storage cost at 100M+ carts	Lower (no joins, predictable per-item)	Higher
Right answer when	Multi-region active-active is mandatory; team owns the merge function	Single-region or active-passive; cart is one slice of a richer relational model

We pick the two-tier RDBMS variant because it minimizes cognitive load for the rest of the article — but every API and data-flow primitive below (idempotency keys, line versioning, deterministic merge) is the one you need on Dynamo-style storage too. Where the choice changes a section materially, it’s called out inline.

High-Level Design

Service Architecture

Cart Service

Manages cart lifecycle: creation, item management, persistence, and merge.

Responsibilities:

CRUD on cart items.
Guest cart token generation and lookup.
Cart merge on user authentication.
Coordinating price + availability validation.
Scheduling cart expiration.

Add-to-cart flow:

Sequence: client POST /cart/items → Cart Service checks availability via Inventory Service, places a 5-minute soft reservation, fetches current price from Pricing Service, writes to Redis, persists to the cart DB asynchronously, returns 200. — Add-to-cart request flow — synchronous availability + soft reservation, asynchronous durability.

Note

Persisting the cart to the relational store asynchronously is what gets the write latency under 200 ms. The trade-off: a Redis-only window of a few hundred milliseconds where a cart write can be lost. Acceptable for cart deltas; not acceptable for the order itself, which is written synchronously inside the saga.
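The loss window described in the note can be modelled with a small write-behind store. This is a sketch under stated assumptions, not the service's real code: two in-memory maps stand in for Redis and the cart DB, an array stands in for the persistence queue, and `crashCache` models the worst case where the pending write has not left the cache node yet. The class and method names are invented.

```typescript
interface CartLine {
  cartId: string;
  sku: string;
  quantity: number;
}

class WriteBehindCartStore {
  private cache: Record<string, CartLine> = {};   // stands in for Redis
  private durable: Record<string, CartLine> = {}; // stands in for the cart DB
  private pending: CartLine[] = [];               // stands in for the persistence queue

  // Hot path: write the cache, enqueue the durable write, return immediately.
  addItem(line: CartLine): CartLine {
    this.cache[this.keyOf(line.cartId, line.sku)] = line;
    this.pending.push(line);
    return line;
  }

  // Background worker: drain the queue into the durable store.
  flush(): number {
    const drained = this.pending.length;
    for (const line of this.pending) {
      this.durable[this.keyOf(line.cartId, line.sku)] = line;
    }
    this.pending = [];
    return drained;
  }

  // Failure injection: the cache and any un-flushed writes disappear.
  crashCache(): void {
    this.cache = {};
    this.pending = [];
  }

  // Read path: cache first, durable store as the fallback.
  read(cartId: string, sku: string): CartLine | undefined {
    const key = this.keyOf(cartId, sku);
    return this.cache[key] ?? this.durable[key];
  }

  private keyOf(cartId: string, sku: string): string {
    return `${cartId}:${sku}`;
  }
}

const cartStore = new WriteBehindCartStore();
cartStore.addItem({ cartId: "cart_abc123", sku: "prod_12345", quantity: 2 });
cartStore.flush(); // first write is now durable
cartStore.addItem({ cartId: "cart_abc123", sku: "prod_67890", quantity: 1 });
cartStore.crashCache(); // second write existed only in the cache

const survived = cartStore.read("cart_abc123", "prod_12345"); // quantity 2, served from the DB
const lost = cartStore.read("cart_abc123", "prod_67890");     // undefined
```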

Inventory Service

Manages stock levels, reservations, and per-warehouse allocation.

Key concepts:

Available-to-Sell (ATS): physical on-hand inventory minus hard reservations (and any safety-stock buffer). This is what the storefront should advertise as “in stock” — see Fluent OMS: ATS vs ATP.
Available-to-Promise (ATP): ATS adjusted for incoming supply and existing commitments — used for “ships in N days” promises.
Available-for-Reservation (AFR, project-internal): ATS minus soft reservations — the number that determines whether add to cart succeeds.
Soft reservation: temporary hold with TTL, auto-expires, never debits physical stock.
Hard reservation: committed allocation after payment authorization, triggers fulfillment.
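The three availability figures can be written as small pure functions. This is a sketch of the definitions above, not a standard formula: the field names are invented, and the ATP calculation (ATS plus inbound supply minus existing commitments) is one plausible reading of "adjusted for incoming supply and existing commitments".

```typescript
interface StockPosition {
  onHand: number;
  hardReserved: number;
  softReserved: number;
  safetyStock: number;
  inboundSupply: number;       // assumed: units arriving within the promise window
  existingCommitments: number; // assumed: backorders to be filled from that supply
}

// ATS: what the storefront may advertise as in stock.
function availableToSell(p: StockPosition): number {
  return Math.max(0, p.onHand - p.hardReserved - p.safetyStock);
}

// ATP: what can be promised for later delivery.
function availableToPromise(p: StockPosition): number {
  return Math.max(0, availableToSell(p) + p.inboundSupply - p.existingCommitments);
}

// AFR: what decides whether add-to-cart succeeds.
function availableForReservation(p: StockPosition): number {
  return Math.max(0, availableToSell(p) - p.softReserved);
}

function canAddToCart(p: StockPosition, quantity: number): boolean {
  return quantity > 0 && quantity <= availableForReservation(p);
}

const headphones: StockPosition = {
  onHand: 100,
  hardReserved: 20,
  softReserved: 30,
  safetyStock: 5,
  inboundSupply: 40,
  existingCommitments: 10,
};
// availableToSell(headphones) is 75
// availableToPromise(headphones) is 105
// availableForReservation(headphones) is 45
```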

Reservation state machine:

State machine: Available → SoftReserved (add to cart) → HardReserved (payment authorized) → Allocated (pick ticket created) → Shipped. SoftReserved → Available on TTL or item removal; HardReserved → Available on order cancellation (compensation). — Reservation lifecycle — soft reservation auto-expires; hard reservation only released by an explicit compensating action.
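The lifecycle above can be encoded as a transition table so that illegal moves fail loudly instead of corrupting stock counts. The state names come from the diagram; the event names are assumptions chosen for this sketch.

```typescript
type ReservationState = "Available" | "SoftReserved" | "HardReserved" | "Allocated" | "Shipped";

type ReservationEvent =
  | "add_to_cart"
  | "ttl_expired"
  | "item_removed"
  | "payment_authorized"
  | "order_cancelled"
  | "pick_ticket_created"
  | "shipped";

// Every legal (state, event) pair and the state it leads to.
const TRANSITIONS: Record<ReservationState, Partial<Record<ReservationEvent, ReservationState>>> = {
  Available: { add_to_cart: "SoftReserved" },
  SoftReserved: {
    ttl_expired: "Available",
    item_removed: "Available",
    payment_authorized: "HardReserved",
  },
  HardReserved: {
    order_cancelled: "Available", // explicit compensation, never a timer
    pick_ticket_created: "Allocated",
  },
  Allocated: { shipped: "Shipped" },
  Shipped: {},
};

function nextReservationState(state: ReservationState, event: ReservationEvent): ReservationState {
  const next = TRANSITIONS[state][event];
  if (next === undefined) {
    throw new Error(`Illegal transition: ${state} on ${event}`);
  }
  return next;
}

const afterAdd = nextReservationState("Available", "add_to_cart");      // "SoftReserved"
const afterAuth = nextReservationState(afterAdd, "payment_authorized"); // "HardReserved"
```

Note that `HardReserved` has no `ttl_expired` entry: a hard reservation can only be released by the compensating `order_cancelled` event.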

Checkout Orchestrator

Coordinates the multi-step checkout using the saga pattern². The orchestrator owns the workflow state and drives each downstream service explicitly — choreography (each service emitting events that others react to) is harder to reason about for a payment-handling flow with mandatory compensations⁸.

Saga steps:

Validate cart — items still in stock, prices unchanged within tolerance.
Authorize payment — place a hold on the payment method.
Convert reservations — soft → hard for every cart line.
Create order — generate the order record (PostgreSQL, ACID).
Capture payment — settle the authorized amount.
Trigger fulfillment — publish to warehouse.

Compensating actions (run in reverse on failure):

Failed step	Compensation
Reservation conversion failed	Void payment authorization
Order creation failed	Release hard reservations, void authorization
Payment capture failed	Mark order `payment_failed`, release reservations
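A minimal orchestrator for the steps and compensations above might look like the sketch below. It is synchronous for brevity (real steps would be async network calls with retries and persisted saga state), and the step names, `SagaStep` shape, and log messages are assumptions made for illustration.

```typescript
interface SagaStep<C> {
  name: string;
  run: (ctx: C) => void;         // throws on failure
  compensate?: (ctx: C) => void; // undoes a completed run
}

interface SagaResult {
  ok: boolean;
  failedStep?: string;
  compensated: string[];
}

// Run steps in order; on failure, compensate completed steps in reverse order.
function runSaga<C>(steps: SagaStep<C>[], ctx: C): SagaResult {
  const completed: SagaStep<C>[] = [];
  for (const step of steps) {
    try {
      step.run(ctx);
      completed.push(step);
    } catch {
      const compensated: string[] = [];
      for (const done of completed.slice().reverse()) {
        if (done.compensate) {
          done.compensate(ctx);
          compensated.push(done.name);
        }
      }
      return { ok: false, failedStep: step.name, compensated };
    }
  }
  return { ok: true, compensated: [] };
}

interface CheckoutContext {
  log: string[];
}

const checkoutSteps: SagaStep<CheckoutContext>[] = [
  { name: "validate_cart", run: (c) => { c.log.push("validated"); } },
  {
    name: "authorize_payment",
    run: (c) => { c.log.push("authorized"); },
    compensate: (c) => { c.log.push("authorization voided"); },
  },
  {
    name: "convert_reservations",
    run: (c) => { c.log.push("hard reserved"); },
    compensate: (c) => { c.log.push("hard reservations released"); },
  },
  // Simulated failure: order creation throws.
  { name: "create_order", run: () => { throw new Error("order DB unavailable"); } },
  { name: "capture_payment", run: (c) => { c.log.push("captured"); } },
];

const checkoutCtx: CheckoutContext = { log: [] };
const sagaOutcome = runSaga(checkoutSteps, checkoutCtx);
// sagaOutcome.failedStep is "create_order"
// sagaOutcome.compensated is ["convert_reservations", "authorize_payment"]
```

This matches the "Order creation failed" row of the table: hard reservations are released first, then the authorization is voided.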

Pricing Service

Evaluates price rules, promotions, and coupons in real time.

Rule evaluation order:

Base price (catalog).
Sale price (time-based overrides).
Quantity discounts (buy 3, get 10% off).
Coupon codes (user-applied).
Cart-level promotions (free shipping over $50).
Loyalty / member pricing.

Conflict resolution:

Each promotion declares an exclusive flag and a priority integer.
Default mode evaluates in priority order, dropping promotions blocked by an active exclusive.
An optional best-for-customer mode tries combinations and returns the maximum total discount; this is more expensive (combinatorial within a small N) and should only run when the catalog explicitly enables it on the SKU.
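The default priority-ordered mode can be sketched as a single pass over the sorted promotions. Several details here are assumptions rather than anything the text specifies: a lower `priority` number is evaluated first, ties break on `id` to keep the result deterministic, and an exclusive promotion is also skipped if something else has already been applied.

```typescript
interface Promotion {
  id: string;
  priority: number;   // assumed: lower number is evaluated first
  exclusive: boolean;
}

function selectPromotions(candidates: Promotion[]): Promotion[] {
  // Copy before sorting so the caller's array is not mutated.
  const ordered = candidates
    .slice()
    .sort((a, b) => a.priority - b.priority || a.id.localeCompare(b.id));

  const applied: Promotion[] = [];
  for (const promo of ordered) {
    const blockedByActiveExclusive = applied.some((p) => p.exclusive);
    const exclusiveButNotFirst = promo.exclusive && applied.length > 0;
    if (!blockedByActiveExclusive && !exclusiveButNotFirst) {
      applied.push(promo);
    }
  }
  return applied;
}

const stackable = selectPromotions([
  { id: "save20", priority: 3, exclusive: false },
  { id: "vip_only", priority: 2, exclusive: true },
  { id: "winter_sale", priority: 1, exclusive: false },
]).map((p) => p.id);
// stackable is ["winter_sale", "save20"]: vip_only is exclusive and arrives second.

const exclusiveFirst = selectPromotions([
  { id: "flash_deal", priority: 0, exclusive: true },
  { id: "winter_sale", priority: 1, exclusive: false },
]).map((p) => p.id);
// exclusiveFirst is ["flash_deal"]: the active exclusive blocks everything after it.
```

Because the outcome depends only on the declared priorities and flags, the same cart always produces the same promotion set.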

Warning

“Best for customer” is tempting marketing copy but a debugging nightmare in production. It makes promotion stacking non-deterministic from the customer’s perspective (“why did I get 15% but my colleague got 20% on the same cart yesterday?”). Default to deterministic priority ordering.

API Design

Cart Endpoints

Get Cart

    GET /api/v1/cart
    Authorization: Bearer {token} | X-Guest-Token: {guest_token}

Response (200 OK):

    {
      "cart_id": "cart_abc123",
      "user_id": "user_xyz789",
      "items": [
        {
          "item_id": "item_001",
          "product_id": "prod_12345",
          "product_name": "Wireless Headphones",
          "variant_id": "var_black_medium",
          "quantity": 2,
          "unit_price": 79.99,
          "line_total": 159.98,
          "image_url": "https://cdn.example.com/headphones.jpg",
          "availability": {
            "status": "in_stock",
            "quantity_available": 45,
            "reservation_expires_at": "2024-01-15T10:35:00Z"
          },
          "applied_promotions": [
            {
              "promotion_id": "promo_winter_sale",
              "name": "Winter Sale 20% Off",
              "discount_amount": 31.99
            }
          ]
        }
      ],
      "summary": {
        "subtotal": 159.98,
        "discount_total": 31.99,
        "shipping_estimate": 0.0,
        "tax_estimate": 10.24,
        "total": 138.23
      },
      "applied_coupons": [],
      "created_at": "2024-01-15T09:00:00Z",
      "updated_at": "2024-01-15T10:30:00Z",
      "expires_at": "2024-02-14T09:00:00Z"
    }

Design decisions:

availability embedded per item — the frontend can show stock warnings without a second round-trip.
reservation_expires_at exposed — the client can render a countdown to encourage checkout.
summary server-calculated — avoids client-side rounding drift between display and final charge.
No pagination — a real cart rarely exceeds 50 items; the full payload is < 10 KB.

Add Item to Cart

    POST /api/v1/cart/items
    Authorization: Bearer {token} | X-Guest-Token: {guest_token}
    Content-Type: application/json
    Idempotency-Key: {uuid}

Request:

    {
      "product_id": "prod_12345",
      "variant_id": "var_black_medium",
      "quantity": 2
    }

Response (201 Created):

    {
      "item_id": "item_001",
      "product_id": "prod_12345",
      "quantity": 2,
      "unit_price": 79.99,
      "line_total": 159.98,
      "reservation": {
        "reservation_id": "res_abc123",
        "expires_at": "2024-01-15T10:35:00Z"
      },
      "cart_summary": {
        "item_count": 2,
        "subtotal": 159.98,
        "total": 138.23
      }
    }

Error responses:

Status	Condition	Body
400	Invalid product/variant ID	`{"error": "INVALID_PRODUCT", "message": "Product not found"}`
409	Insufficient inventory	`{"error": "INSUFFICIENT_STOCK", "available": 1, "requested": 2}`
201	Duplicate add (same Idempotency-Key and body)	Not an error: replays the original cached response with its original status
429	Rate-limit exceeded	`{"error": "RATE_LIMITED", "retry_after": 60}`

Rate limit: 60 req/min per user — defends against cart-bombing and inventory-probing scrapers.
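One simple way to enforce a per-user limit like this is a fixed-window counter. The sketch below is an in-memory illustration only: a production limiter would keep its counters in a shared store, and a sliding window or token bucket would smooth the burst that a fixed window allows at the boundary. The class name and the injected clock parameter are assumptions.

```typescript
interface RateDecision {
  allowed: boolean;
  retryAfterSec: number;
}

class FixedWindowLimiter {
  private windows: Record<string, { windowStartMs: number; count: number }> = {};
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // nowMs is injected so the behaviour is deterministic in tests.
  check(userId: string, nowMs: number): RateDecision {
    const current = this.windows[userId];
    if (current === undefined || nowMs - current.windowStartMs >= this.windowMs) {
      // First request, or the previous window has ended: start a new window.
      this.windows[userId] = { windowStartMs: nowMs, count: 1 };
      return { allowed: true, retryAfterSec: 0 };
    }
    if (current.count < this.limit) {
      current.count += 1;
      return { allowed: true, retryAfterSec: 0 };
    }
    // Over the limit: tell the client when the window resets.
    const msUntilReset = current.windowStartMs + this.windowMs - nowMs;
    return { allowed: false, retryAfterSec: Math.ceil(msUntilReset / 1000) };
  }
}

// 60 requests per minute per user, as in the limit above.
const cartLimiter = new FixedWindowLimiter(60, 60000);
```

A denied request maps naturally onto the `429` row above, with `retryAfterSec` feeding the `retry_after` field.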

Update Item Quantity

    PATCH /api/v1/cart/items/{item_id}

    {
      "quantity": 3
    }

Behavior:

quantity: 0 removes the item.
Validates the new quantity against AFR (current ATS minus soft reservations held by other carts).
Adjusts the soft reservation: on an increase, reserves the extra units and extends the TTL; on a decrease, releases the delta.

Apply Coupon

    POST /api/v1/cart/coupons

    {
      "code": "SAVE20"
    }

Response (200 OK):

    {
      "coupon": {
        "code": "SAVE20",
        "description": "20% off your order",
        "discount_type": "percentage",
        "discount_value": 20,
        "applied_discount": 27.99
      },
      "cart_summary": {
        "subtotal": 159.98,
        "discount_total": 59.98,
        "total": 110.24
      }
    }

Error responses:

Status	Condition
400	Invalid / expired coupon
409	Coupon not combinable with existing promotions
409	Minimum order value not met

Checkout Endpoints

Initialize Checkout

    POST /api/v1/checkout
    Authorization: Bearer {token}

    {
      "cart_id": "cart_abc123"
    }

Response (201 Created):

    {
      "checkout_id": "checkout_xyz789",
      "status": "pending",
      "cart_snapshot": {
        "items": [],
        "summary": {}
      },
      "required_steps": ["address", "payment", "review"],
      "completed_steps": [],
      "expires_at": "2024-01-15T11:00:00Z"
    }

Design decisions:

cart_snapshot is captured at init — prices and quantities are locked for the checkout duration.
expires_at is enforced: a 30-minute checkout session keeps reserved inventory from being held indefinitely (hard reservations are only created later, once payment is authorized).
required_steps is server-driven — enables A/B testing the checkout flow without a client release.

Submit Shipping Address

    PUT /api/v1/checkout/{checkout_id}/address

    {
      "shipping_address": {
        "name": "John Doe",
        "line1": "123 Main St",
        "line2": "Apt 4B",
        "city": "Seattle",
        "state": "WA",
        "postal_code": "98101",
        "country": "US",
        "phone": "+1-206-555-0123"
      },
      "billing_same_as_shipping": true
    }

Response includes: validated/normalized address, updated shipping options with real costs, destination-based tax.

Submit Payment and Complete

    POST /api/v1/checkout/{checkout_id}/complete
    Idempotency-Key: {uuid}

    {
      "payment_method_id": "pm_card_visa_4242",
      "accept_terms": true
    }

Response (201 Created):

    {
      "order_id": "order_abc123",
      "status": "confirmed",
      "confirmation_number": "AMZ-2024-ABC123",
      "estimated_delivery": "2024-01-18",
      "total_charged": 138.23,
      "payment": {
        "method": "Visa ending in 4242",
        "transaction_id": "txn_xyz789"
      }
    }

Important

The Idempotency-Key is the single most important header on this endpoint. The same key within the retention window must return the same response — and the original request body must match. Stripe’s API has been the canonical reference for this pattern: it stores keys for 24 hours, and a key reused with a different payload returns an idempotency_error rather than silently overwriting. See Stripe — Idempotent requests and Brandur Leach’s Stripe blog post on idempotent APIs.
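The replay-or-reject behaviour can be sketched with an in-memory store. This illustrates the pattern and is not Stripe's implementation: the class and type names are invented, the raw request body is used as the fingerprint (a real service would hash a canonical form), and the sketch ignores the case where a second request arrives while the first is still executing.

```typescript
interface HandlerResponse {
  status: number;
  body: string;
}

interface StoredEntry extends HandlerResponse {
  fingerprint: string;
  storedAtMs: number;
}

type IdempotencyOutcome =
  | { kind: "executed"; status: number; body: string }
  | { kind: "replayed"; status: number; body: string }
  | { kind: "mismatch" };

class IdempotencyStore {
  private entries: Record<string, StoredEntry> = {};
  private retentionMs: number;

  constructor(retentionMs: number) {
    this.retentionMs = retentionMs;
  }

  handle(
    key: string,
    requestBody: string,
    nowMs: number,
    execute: () => HandlerResponse,
  ): IdempotencyOutcome {
    const existing = this.entries[key];
    if (existing !== undefined && nowMs - existing.storedAtMs < this.retentionMs) {
      // Same key, different payload: refuse instead of silently overwriting.
      if (existing.fingerprint !== requestBody) return { kind: "mismatch" };
      // Same key, same payload: replay the stored response without re-executing.
      return { kind: "replayed", status: existing.status, body: existing.body };
    }
    const result = execute();
    this.entries[key] = {
      fingerprint: requestBody,
      status: result.status,
      body: result.body,
      storedAtMs: nowMs,
    };
    return { kind: "executed", status: result.status, body: result.body };
  }
}

// 24-hour retention, matching the window cited above.
const checkoutKeys = new IdempotencyStore(24 * 60 * 60 * 1000);
```

The important property is that `execute` (the payment capture, in the checkout case) runs at most once per key inside the retention window, however many times the client retries.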

Data Modeling

Cart Schema

Primary store: PostgreSQL (ACID for the merge transaction, relational joins for analytics).

    -- Cart table
    CREATE TABLE carts (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        user_id UUID REFERENCES users(id),
        guest_token VARCHAR(64) UNIQUE,
        status VARCHAR(20) DEFAULT 'active',
        created_at TIMESTAMPTZ DEFAULT NOW(),
        updated_at TIMESTAMPTZ DEFAULT NOW(),
        expires_at TIMESTAMPTZ,
        merged_into_cart_id UUID REFERENCES carts(id),

        CONSTRAINT user_or_guest CHECK (
            (user_id IS NOT NULL AND guest_token IS NULL) OR
            (user_id IS NULL AND guest_token IS NOT NULL)
        )
    );

    -- Cart items table
    CREATE TABLE cart_items (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        cart_id UUID NOT NULL REFERENCES carts(id) ON DELETE CASCADE,
        product_id UUID NOT NULL,
        variant_id UUID NOT NULL,
        quantity INT NOT NULL CHECK (quantity > 0),
        unit_price_at_add DECIMAL(10,2) NOT NULL,
        reservation_id UUID,
        added_at TIMESTAMPTZ DEFAULT NOW(),
        updated_at TIMESTAMPTZ DEFAULT NOW(),

        UNIQUE (cart_id, product_id, variant_id)
    );

    -- Indexes for common access patterns
    CREATE INDEX idx_carts_user ON carts(user_id) WHERE user_id IS NOT NULL;
    CREATE INDEX idx_carts_guest ON carts(guest_token) WHERE guest_token IS NOT NULL;
    CREATE INDEX idx_carts_expires ON carts(expires_at) WHERE status = 'active';
    CREATE INDEX idx_cart_items_cart ON cart_items(cart_id);
    CREATE INDEX idx_cart_items_reservation ON cart_items(reservation_id)
        WHERE reservation_id IS NOT NULL;

Design decisions:

user_id vs guest_token mutual exclusion — clean separation of authenticated vs guest carts; merge sets merged_into_cart_id instead of deleting.
unit_price_at_add — audit trail for the price at the moment of add, useful for legal/regulatory display (“price changed since you added”) and for analytics.
reservation_id is nullable — digital goods, gift cards, and pre-orders don’t reserve inventory.
Soft delete via merged_into_cart_id — preserves guest cart history for analytics and lets support trace a “where did my items go” question.

Cart Cache Structure (Redis)

    # Cart metadata (Hash)
    HSET cart:{cart_id}
        user_id "user_xyz789"
        item_count 3
        subtotal 259.97
        updated_at 1705312200

    # Cart items (Hash - one per item)
    HSET cart:{cart_id}:item:{item_id}
        product_id "prod_12345"
        variant_id "var_black_medium"
        quantity 2
        unit_price 79.99
        reservation_id "res_abc123"
        reservation_expires 1705312500

    # Guest token to cart mapping
    SET guest:{guest_token} cart_abc123 EX 2592000  # 30 days

    # Cart expiration sorted set (for cleanup workers)
    ZADD cart_expirations 1705312500 cart_abc123

TTL strategy:

Cart metadata: 30 days (matches business retention).
Reservation entries: 5 minutes (aligned with the soft-reservation TTL).
Guest token mapping: 30 days.

Inventory Schema

Primary store: PostgreSQL with read replicas.

    -- Inventory by location
    CREATE TABLE inventory_entries (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        product_id UUID NOT NULL,
        variant_id UUID NOT NULL,
        location_id UUID NOT NULL,
        quantity_on_hand INT NOT NULL DEFAULT 0,
        quantity_reserved INT NOT NULL DEFAULT 0,
        quantity_available INT GENERATED ALWAYS AS
            (quantity_on_hand - quantity_reserved) STORED,
        updated_at TIMESTAMPTZ DEFAULT NOW(),

        UNIQUE (product_id, variant_id, location_id),
        CHECK (quantity_reserved <= quantity_on_hand)
    );

    -- Reservations table
    CREATE TABLE inventory_reservations (
        id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
        inventory_entry_id UUID NOT NULL REFERENCES inventory_entries(id),
        cart_id UUID NOT NULL,
        quantity INT NOT NULL,
        type VARCHAR(10) NOT NULL CHECK (type IN ('soft', 'hard')),
        created_at TIMESTAMPTZ DEFAULT NOW(),
        expires_at TIMESTAMPTZ,  -- NULL for hard reservations
        order_id UUID            -- set when converted to hard reservation
    );

    CREATE INDEX idx_reservations_entry ON inventory_reservations (inventory_entry_id);
    CREATE INDEX idx_reservations_cart ON inventory_reservations (cart_id);
    CREATE INDEX idx_reservations_expires ON inventory_reservations (expires_at)
        WHERE type = 'soft';

Consistency approach:

quantity_available as a generated column — always consistent with the underlying values, no application-side drift.
Reservation updates take a SELECT ... FOR UPDATE row lock on inventory_entries — see PostgreSQL’s explicit locking docs for the exact semantics.
Read replicas serve the storefront availability display where eventual consistency is acceptable.

Reservation Cache (Redis)

    # Soft reservation with automatic expiry
    SET reservation:{res_id}
        '{"inventory_entry_id":"inv_123","cart_id":"cart_abc","quantity":2}'
        EX 300  # 5 minutes

    # Fast lookup: cart -> reservations
    SADD cart_reservations:{cart_id} res_001 res_002
    EXPIRE cart_reservations:{cart_id} 300

    # Fast lookup: inventory -> reservations (for availability calc)
    SADD inventory_reservations:{inventory_entry_id} res_001 res_002
    EXPIRE inventory_reservations:{inventory_entry_id} 300

Why Redis for reservations:

Native key TTL handles cleanup without a background job — see the SET ... EX semantics.
Sub-millisecond lookups for the hot availability check.
SADD is atomic, which is enough for reservation tracking inside a single Redis shard.

Caution

The Redis-side reservation is an optimization, not the source of truth. Redis evictions, failovers, or split-brain partitions can drop entries silently; the relational inventory_reservations table is the durable record and the saga reconciles against it on commit.

Database Selection Summary

Data Type	Store	Rationale
Cart (persistent)	PostgreSQL	ACID for merge operations, complex queries
Cart (cache)	Redis Cluster	Sub-ms reads, native TTL
Inventory	PostgreSQL + replicas	Strong-consistency writes, scaled reads
Reservations	Redis + PostgreSQL	Redis for speed, PG for durability
Orders	PostgreSQL	ACID required for financial records
Price rules	PostgreSQL + cache	Complex queries, Redis for hot paths

Low-Level Design: Cart Merge

Cart merge fires when an authenticated user has a guest cart cookie. The system must combine items from both carts while resolving conflicts in a way the user finds predictable.

Merge Algorithm

Flowchart: log in → guest cart? if no, return user cart. If yes, load both with row locks; for each guest item, check if it exists in user cart. If new, transfer. If conflicting, apply strategy (sum, max, keep-user). Cap at max-per-order, re-issue reservations, mark guest cart merged, return. — Cart merge with strategy-driven conflict resolution.

Merge Implementation

    // `db`, `createUserCart`, `transferItem`, `getMaxQuantity`, `updateItemQuantity`,
    // and `releaseReservation` are application helpers defined elsewhere.
    // `tx.query` is assumed to return the cart with its items loaded, or null.

    type MergeStrategy = "sum" | "max" | "keep_user"

    interface CartItem {
      id: string
      productId: string
      variantId: string
      quantity: number
      reservationId?: string
    }

    interface Cart {
      id: string
      items: CartItem[]
    }

    interface MergeResult {
      mergedCart: Cart
      addedItems: CartItem[]
      updatedItems: Array<{ item: CartItem; previousQty: number }>
      conflicts: Array<{ guestItem: CartItem; reason: string }>
    }

    async function mergeGuestCart(
      userId: string,
      guestToken: string,
      strategy: MergeStrategy = "sum",
    ): Promise<MergeResult> {
      return await db.transaction(async (tx) => {
        // Lock both carts so a concurrent edit cannot interleave with the merge.
        const [userCart, guestCart] = await Promise.all([
          tx.query("SELECT * FROM carts WHERE user_id = $1 FOR UPDATE", [userId]),
          tx.query("SELECT * FROM carts WHERE guest_token = $1 FOR UPDATE", [guestToken]),
        ])

        // Always return a real cart, even when the user had none before login.
        const mergedCart: Cart = userCart || (await createUserCart(tx, userId))
        const result: MergeResult = { mergedCart, addedItems: [], updatedItems: [], conflicts: [] }

        if (!guestCart) {
          return result
        }

        for (const guestItem of guestCart.items) {
          const existingItem = mergedCart.items.find(
            (i) => i.productId === guestItem.productId && i.variantId === guestItem.variantId,
          )

          if (!existingItem) {
            // New line: move it across unchanged.
            await transferItem(tx, guestItem, mergedCart.id)
            result.addedItems.push(guestItem)
            continue
          }

          // Conflicting line: resolve by strategy, then cap at the per-order limit.
          const newQty = resolveQuantity(existingItem.quantity, guestItem.quantity, strategy)
          const maxAllowed = await getMaxQuantity(guestItem.productId, guestItem.variantId)
          const finalQty = Math.min(newQty, maxAllowed)

          if (newQty > maxAllowed) {
            result.conflicts.push({
              guestItem,
              reason: `Quantity capped at ${maxAllowed} (max per order)`,
            })
          }

          // Compare against the capped value so a capped no-op does not trigger a write.
          if (finalQty !== existingItem.quantity) {
            await updateItemQuantity(tx, existingItem.id, finalQty)
            result.updatedItems.push({ item: existingItem, previousQty: existingItem.quantity })
          }

          // The guest line no longer exists, so its reservation is released.
          if (guestItem.reservationId) {
            await releaseReservation(guestItem.reservationId)
          }
        }

        // Soft-delete the guest cart by pointing it at the merged cart.
        await tx.query("UPDATE carts SET status = $1, merged_into_cart_id = $2 WHERE id = $3", [
          "merged",
          mergedCart.id,
          guestCart.id,
        ])

        return result
      })
    }

    function resolveQuantity(userQty: number, guestQty: number, strategy: MergeStrategy): number {
      switch (strategy) {
        case "sum":
          return userQty + guestQty
        case "max":
          return Math.max(userQty, guestQty)
        case "keep_user":
          return userQty
      }
    }

Merge Edge Cases

Scenario	Handling
Guest item now out of stock	Add to cart with `unavailable` flag; notify user
Price changed since guest add	Use current price; show price-change notice
Guest item discontinued	Move to “Saved for later” instead of cart; notify user
Combined quantity exceeds limit	Cap at limit; surface as a `MergeResult.conflicts` entry
Guest cart has applied coupon	Re-validate coupon for the user; user-specific coupons may not transfer
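The first four rows of the table can be read as a pure decision function over one guest line. The sketch below is illustrative only: `CatalogSnapshot`, `GuestLine`, `MergeDecision`, and `decideGuestLine` are hypothetical names, prices are assumed to be in minor units, the `sum` strategy is assumed for the combined quantity, and coupon re-validation is left out.

```typescript
// All names and fields here are hypothetical; the article does not define a catalog shape.
type MergeAction = "add" | "add_unavailable" | "save_for_later"

interface CatalogSnapshot {
  discontinued: boolean
  inStock: boolean
  currentPrice: number // minor units, e.g. cents
  maxPerOrder: number
}

interface GuestLine {
  sku: string
  quantity: number
  priceWhenAdded: number // minor units
}

interface MergeDecision {
  action: MergeAction
  quantity: number
  unitPrice: number
  notices: string[]
}

function decideGuestLine(
  line: GuestLine,
  existingUserQty: number,
  catalog: CatalogSnapshot,
): MergeDecision {
  const notices: string[] = []

  // Combined quantity exceeds limit: cap and report.
  const combined = existingUserQty + line.quantity
  const quantity = Math.min(combined, catalog.maxPerOrder)
  if (quantity < combined) {
    notices.push(`Quantity capped at ${catalog.maxPerOrder}`)
  }

  // Price changed since guest add: always use the current price.
  if (catalog.currentPrice !== line.priceWhenAdded) {
    notices.push(`Price changed from ${line.priceWhenAdded} to ${catalog.currentPrice}`)
  }

  // Discontinued takes precedence over out-of-stock: the item leaves the cart.
  let action: MergeAction = "add"
  if (catalog.discontinued) {
    action = "save_for_later"
    notices.push("Item discontinued; moved to Saved for later")
  } else if (!catalog.inStock) {
    action = "add_unavailable"
    notices.push("Item is currently out of stock")
  }

  return { action, quantity, unitPrice: catalog.currentPrice, notices }
}
```

Keeping the decision pure makes each row of the table unit-testable without a database; the caller applies the returned action inside the merge transaction.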

Multi-Device Concurrent Edits

Login-time merge is the one-shot case. The harder, continuous case is the same authenticated user editing the cart from a phone and a laptop within the same minute. Two design primitives keep this correct without locks:

Idempotent line operations on (cart_id, sku, variant). Add and update are upserts keyed on the cart line, not appends. A duplicate add for the same line is collapsed to a no-op; a quantity update is SET qty = $new, never qty = qty + $delta. This is the same shape as Dynamo’s per-key writes and survives client retries that the network turned into duplicates.
Optimistic concurrency on a per-line version. Each line carries a monotonic version integer. Updates send If-Match: version=N (or its body equivalent), and the server returns 409 Conflict with the current line if the precondition fails. The client refetches and retries with the new version. This is the per-line analogue of the cart-level vector clock.
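The two primitives above can be sketched with an in-memory store. This is a minimal illustration under stated assumptions, not the service's real storage layer: `CartLineStore`, `WriteResult`, and the numeric status values are hypothetical stand-ins for the HTTP behavior described, and a real implementation would run the version check as a conditional `UPDATE` in the database.

```typescript
interface CartLine {
  sku: string
  variant: string
  qty: number
  version: number
}

// Hypothetical result type mirroring the HTTP outcomes described above.
type WriteResult =
  | { status: 200; line: CartLine }
  | { status: 409; line: CartLine } // precondition failed: current line returned
  | { status: 404 }

class CartLineStore {
  private lines: Record<string, CartLine> = {}

  private key(cartId: string, sku: string, variant: string): string {
    return `${cartId}|${sku}|${variant}`
  }

  // Add is an upsert keyed on the line: a duplicate add returns the existing line unchanged.
  addLine(cartId: string, sku: string, variant: string, qty: number): CartLine {
    const k = this.key(cartId, sku, variant)
    const existing = this.lines[k]
    if (existing) return existing
    const line: CartLine = { sku, variant, qty, version: 1 }
    this.lines[k] = line
    return line
  }

  // Update is "SET qty = newQty", guarded by the version the client last saw.
  setQuantity(
    cartId: string,
    sku: string,
    variant: string,
    newQty: number,
    ifMatchVersion: number,
  ): WriteResult {
    const k = this.key(cartId, sku, variant)
    const existing = this.lines[k]
    if (!existing) return { status: 404 }
    if (existing.version !== ifMatchVersion) return { status: 409, line: existing }
    const updated: CartLine = { ...existing, qty: newQty, version: existing.version + 1 }
    this.lines[k] = updated
    return { status: 200, line: updated }
  }
}
```

A client that receives the 409 already has the current line in the response body, so it can retry with the new version without an extra GET.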

The transport for “the other device just changed your cart” is a per-cart WebSocket / SSE topic that fans out line_changed events as a side effect of every cart write. Devices reconcile against the new version and re-render. If the WebSocket is unavailable, the next cart GET carries the truth — eventual consistency on the display, point-in-time consistency on the write.

Multi-device cart sync sequence: phone POSTs add (sku A), Cart Service writes Redis + DB and publishes line_changed over WebSocket; laptop receives the event, then PATCHes qty=2 with If-Match version=3; both writes are line-keyed and idempotent. A retried POST from the phone hits the cached idempotency response and is a no-op. — Multi-device cart sync — idempotent line upserts, optimistic version checks, and a per-cart fan-out topic.
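On the receiving device, reconciliation can be a small reducer that applies a `line_changed` event only when it carries a newer version than the local copy. The event payload below (`lineKey`, `qty`, `version`) is an assumed shape; the article names the event but does not specify its fields, and line removal is not modeled.

```typescript
// Assumed event payload; only the event name comes from the text above.
interface LineChangedEvent {
  type: "line_changed"
  lineKey: string
  qty: number
  version: number
}

interface LocalLine {
  qty: number
  version: number
}

type LocalCart = Record<string, LocalLine>

function applyLineChanged(cart: LocalCart, event: LineChangedEvent): LocalCart {
  const current = cart[event.lineKey]
  // Duplicate or out-of-order delivery: the local copy is already as new or newer.
  if (current && current.version >= event.version) return cart
  // Return a new object so UI layers that rely on reference equality re-render.
  return { ...cart, [event.lineKey]: { qty: event.qty, version: event.version } }
}
```

Because stale events return the same object, a device that receives the echo of its own write does not re-render.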

Note

This is a deliberately weaker guarantee than Dynamo’s sibling reconciliation. We do not preserve the intent of two truly-concurrent writes from different devices on the same line — the late writer either wins (LWW behavior) or fails the precondition and is asked to retry. The trade is justified because (a) same-user, same-line concurrency is rare in practice (humans don’t edit the same line on two devices in the same 200 ms window), and (b) the user can see the conflict resolved on screen via the WebSocket fan-out. If you need true sibling preservation per line, swap the backing store for a Dynamo-family KV and apply the cart’s union+max merge function from §6.1.
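For readers who take the Dynamo-family route, the union+max merge mentioned above can be sketched as follows. This is one plausible reading, not code from the Dynamo paper: each sibling is assumed to be a map from line key to quantity, and `mergeSiblings` is a hypothetical name.

```typescript
// One sibling version of a cart: line key -> quantity.
type CartVersion = Record<string, number>

// Union of line keys across siblings; for a key present in several siblings, keep the max quantity.
function mergeSiblings(siblings: CartVersion[]): CartVersion {
  const merged: CartVersion = {}
  for (const sibling of siblings) {
    for (const key of Object.keys(sibling)) {
      merged[key] = Math.max(merged[key] ?? 0, sibling[key] ?? 0)
    }
  }
  return merged
}
```

The merge is commutative, associative, and idempotent, so replicas converge regardless of the order in which siblings arrive. The known cost, noted in the Dynamo paper, is that a line deleted in one sibling can resurface after the merge.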

Low-Level Design: Checkout Saga

The checkout process spans multiple services with independent databases and must coordinate atomically without distributed locks. This is the textbook saga case².

Saga Orchestration

Sequence: client triggers /checkout/complete on the orchestrator. The orchestrator validates the cart, authorizes payment, converts soft reservations to hard, creates the order, captures payment, then clears the cart. Each step has a labeled compensating action shown in alt branches: insufficient stock voids the auth and returns 409, order creation failure releases reservations and voids the auth, capture failure marks the order failed and releases reservations. — Orchestrated checkout saga — each forward step has an explicit compensating action shown in the alt branches.

Saga State Machine

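```typescript
enum CheckoutState {
  INITIATED = "initiated",
  CART_VALIDATED = "cart_validated",
  PAYMENT_AUTHORIZED = "payment_authorized",
  INVENTORY_RESERVED = "inventory_reserved",
  ORDER_CREATED = "order_created",
  PAYMENT_CAPTURED = "payment_captured",
  COMPLETED = "completed",
  COMPENSATION_REQUIRED = "compensation_required",
  FAILED = "failed",
}

// Forward states in execution order. String enum values compare alphabetically,
// so "has the saga reached X" must be answered by position in this list, not by >=.
const FORWARD_STATES: CheckoutState[] = [
  CheckoutState.INITIATED,
  CheckoutState.CART_VALIDATED,
  CheckoutState.PAYMENT_AUTHORIZED,
  CheckoutState.INVENTORY_RESERVED,
  CheckoutState.ORDER_CREATED,
  CheckoutState.PAYMENT_CAPTURED,
  CheckoutState.COMPLETED,
]

function hasReached(lastCompleted: CheckoutState, target: CheckoutState): boolean {
  return FORWARD_STATES.indexOf(lastCompleted) >= FORWARD_STATES.indexOf(target)
}

interface CheckoutSaga {
  id: string
  cartId: string
  state: CheckoutState
  authorizationId?: string
  orderId?: string
  failedStep?: CheckoutState // last forward state reached before the failure
  compensationSteps: string[]
  createdAt: Date
  updatedAt: Date
}

async function executeCheckoutSaga(checkoutId: string): Promise<Order> {
  const saga = await loadSaga(checkoutId)

  // A worker crashed mid-compensation: finish the rollback instead of moving forward.
  if (saga.state === CheckoutState.COMPENSATION_REQUIRED) {
    await executeCompensation(saga)
  }

  try {
    // Each block runs only if the saga is at that step, so a retry resumes where it stopped.
    if (saga.state === CheckoutState.INITIATED) {
      await validateCart(saga)
      await transitionState(saga, CheckoutState.CART_VALIDATED)
    }

    if (saga.state === CheckoutState.CART_VALIDATED) {
      saga.authorizationId = await authorizePayment(saga)
      await transitionState(saga, CheckoutState.PAYMENT_AUTHORIZED)
    }

    if (saga.state === CheckoutState.PAYMENT_AUTHORIZED) {
      await convertReservations(saga)
      await transitionState(saga, CheckoutState.INVENTORY_RESERVED)
    }

    if (saga.state === CheckoutState.INVENTORY_RESERVED) {
      saga.orderId = await createOrder(saga)
      await transitionState(saga, CheckoutState.ORDER_CREATED)
    }

    if (saga.state === CheckoutState.ORDER_CREATED) {
      await capturePayment(saga)
      await transitionState(saga, CheckoutState.PAYMENT_CAPTURED)
    }

    if (saga.state === CheckoutState.PAYMENT_CAPTURED) {
      await clearCart(saga)
      await transitionState(saga, CheckoutState.COMPLETED)
    }
  } catch (error) {
    // After capture the customer has paid: retry the remaining step, never roll back.
    if (saga.state === CheckoutState.PAYMENT_CAPTURED) throw error

    saga.failedStep = saga.state
    await transitionState(saga, CheckoutState.COMPENSATION_REQUIRED)
    await executeCompensation(saga)
    throw error
  }

  if (saga.state !== CheckoutState.COMPLETED || !saga.orderId) {
    throw new Error(`Checkout ${checkoutId} ended in state ${saga.state}`)
  }
  return await loadOrder(saga.orderId)
}

async function executeCompensation(saga: CheckoutSaga): Promise<void> {
  // saga.state is COMPENSATION_REQUIRED here, so decide from the last forward state reached.
  const lastCompleted = saga.failedStep ?? CheckoutState.INITIATED

  // Undo in reverse order of the forward steps; each compensation must be safe to repeat.
  if (saga.orderId) {
    await markOrderFailed(saga.orderId)
  }

  if (hasReached(lastCompleted, CheckoutState.INVENTORY_RESERVED)) {
    await releaseHardReservations(saga.cartId)
  }

  if (saga.authorizationId) {
    await voidAuthorization(saga.authorizationId)
  }

  await transitionState(saga, CheckoutState.FAILED)
}
```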

Note

Each step is idempotent and starts by inspecting the current state — so a retried saga (worker crash, message redelivery) re-enters at the right point rather than double-charging the customer or double-reserving stock.
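The re-entry behavior can be demonstrated with a stripped-down, synchronous runner. Everything here is a hypothetical sketch (`SagaRecord`, `Step`, `resume`), with the durable state write reduced to an in-memory assignment.

```typescript
type SagaState = "initiated" | "cart_validated" | "payment_authorized" | "inventory_reserved"

interface SagaRecord {
  id: string
  state: SagaState
}

interface Step {
  from: SagaState
  to: SagaState
  run: (saga: SagaRecord) => void
}

// Steps are listed in forward order; only the step whose `from` matches the stored state runs.
function resume(saga: SagaRecord, steps: Step[]): SagaRecord {
  for (const step of steps) {
    if (saga.state !== step.from) continue // already completed, or not reached yet
    step.run(saga) // may throw: the state stays at `from`, so a retry re-enters here
    saga.state = step.to // stands in for a durable state write
  }
  return saga
}
```

A step that crashes after its side effect but before the state write runs again on resume, which is why each step also needs its own idempotency key toward the downstream service.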

Idempotency Implementation

The idempotency contract has two halves: same key + same payload returns the cached response; same key + different payload is an error. This is the same shape Stripe ships in their public API⁹.

```typescript
interface IdempotencyRecord {
  key: string
  requestHash: string
  response: any
  statusCode: number
  createdAt: Date
  expiresAt: Date
}

async function withIdempotency<T>(
  key: string,
  request: any,
  handler: () => Promise<T>,
): Promise<{ result: T; statusCode: number; cached: boolean }> {
  const requestHash = hashRequest(request)

  const existing = await redis.get(`idempotency:${key}`)
  if (existing) {
    // The stored value is either a finished record or the in-flight placeholder written below.
    const record: Partial<IdempotencyRecord> & { status?: string } = JSON.parse(existing)
    if (record.requestHash !== requestHash) {
      throw new ConflictError("Idempotency key reused with different request")
    }
    // Same payload, but the first attempt is still running: there is no response to replay yet.
    if (record.status === "processing") {
      throw new ConflictError("Request already in progress")
    }
    return { result: record.response, statusCode: record.statusCode ?? 201, cached: true }
  }

  // Claim the key atomically. NX makes exactly one concurrent caller win;
  // the 300 s expiry frees the key if this worker dies mid-request.
  const lockAcquired = await redis.set(
    `idempotency:${key}`,
    JSON.stringify({ requestHash, status: "processing" }),
    "NX",
    "EX",
    300,
  )

  if (!lockAcquired) {
    throw new ConflictError("Request already in progress")
  }

  try {
    const result = await handler()
    const record: IdempotencyRecord = {
      key,
      requestHash,
      response: result,
      statusCode: 201,
      createdAt: new Date(),
      expiresAt: new Date(Date.now() + 24 * 60 * 60 * 1000),
    }

    // Replace the placeholder with the final response and keep it for 24 hours.
    await redis.set(`idempotency:${key}`, JSON.stringify(record), "EX", 86400)
    return { result, statusCode: 201, cached: false }
  } catch (error) {
    // Release the key so the client can retry after a failure.
    await redis.del(`idempotency:${key}`)
    throw error
  }
}
```
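The two halves of the contract are easiest to see with Redis taken out of the picture. The sketch below is a synchronous, in-memory illustration under stated assumptions: `canonicalize`, `IdempotencyStore`, and `Outcome` are hypothetical names, payloads are assumed to be plain JSON values, and the canonical string itself stands in for the hash that `hashRequest` would compute.

```typescript
// Stable serialization: object keys are sorted, so key order in the payload does not matter.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map((v) => canonicalize(v)).join(",") + "]"
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>
    const parts = Object.keys(obj)
      .sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k]))
    return "{" + parts.join(",") + "}"
  }
  return JSON.stringify(value)
}

type Outcome<T> =
  | { kind: "fresh"; result: T } // handler ran
  | { kind: "cached"; result: T } // same key, same payload: replayed
  | { kind: "mismatch" } // same key, different payload: rejected

class IdempotencyStore<T> {
  private records: Record<string, { fingerprint: string; result: T }> = {}

  execute(key: string, payload: unknown, handler: () => T): Outcome<T> {
    const fingerprint = canonicalize(payload)
    const existing = this.records[key]
    if (existing) {
      if (existing.fingerprint !== fingerprint) return { kind: "mismatch" }
      return { kind: "cached", result: existing.result }
    }
    const result = handler()
    this.records[key] = { fingerprint, result }
    return { kind: "fresh", result }
  }
}
```

Sorting keys before fingerprinting matters in practice: a client library that serializes the same payload with a different key order on retry would otherwise be rejected as a mismatch.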

Frontend Considerations

Cart Data Structure

Naive approach:

```typescript
interface Cart {
  items: CartItem[]
}
```

Optimized approach:

```typescript
interface CartState {
  items: Record<string, CartItem>
  itemOrder: string[]
  summary: CartSummary
  appliedCoupons: Coupon[]
  reservationTimers: Record<string, number>
}
```

Why normalized:

Quantity update — single object write, no array scan.
Remove item — delete from items, filter itemOrder.
Reorder — modify itemOrder only, items untouched.
React rendering — reference equality holds for unchanged items, so memoized rows skip re-render.
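The four points above can be checked with plain immutable update functions. This is a minimal sketch with a reduced state shape (`NormalizedCart` and `Line` are simplified, hypothetical stand-ins for `CartState` and `CartItem`).

```typescript
interface Line {
  id: string
  quantity: number
}

interface NormalizedCart {
  items: Record<string, Line>
  itemOrder: string[]
}

// Quantity update: one object is replaced; every other line keeps its reference.
function updateQuantity(cart: NormalizedCart, id: string, quantity: number): NormalizedCart {
  const line = cart.items[id]
  if (!line) return cart
  return { ...cart, items: { ...cart.items, [id]: { ...line, quantity } } }
}

// Remove: drop the key from items and filter it out of itemOrder.
function removeItem(cart: NormalizedCart, id: string): NormalizedCart {
  const items: Record<string, Line> = {}
  for (const key of Object.keys(cart.items)) {
    if (key !== id) items[key] = cart.items[key]
  }
  return { items, itemOrder: cart.itemOrder.filter((k) => k !== id) }
}

// Reorder: only itemOrder changes; the items map is reused as-is.
function moveItem(cart: NormalizedCart, from: number, to: number): NormalizedCart {
  const order = cart.itemOrder.slice()
  const moved = order.splice(from, 1)
  order.splice(to, 0, ...moved)
  return { ...cart, itemOrder: order }
}
```

Unchanged lines keep the same object identity across updates, which is what lets a memoized row component skip re-rendering.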

Optimistic Updates with Rollback

The standard React Query / TanStack Query optimistic-update pattern: snapshot, mutate, roll back on error, refetch on settle. See TanStack Query — Optimistic updates.

```typescript
import { useMutation, useQueryClient } from "@tanstack/react-query"

function useAddToCart() {
  const queryClient = useQueryClient()

  return useMutation({
    mutationFn: addItemToCart,

    onMutate: async (newItem) => {
      // Stop in-flight refetches from overwriting the optimistic state.
      await queryClient.cancelQueries({ queryKey: ["cart"] })
      // Snapshot for rollback.
      const previousCart = queryClient.getQueryData<CartState>(["cart"])

      queryClient.setQueryData(["cart"], (old: CartState | undefined) => {
        if (!old) return old // cart not loaded yet: nothing to update optimistically
        return {
          ...old,
          items: {
            ...old.items,
            [newItem.itemId]: {
              ...newItem,
              status: "pending",
            },
          },
          // Adding an item that is already in the cart must not duplicate its position.
          itemOrder: old.itemOrder.includes(newItem.itemId)
            ? old.itemOrder
            : [...old.itemOrder, newItem.itemId],
          summary: recalculateSummary(old, newItem),
        }
      })

      return { previousCart }
    },

    onError: (err, newItem, context) => {
      // Roll back to the snapshot taken in onMutate.
      queryClient.setQueryData(["cart"], context?.previousCart)
      showToast({
        type: "error",
        message: err.code === "INSUFFICIENT_STOCK" ? `Only ${err.available} available` : "Failed to add item",
      })
    },

    onSuccess: (data, newItem) => {
      // Replace the pending line with the server's version of it.
      queryClient.setQueryData(["cart"], (old: CartState | undefined) => {
        if (!old) return old
        return {
          ...old,
          items: {
            ...old.items,
            [newItem.itemId]: {
              ...data.item,
              status: "confirmed",
            },
          },
          summary: data.cartSummary,
        }
      })
    },

    onSettled: () => {
      // Success or failure, refetch so the cache converges on server truth.
      queryClient.invalidateQueries({ queryKey: ["cart"] })
    },
  })
}
```

Reservation Countdown Timer

```tsx
import { useEffect, useState } from "react";

function useReservationTimer(expiresAt: string | null) {
  const [timeLeft, setTimeLeft] = useState<number | null>(null);
  const [isExpired, setIsExpired] = useState(false);

  useEffect(() => {
    // Reset when the reservation is cleared or extended with a new expiry.
    setIsExpired(false);
    if (!expiresAt) {
      setTimeLeft(null);
      return;
    }

    let interval: ReturnType<typeof setInterval> | undefined;

    const updateTimer = () => {
      const remaining = new Date(expiresAt).getTime() - Date.now();
      if (remaining <= 0) {
        setIsExpired(true);
        setTimeLeft(0);
        clearInterval(interval); // nothing left to count down
      } else {
        setTimeLeft(Math.ceil(remaining / 1000));
      }
    };

    updateTimer();
    interval = setInterval(updateTimer, 1000);
    return () => clearInterval(interval);
  }, [expiresAt]);

  return { timeLeft, isExpired };
}

function CartItem({ item }: { item: CartItemData }) {
  const { timeLeft, isExpired } = useReservationTimer(item.reservationExpiresAt);

  return (
    <div className={isExpired ? 'item-expired' : ''}>
      {/* Warn only during the last five minutes, and not once the hold has expired. */}
      {!isExpired && timeLeft !== null && timeLeft < 300 && (
        <div className="reservation-warning">
          Reserved for {formatTime(timeLeft)} - complete checkout soon
        </div>
      )}
      {isExpired && (
        <div className="reservation-expired">
          Reservation expired - item may become unavailable
        </div>
      )}
    </div>
  );
}
```
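The component above calls `formatTime`, which the article does not define. One possible implementation, assuming the countdown should render as minutes and zero-padded seconds:

```typescript
// Hypothetical helper: formats a number of seconds as "m:ss", clamping negatives to zero.
function formatTime(totalSeconds: number): string {
  const clamped = Math.max(0, Math.floor(totalSeconds))
  const minutes = Math.floor(clamped / 60)
  const seconds = clamped % 60
  return `${minutes}:${seconds < 10 ? "0" : ""}${seconds}`
}
```

With the 300-second warning threshold used by the component, the first value a shopper sees is `4:59`.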

Real-Time Price Updates

```typescript
function useCartPriceSync(cartId: string) {
  const queryClient = useQueryClient()

  useEffect(() => {
    const ws = new WebSocket(`wss://api.example.com/cart/${cartId}/updates`)

    ws.onmessage = (event) => {
      const update = JSON.parse(event.data)

      switch (update.type) {
        case "price_change":
          queryClient.setQueryData(["cart"], (old: CartState | undefined) => {
            // Ignore updates for a cart that is not loaded or a line that is no longer in it.
            const item = old?.items[update.itemId]
            if (!old || !item) return old

            const priceDiff = update.newPrice - item.unitPrice
            return {
              ...old,
              items: {
                ...old.items,
                [update.itemId]: {
                  ...item,
                  unitPrice: update.newPrice,
                  lineTotal: update.newPrice * item.quantity,
                  priceChanged: priceDiff !== 0,
                  priceDiff,
                },
              },
              summary: recalculateSummary(old, update),
            }
          })
          break

        case "item_unavailable":
          queryClient.setQueryData(["cart"], (old: CartState | undefined) => {
            // Without this guard, an event for a removed line would create a partial entry.
            const item = old?.items[update.itemId]
            if (!old || !item) return old

            return {
              ...old,
              items: {
                ...old.items,
                [update.itemId]: {
                  ...item,
                  available: false,
                  availableQuantity: update.availableQuantity,
                },
              },
            }
          })
          break

        case "reservation_expired":
          // Reservation state lives on the server: refetch instead of patching locally.
          queryClient.invalidateQueries({ queryKey: ["cart"] })
          break
      }
    }

    return () => ws.close()
  }, [cartId, queryClient])
}
```

Tip

The Baymard Institute’s long-running checkout-usability research consistently finds checkout abandonment around 70%, and the top recoverable causes are extra costs at the last step, forced account creation, and complicated checkout — see Baymard’s checkout abandonment statistics. The reservation timer and live price/availability updates above target exactly the “surprise at the last step” failure mode.

Infrastructure Design

Cloud-Agnostic Architecture

Compute

Component	Concept	Requirements
Cart Service	Stateless API servers	Auto-scaling, health checks
Checkout Orchestrator	Stateful workflow engine	Durable execution, retry support
Background Workers	Job processors	At-least-once delivery, idempotency

Data Stores

Data	Concept	Requirements
Cart (hot)	In-memory cache	Sub-ms reads, TTL support, clustering
Cart (persistent)	Relational DB	ACID, complex queries, replication
Inventory	Relational DB	Strong consistency, row-level locking
Reservations	KV store with TTL	Automatic expiration, high throughput
Orders	Relational DB	ACID, audit trail

Messaging

Use Case	Concept	Requirements
Cart events	Message queue	At-least-once, ordering per cart
Inventory updates	Event stream	Fan-out to multiple consumers
Abandoned cart	Delayed queue	Scheduled delivery

AWS Reference Architecture

AWS Service Mapping

Component	AWS Service	Configuration
Cart API	ECS Fargate	2–50 tasks, auto-scaling on CPU
Cart cache	ElastiCache Redis	cache.r6g.large, cluster mode, 3 shards
Cart DB	RDS PostgreSQL	db.r6g.xlarge, Multi-AZ, 2 read replicas
Reservations	DynamoDB	On-demand, TTL enabled
Checkout saga	Step Functions	Standard workflow (Express tops out at 5 min and is the wrong tool here)
Event bus	SQS + EventBridge	Standard queue, 14-day retention
Background workers	Lambda	1024 MB, 15-min timeout
CDN	CloudFront	Price class 100 (North America, Europe, Israel)
WAF	AWS WAF	Rate limiting, SQL injection protection

Important

Step Functions has two workflow types and they are not interchangeable here. Standard workflows can run for up to a year and emit full execution history; Express workflows run only up to 5 minutes and have a slimmer history model — see AWS — choosing workflow type. A checkout that includes 3DS challenges, PSP timeouts, and human input on address verification can easily exceed the Express ceiling, so the saga belongs on Standard. Reference architecture: AWS — architecting a highly available serverless e-commerce site.
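As a rough illustration of how the saga's compensations map onto Step Functions, the sketch below writes an Amazon States Language definition as a typed TypeScript object. The state names, the Lambda-per-step layout, and the placeholder ARNs are assumptions; only the ASL field names (`StartAt`, `States`, `Type`, `Resource`, `Next`, `End`, `Retry`, `Catch`, `ErrorEquals`) are standard. The Standard vs Express choice is made when the state machine is created, not inside the definition.

```typescript
interface Catcher { ErrorEquals: string[]; Next: string }
interface Retrier { ErrorEquals: string[]; IntervalSeconds: number; MaxAttempts: number; BackoffRate: number }
interface TaskState {
  Type: "Task"
  Resource: string
  Next?: string
  End?: boolean
  Retry?: Retrier[]
  Catch?: Catcher[]
}
interface FailState { Type: "Fail"; Error: string; Cause: string }
interface StateMachine { Comment: string; StartAt: string; States: Record<string, TaskState | FailState> }

// Placeholder ARN builder: REGION and ACCOUNT_ID are not real values.
const fn = (name: string): string => `arn:aws:lambda:REGION:ACCOUNT_ID:function:${name}`
const onAnyError = (next: string): Catcher[] => [{ ErrorEquals: ["States.ALL"], Next: next }]

const checkoutSaga: StateMachine = {
  Comment: "Checkout saga sketch: forward steps with Catch routes into the compensation chain",
  StartAt: "ValidateCart",
  States: {
    ValidateCart: { Type: "Task", Resource: fn("validateCart"), Next: "AuthorizePayment" },
    AuthorizePayment: {
      Type: "Task",
      Resource: fn("authorizePayment"),
      // Safe to retry only because the task sends the same idempotency key to the PSP.
      Retry: [{ ErrorEquals: ["States.TaskFailed"], IntervalSeconds: 2, MaxAttempts: 3, BackoffRate: 2 }],
      Next: "ConvertReservations",
    },
    // Insufficient stock: void the authorization.
    ConvertReservations: {
      Type: "Task",
      Resource: fn("convertReservations"),
      Catch: onAnyError("VoidAuthorization"),
      Next: "CreateOrder",
    },
    // Order creation failure: release reservations, then void the authorization.
    CreateOrder: {
      Type: "Task",
      Resource: fn("createOrder"),
      Catch: onAnyError("ReleaseReservations"),
      Next: "CapturePayment",
    },
    // Capture failure: mark the order failed, then continue down the compensation chain.
    CapturePayment: {
      Type: "Task",
      Resource: fn("capturePayment"),
      Catch: onAnyError("MarkOrderFailed"),
      Next: "ClearCart",
    },
    ClearCart: { Type: "Task", Resource: fn("clearCart"), End: true },
    MarkOrderFailed: { Type: "Task", Resource: fn("markOrderFailed"), Next: "ReleaseReservations" },
    ReleaseReservations: { Type: "Task", Resource: fn("releaseReservations"), Next: "VoidAuthorization" },
    VoidAuthorization: { Type: "Task", Resource: fn("voidAuthorization"), Next: "CheckoutFailed" },
    CheckoutFailed: { Type: "Fail", Error: "CheckoutFailed", Cause: "Compensation completed" },
  },
}

// JSON.stringify(checkoutSaga) yields the definition document for deployment.
```

Chaining the compensations (`MarkOrderFailed` to `ReleaseReservations` to `VoidAuthorization`) lets each forward step jump into the chain at the point that matches how far the saga got.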

Multi-Region Deployment

For survivability during peak events:

Multi-region active-passive: us-east-1 hosts the primary ALB/ECS/RDS/Redis stack, us-west-2 hosts a secondary stack with an RDS replica. Route 53 failover routing with health checks fronts an AWS Global Accelerator that points at both ALBs. RDS replicates asynchronously across regions. — Multi-region active-passive failover with Route 53 health checks and Global Accelerator.

Failover strategy:

Route 53 health checks detect primary failure — see Route 53 — DNS failover.
Global Accelerator reroutes anycast traffic to the healthy endpoint in the secondary Region.
The RDS read replica is promoted (RPO ~seconds for cross-region replicas, RTO minutes for promotion).
Redis cache is rebuilt from the database — acceptable for carts because the relational store is the source of truth.
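The last step works because cart reads go through the cache to the database. A minimal in-memory sketch of that read-through path, with plain objects standing in for Redis and PostgreSQL (`ReadThroughCartCache` and `StoredCart` are hypothetical names):

```typescript
interface StoredCart {
  cartId: string
  lines: Record<string, number> // line key -> quantity
}

class ReadThroughCartCache {
  private cache: Record<string, StoredCart> = {}
  private db: Record<string, StoredCart>
  dbReads = 0

  constructor(db: Record<string, StoredCart>) {
    this.db = db // stands in for the relational source of truth
  }

  get(cartId: string): StoredCart | undefined {
    const hit = this.cache[cartId]
    if (hit) return hit
    // Miss: fall back to the database and repopulate the cache on the way out.
    this.dbReads += 1
    const row = this.db[cartId]
    if (row) this.cache[cartId] = row
    return row
  }

  // Simulates arriving in a region whose cache is empty after failover.
  flush(): void {
    this.cache = {}
  }
}
```

Each cart costs one extra database read after failover, which is the load the pre-warming sweep in the failure-mode table below is meant to spread out.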

For multi-active reads/writes (active-active) on the cart store, DynamoDB Global Tables is the AWS-native option; it uses last-writer-wins by default and a newer multi-region strong-consistency mode for use cases that need zero RPO⁷.

Self-Hosted Alternatives

Managed Service	Self-Hosted Option	When to Self-Host
ElastiCache	Redis Cluster on EC2	Specific modules (RedisGraph, RedisJSON)
RDS PostgreSQL	PostgreSQL on EC2	Cost at scale, specific extensions
DynamoDB	ScyllaDB / Cassandra	Multi-cloud, cost optimization
Step Functions	Temporal.io	Complex workflows, long-running sagas
SQS	RabbitMQ / Redis Streams	Specific routing needs

Note

Shopify’s published flash-sale architecture takes a different shape: a pod-based layout where each pod has its own MySQL, Redis, and Memcached, and stateless web/job workers scale horizontally across pods. A request router (Sorting Hat) maps each shop to its pod, and a Pod Mover migrates pods between regions for failover. See Shopify Engineering — A pods architecture to allow Shopify to scale and the InfoQ talk on flash-sale architecture for the operational details.

Failure Modes and Operational Implications

Failure	Detection	Mitigation
Redis cluster failover	Cart latency spike, Redis pings	Automatic replica promotion; async DB persistence absorbs the gap; client retries are idempotent via `Idempotency-Key`
Cache cold start after region failover	Read-through latency, DB CPU spike	Pre-warm with a background sweep; cap fan-out per key — see AWS Builders’ Library, Caching challenges and strategies on cache addiction and cold-start mitigation.
Flash-sale overload	Goodput drops while p99 climbs	Shed load at the gateway with bounded queues and fail-fast — see AWS Builders’ Library, Using load shedding to avoid overload. Prefer LIFO + age-based dropping so freshly-arrived shoppers see a real response.
Inventory DB lock contention	p99 latency on reservation writes	Backoff + jitter; route hot SKUs through a dedicated shard or a queue-serialized writer
Stuck soft reservation	Reservation TTL approaching, ZSET grows	Two redundant cleaners (Redis TTL + a worker reading the PG `WHERE expires_at < now()` index)
Soft reservation expires mid-checkout	Saga `convert reservations` returns 409	Surface to user with the actual remaining stock; offer to adjust quantity
Payment authorization timeout	PSP returns timeout / 5xx	Retry with the same `Idempotency-Key` (most major PSPs support this); escalate to compensation if persistent
Saga orchestrator leader flap	Multiple replicas claim the same saga	Lease the saga ID through a leader-election primitive — see AWS Builders’ Library, Leader election in distributed systems. DynamoDB conditional writes or ZooKeeper are battle-tested options; rolling your own usually loses to a corner case.
Step Functions execution stuck	Workflow age + state	Standard workflows expose execution history — replay or re-drive from the failed step
Abandoned cart cleanup falls behind	Sorted-set length, queue depth	Scale workers; never leave it to “DB cleanup” alone — TTL+worker is two redundant safety nets
Coupon code abuse (cart bombing)	Rate-limit metrics, coupon-redeem ratio	Per-user + per-coupon rate limits; CAPTCHA on suspicious sessions
Sibling explosion (Dynamo variant)	Read returns >K versions persistently	Cap K (Dynamo §6.3 reports that 99.94% of shopping-cart requests saw exactly one version); alert when the long tail grows; force a merge-and-write from a background reaper.
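The "LIFO + age-based dropping" mitigation in the flash-sale row can be sketched as a bounded queue that serves the newest request first and sheds the oldest. This is an illustrative in-memory model, not gateway code: `SheddingQueue` is a hypothetical name, time is passed in explicitly to keep the behavior deterministic, and requests are assumed to arrive in timestamp order.

```typescript
interface QueuedRequest {
  id: string
  enqueuedAtMs: number
}

class SheddingQueue {
  private items: QueuedRequest[] = [] // oldest at index 0, newest at the end
  private capacity: number
  private maxAgeMs: number
  shed: string[] = [] // ids that were dropped and should get a fast failure response

  constructor(capacity: number, maxAgeMs: number) {
    this.capacity = capacity
    this.maxAgeMs = maxAgeMs
  }

  // Bounded: when full, evict the oldest entry rather than refuse the new arrival.
  enqueue(req: QueuedRequest): void {
    if (this.items.length >= this.capacity) {
      const oldest = this.items.shift()
      if (oldest) this.shed.push(oldest.id)
    }
    this.items.push(req)
  }

  // LIFO: serve the newest request; first drop anything that waited longer than maxAgeMs.
  dequeue(nowMs: number): QueuedRequest | undefined {
    while (this.items.length > 0 && nowMs - this.items[0].enqueuedAtMs > this.maxAgeMs) {
      const expired = this.items.shift()
      if (expired) this.shed.push(expired.id)
    }
    return this.items.pop()
  }
}
```

The reasoning behind the ordering: a request that has already waited past the client timeout is wasted work even if it succeeds, so capacity goes to the requests most likely to still have a shopper waiting.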

Observability that catches these early

A cart system without these four signals is invisible during the only events that matter:

Signal	What it tells you
Add-to-cart success rate by region	Detects a partial outage before customers tweet about it.
Reservation hit rate at checkout	Drops when soft TTLs are too short; rises when inventory is over-promised.
Saga step latency + failure histogram	Surfaces which compensating action is firing most; informs whether to widen idempotency windows.
Idempotency cache hit rate	A sudden spike means clients are retrying — usually a transient backend symptom worth investigating upstream.

These align with the operational stance in the AWS Builders’ Library articles cited above: measure goodput (successful operations) rather than raw throughput, and instrument the queue depths and timeouts that drive load-shedding decisions.

Conclusion

This shopping cart design prioritizes availability and customer experience over perfect consistency, and accepts the resulting trade-offs:

Eventual consistency for display, strong consistency for checkout — the only consistency that has to be perfect is the one customers pay for.
Soft reservations with TTL prevent abandoned carts from locking inventory, while still giving the active shopper a real assurance of stock.
Idempotent saga orchestration turns “distributed atomic transaction” into “explicit forward steps + compensations” — and it survives retries.
Two-tier caching delivers sub-millisecond cart reads while keeping the relational store as the durable record.

Trade-offs accepted:

Occasional checkout failure when reservations expire (mitigated by countdown UX and automatic re-reservation).
Rare overselling on extreme flash sales (handled by backorder / compensation flow).
Higher operational complexity from a distributed architecture (justified by the scale targets).

What this design intentionally does not address:

Multi-currency pricing (requires an FX-aware pricing service).
Subscription / recurring purchases (different cart lifecycle).
B2B bulk ordering (different quantity / pricing rules, contracts).
Marketplace multi-seller carts (checkout splits across sellers, separate fulfillment).

Appendix

Prerequisites

Distributed-systems fundamentals (CAP, eventual vs strong consistency, quorum reads/writes).
Database concepts (ACID, sharding, replication, row-level locking).
API design (REST, idempotency, error semantics).
Basic understanding of payment processing (authorize/capture, 3DS, refunds).

Terminology

Term	Definition
ATS (Available-to-Sell)	Physical on-hand inventory minus hard reservations
ATP (Available-to-Promise)	ATS adjusted for incoming supply and existing commitments
AFR (Available-for-Reservation)	ATS minus soft reservations (project-internal term)
Soft Reservation	Temporary inventory hold with automatic TTL (typically 5–15 minutes)
Hard Reservation	Committed inventory allocation, converted from a soft reservation during checkout once payment is authorized
Saga	Pattern for distributed transactions using compensating actions
Idempotency Key	Client-generated UUID ensuring duplicate requests return same response
Cart Merge	Process of combining a guest cart with an authenticated user’s cart
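The ATS and AFR definitions above reduce to two subtractions. A small worked example, with `InventoryCounts` and the function names as hypothetical illustrations (ATP is omitted because it depends on incoming-supply data not modeled here):

```typescript
interface InventoryCounts {
  onHand: number // physical units in the warehouse
  hardReserved: number // committed at checkout
  softReserved: number // temporary holds for active carts
}

// ATS = physical on-hand inventory minus hard reservations.
function availableToSell(c: InventoryCounts): number {
  return c.onHand - c.hardReserved
}

// AFR = ATS minus soft reservations, floored at zero.
function availableForReservation(c: InventoryCounts): number {
  return Math.max(0, availableToSell(c) - c.softReserved)
}

// A new soft hold is allowed only if it fits inside AFR.
function canSoftReserve(c: InventoryCounts, qty: number): boolean {
  return qty > 0 && qty <= availableForReservation(c)
}
```

With 100 units on hand, 30 hard-reserved, and 50 soft-reserved, ATS is 70 and AFR is 20, so a request to hold 21 units is refused while a request for 20 succeeds.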

Summary

Two-tier storage (Redis + PostgreSQL) balances latency and durability for the cart hot path.
Soft / hard reservation model prevents inventory lock-up while providing checkout assurance.
Saga orchestration with compensation ensures reliable multi-service checkout despite independent databases.
Eventually consistent inventory reads combined with strongly consistent checkout unlock the throughput needed for flash sales.
Normalized frontend state with optimistic updates delivers responsive UX without sacrificing correctness on the wire.
Multi-region deployment with failover targets the 99.99% availability requirement.

References

Garcia-Molina & Salem, Sagas — Cornell PDF (SIGMOD 1987).
DeCandia et al., Dynamo: Amazon’s Highly Available Key-value Store — allthingsdistributed.com PDF (SOSP 2007). §4.4 vector clocks; §6.1 cart-specific reconciliation.
Werner Vogels, Eventually Consistent — ACM Queue 6(6), 2008 / CACM 52(1), 2009.
AWS Builders’ Library — Caching challenges and strategies, Using load shedding to avoid overload, Leader election in distributed systems.
AWS News Blog — Prime Day 2023: all the numbers, How AWS powered Prime Day 2024.
Apache Cassandra docs — Data modeling, Read repair. LWW semantics and lack of sibling reconciliation.
Chris Richardson, Pattern: Saga — microservices.io.
Stripe — Idempotent requests — docs.stripe.com.
Modern Treasury — Why idempotency matters in payments — moderntreasury.com.
Microsoft Learn — Inventory Visibility reservations (Dynamics 365) — learn.microsoft.com.
Shopify — BFCM 2024 data — shopify.com/news/bfcm-data-2024.
Shopify Engineering — How we prepare Shopify for BFCM (2025) — shopify.engineering/bfcm-readiness-2025.
Shopify Engineering — A pods architecture to allow Shopify to scale — shopify.engineering.
AWS — Architecting a highly available serverless e-commerce site — aws.amazon.com.
AWS — Step Functions: choosing workflow type — docs.aws.amazon.com.
AWS — DynamoDB Global tables — docs.aws.amazon.com.
PostgreSQL — Explicit locking (row-level locks) — postgresql.org.
Redis — SET command (NX, EX semantics) — redis.io.
Baymard Institute — Cart abandonment rate statistics — baymard.com.
Queue-it — Overselling prevention — queue-it.com.
TanStack Query — Optimistic updates — tanstack.com.
Fluent Commerce — Available-to-Sell vs Available-to-Promise — docs.fluentcommerce.com.

1. G. DeCandia et al., Dynamo: Amazon’s Highly Available Key-value Store, SOSP 2007 (PDF). Modern AWS provides this pattern as DynamoDB Global Tables; see the Global tables docs.
2. Hector Garcia-Molina and Kenneth Salem, Sagas, ACM SIGMOD 1987 (PDF).
3. Werner Vogels, Eventually Consistent, ACM Queue 6(6), Dec 2008; reprinted CACM 52(1), Jan 2009: queue.acm.org/detail.cfm?id=1466448, DOI 10.1145/1435417.1435432. The article uses the Amazon shopping cart as the worked example for the always-writeable / merge-on-read pattern.
4. Shopify, New achievement unlocked: $11.5B in BFCM 2024 sales: shopify.com/news/bfcm-data-2024; Shopify Engineering, How we prepare Shopify for BFCM (2025): shopify.engineering/bfcm-readiness-2025.
5. AWS News Blog, Prime Day 2023 powered by AWS: all the numbers (aws.amazon.com/blogs/aws/prime-day-2023-powered-by-aws-all-the-numbers) and How AWS powered Prime Day 2024 for record-breaking sales (aws.amazon.com/blogs/aws/how-aws-powered-prime-day-2024-for-record-breaking-sales).
6. Apache Cassandra documentation, Data modeling: How is data updated? (cassandra.apache.org/doc/latest/cassandra/data_modeling/index.html); see also Apache Cassandra docs, Read repair and consistency (cassandra.apache.org/doc/latest/cassandra/managing/operating/read_repair.html). Cassandra resolves conflicts by client-supplied microsecond timestamps; concurrent writes silently overwrite rather than producing siblings, which is why a naive cart_id → JSON schema on Cassandra loses items under concurrency.
7. AWS, DynamoDB Global tables: multi-active, multi-Region replication: docs.aws.amazon.com/…/GlobalTables.html.
8. Chris Richardson, Pattern: Saga: microservices.io/patterns/data/saga.html.
9. Stripe, Idempotent requests: docs.stripe.com/api/idempotent_requests. Modern Treasury offers an equivalent contract; see Modern Treasury, Idempotent requests.