Distributed Locking
Distributed locks coordinate access to shared resources across processes or nodes. Unlike a single-process mutex, they must hold up under network partitions, clock drift, process pauses, and partial failures — all while delivering mutual-exclusion guarantees that range from “best effort” to “correctness critical.”
This article covers lease-based locking, the major implementations (Redis, ZooKeeper, etcd, Chubby), the Redlock controversy, fencing tokens, and when to skip locks entirely.
Mental Model
Distributed locking is fundamentally harder than it appears. The safety property — at most one client holds the lock at any time — needs either a consensus protocol (ZooKeeper, etcd, Chubby) or timing assumptions that can fail in real systems (Redlock).
The first split that matters is the efficiency vs. correctness distinction popularised by Martin Kleppmann:
- Efficiency locks prevent duplicate work. Occasional double-execution costs cycles or duplicate notifications, but does not corrupt state. Single-node Redis or Redlock is acceptable.
- Correctness locks protect an invariant. Double-execution corrupts data, double-bills a customer, or double-publishes a job. These need consensus and fencing tokens.
The second key idea is the lease. Almost every practical distributed lock is time-bounded — a lock with a TTL that auto-expires — to avoid deadlocks from crashed clients. Leases solve liveness but introduce the central problem of distributed locking: what if the lease expires while the client is still working?
Fencing tokens are the answer. The lock service hands out a monotonically increasing token with every grant. The protected resource tracks the highest token it has ever seen and rejects operations carrying a smaller one. Lease expiration during a long operation goes from “silent corruption” to “detected and refused.”
Decision framework:
| Requirement | Implementation | Trade-off |
|---|---|---|
| Best-effort deduplication | Redis single-node | Single point of failure |
| Efficiency with fault tolerance | Redlock (5 nodes) | No fencing, timing assumptions |
| Correctness critical | ZooKeeper / etcd + fencing | Operational complexity |
| Already running Consul | Consul session lock + index | Advisory; needs LockIndex check |
| Already using PostgreSQL | Advisory locks | Limited to single database |
A consensus-backed lock funnels every grant through a single elected leader so that the lock state, the grant order, and the token are all derived from the same replicated log.
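As a toy sketch of that shape (an in-memory stand-in for the replicated log, not any particular system's API): each grant appends an entry to a single ordered log, and the fencing token is simply the entry's index, so later grants always carry larger tokens.

```typescript
// Toy model of a consensus-backed lock service: every grant is an entry in
// one ordered log, so the fencing token is just the entry's index.
type LogEntry = { resource: string; holder: string }

class LogBackedLockService {
  private log: LogEntry[] = []
  private held = new Map<string, string>() // resource -> current holder

  acquire(resource: string, holder: string): number | null {
    if (this.held.has(resource)) return null // already granted
    this.held.set(resource, holder)
    this.log.push({ resource, holder })
    return this.log.length // log index doubles as the fencing token
  }

  release(resource: string, holder: string): void {
    if (this.held.get(resource) === holder) this.held.delete(resource)
  }
}

const svc = new LogBackedLockService()
console.log(svc.acquire("r1", "a")) // 1
console.log(svc.acquire("r1", "b")) // null while held
svc.release("r1", "a")
console.log(svc.acquire("r1", "b")) // 2 - strictly greater than a's token
```

Real systems (ZooKeeper's zxid, etcd's revision) get the same property from their replicated log rather than an in-memory array.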
The Problem
Why Naive Solutions Fail
Approach 1: File-based locks across NFS
```typescript
import { promises as fs } from "node:fs"

// Naive NFS lock - seems simple
async function acquireLock(path: string): Promise<boolean> {
  try {
    // "wx": exclusive create - fails if the file already exists
    await fs.writeFile(path, String(process.pid), { flag: "wx" })
    return true
  } catch {
    return false // file exists
  }
}
```

Fails because:
- NFS semantics vary: `O_EXCL` isn't atomic on all NFS implementations
- No expiration: if the process crashes, the lock file persists forever
- No fencing: Stale lock holders can still access the resource
Approach 2: Database row locks
```sql
-- Lock by inserting a row
INSERT INTO locks (resource_id, holder, acquired_at)
VALUES ('resource-1', 'client-a', NOW())
ON CONFLICT DO NOTHING;
```

Fails because:
- No automatic expiration: Crashed clients leave orphan locks
- Clock drift: `acquired_at` timestamps are unreliable across nodes
- Single point of failure: the database becomes a bottleneck
Approach 3: Redis SETNX without TTL
```
SETNX resource:lock client-id
```

Fails because:
- No expiration: Crashed client locks resource forever
- Race on release: Client must check-then-delete atomically
The Core Challenge
The fundamental tension: distributed systems are asynchronous. There are no bounded delays on message delivery, no bounded process pauses, and no bounded clock drift. Acquiring a lock is a compare-and-set operation, which requires consensus1 — and the FLP impossibility result tells us consensus in a fully asynchronous system with even one faulty process is impossible.
Distributed locks have to provide mutual exclusion in that environment. You cannot reliably distinguish a slow client from a crashed one, and you cannot trust clocks.
Lease-Based Locking
All practical distributed locks use leases — time-bounded locks that expire automatically. Leases protect liveness against crashed clients (no lock is held forever) at the cost of a new safety hazard: a slow client can outlive its lease.
Core Mechanism
TTL Selection Formula
The Redlock spec defines the validity time of a lease as:
```
MIN_VALIDITY = TTL - (T_acquire - T_start) - CLOCK_DRIFT
```

Where:
- TTL is the initial lease duration.
- `T_acquire - T_start` is the wall time spent acquiring the lock across instances.
- `CLOCK_DRIFT` is the maximum tolerated skew between the participating clocks (the spec recommends ~1% of TTL plus a few milliseconds).
If MIN_VALIDITY goes non-positive, the acquire is treated as failed and all instances are released.
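Translated into code, the formula is a one-liner (a direct transcription; the 1%-plus-2 ms drift bound follows the spec's recommendation, and the function name is mine):

```typescript
// Compute the usable validity window of a freshly acquired lease, per the
// Redlock formula. Returns 0 when the acquire took too long (treat as failed).
function minValidityMs(
  ttlMs: number,
  tStartMs: number, // clock reading before the first SET
  tAcquireMs: number, // clock reading after the last SET
): number {
  // Spec-recommended drift bound: ~1% of TTL plus a few milliseconds
  const clockDriftMs = ttlMs * 0.01 + 2
  const validity = ttlMs - (tAcquireMs - tStartMs) - clockDriftMs
  return validity > 0 ? validity : 0
}

// A 10 s TTL where acquisition took 150 ms leaves ~9.7 s of validity.
console.log(minValidityMs(10_000, 1_000, 1_150)) // 9748
```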
Practical guidance (rules of thumb, calibrate to your workload):
- JVM applications: TTL ≥ 60 s. Stop-the-world GC pauses have been observed in the multi-second range, and full pauses of minutes have been recorded in HBase.
- Go / Rust: TTL ≥ 30 s. GC is less of a concern, but page faults, EBS reads, and CPU contention still pause the process.
- General rule: target TTL ≈ 10 × the p99 of the protected operation, then add a renewal heartbeat for long-running work.
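The rules of thumb above can be sketched as two helpers (the function names, the 30 s default floor, and the divide-by-3 renewal cadence are my choices, not from any spec; calibrate to your workload):

```typescript
// Rule-of-thumb TTL selection: ~10x the p99 of the protected operation,
// clamped to a runtime-specific floor (e.g. 60_000 for JVM apps to ride out
// stop-the-world GC pauses, 30_000 otherwise).
function chooseTtlMs(p99Ms: number, runtimeFloorMs = 30_000): number {
  return Math.max(10 * p99Ms, runtimeFloorMs)
}

// Renew at a fraction of the TTL so several missed heartbeats can be
// tolerated before the lease actually expires.
function heartbeatIntervalMs(ttlMs: number): number {
  return Math.floor(ttlMs / 3)
}

const ttl = chooseTtlMs(5_000) // p99 of 5 s -> 50 s TTL
console.log(ttl, heartbeatIntervalMs(ttl)) // 50000 16666
```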
Clock Skew Issues
Wall-clock danger. Redis uses gettimeofday-based wall-clock time, not a monotonic clock, to expire keys. If the server clock jumps forward (NTP step, sysadmin adjustment), leases expire ahead of schedule:
- Client acquires the lock at server time `T` with `TTL = 30 s`.
- NTP steps the clock forward by 20 s.
- Redis treats the lock as expired at `T + 30 s`, though only 10 s have actually elapsed.
- Another client acquires the lock while the first one is still working.
- Two clients believe they hold the same lock.
Mitigation. Use monotonic clocks where possible. Linux clock_gettime(CLOCK_MONOTONIC) measures elapsed time without wall-clock adjustments. In his rebuttal to Kleppmann, Antirez agreed Redis itself should move TTL expiration off gettimeofday onto the monotonic clock API. Internal Redis subsystems have shifted toward monotonic time over the years, but key expiration is still wall-clock-driven, so the clock-jump failure mode has not been retired in practice.
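On a Node.js client, for example, `process.hrtime.bigint()` provides the monotonic reading; a lease timer built on it measures elapsed lease time without being fooled by wall-clock steps (a minimal sketch, names mine):

```typescript
// Track how much of a lease has elapsed using a monotonic clock rather than
// Date.now(), which can jump if NTP steps the wall clock.
function makeLeaseTimer(ttlMs: number) {
  const startNs = process.hrtime.bigint() // monotonic, nanosecond resolution
  const elapsedMs = () => Number((process.hrtime.bigint() - startNs) / 1_000_000n)
  return {
    elapsedMs,
    expired: () => elapsedMs() >= ttlMs,
  }
}

const timer = makeLeaseTimer(30_000)
// ... do protected work, checking timer.expired() before each mutation ...
console.log(timer.expired()) // false immediately after acquisition
```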
Warning
Configure NTP to slew rather than step the clock (-x / tinker step 0 on ntpd, default for chronyd). A slewing clock cannot violate Redlock’s bounded-drift assumption; a stepping clock can.
Design Paths
Path 1: Redis Single-Node Lock
When to choose:
- Lock is for efficiency (prevent duplicate work), not correctness
- Single point of failure is acceptable
- Lowest latency requirement
Implementation:
```typescript
import { Redis } from "ioredis"

async function acquireLock(redis: Redis, resource: string, clientId: string, ttlMs: number): Promise<boolean> {
  // SET with PX (millisecond expiry) and NX (only if not exists)
  const result = await redis.set(resource, clientId, "PX", ttlMs, "NX")
  return result === "OK"
}

async function releaseLock(redis: Redis, resource: string, clientId: string): Promise<boolean> {
  // Lua script: atomic check-and-delete
  // Only delete if we still own the lock
  const script = `
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end
  `
  const result = await redis.eval(script, 1, resource, clientId)
  return result === 1
}
```

Why the Lua script for release: Without atomic check-and-delete, this race exists:
- Client A’s lock expires
- Client B acquires the lock
- Client A (still thinking it holds the lock) calls `DEL`
- Client A deletes Client B’s lock
Trade-offs:
| Advantage | Disadvantage |
|---|---|
| Simple implementation | Single point of failure |
| Low latency (~1ms) | No automatic failover |
| Well-understood semantics | Lost locks on master crash |
Real-world: This approach works well for rate limiting, cache stampede prevention, and other scenarios where occasional double-execution is tolerable.
Path 2: Redlock (Multi-Node Redis)
When to choose:
- Need fault tolerance for efficiency locks
- Can tolerate timing assumptions
- Want Redis ecosystem (Lua scripts, familiar API)
Algorithm (N=5 independent Redis instances):
- Get current time in milliseconds
- Try to acquire lock on ALL N instances sequentially, with small timeout per instance
- Lock is acquired if: majority (N/2 + 1) succeeded AND total elapsed time < TTL
- Validity time = TTL - elapsed time
- If failed, release lock on ALL instances (even those that succeeded)
```typescript
import { Redis } from "ioredis"
import { randomBytes } from "crypto"

interface RedlockResult {
  acquired: boolean
  validity: number
  value: string
}

async function redlockAcquire(instances: Redis[], resource: string, ttlMs: number): Promise<RedlockResult> {
  const value = randomBytes(20).toString("hex")
  const startTime = Date.now()
  const quorum = Math.floor(instances.length / 2) + 1

  let acquired = 0
  for (const redis of instances) {
    try {
      const result = await redis.set(resource, value, "PX", ttlMs, "NX")
      if (result === "OK") acquired++
    } catch {
      // Instance unavailable, continue
    }
  }

  const elapsed = Date.now() - startTime
  const validity = ttlMs - elapsed

  if (acquired >= quorum && validity > 0) {
    return { acquired: true, validity, value }
  }

  // Failed - release on ALL instances (even those where SET succeeded)
  await Promise.all(instances.map((r) => releaseLock(r, resource, value)))
  return { acquired: false, validity: 0, value }
}
```

Critical limitation: Redlock generates random values (20 bytes from /dev/urandom), not monotonically increasing tokens. You cannot use Redlock values for fencing because resources cannot determine which token is “newer.”
Trade-offs vs single-node:
| Aspect | Single-Node | Redlock (N=5) |
|---|---|---|
| Fault tolerance | None | Survives N/2 failures |
| Latency | ~1ms | ~5ms (sequential attempts) |
| Complexity | Low | Medium |
| Fencing support | No | No |
| Clock assumptions | Server only | All N servers + client |
Path 3: ZooKeeper
When to choose:
- Correctness-critical locks (fencing required).
- Already running ZooKeeper (Kafka, HBase, Hadoop ecosystem).
- You can tolerate higher latency for stronger guarantees.
Ephemeral sequential node recipe (the canonical ZooKeeper lock recipe):
Algorithm:
- Client creates an `EPHEMERAL_SEQUENTIAL` node under `/locks/<resource>`.
- Client lists all children and sorts by sequence number.
- If the client’s node is the lowest sequence, the lock is acquired.
- Otherwise, set a watch on the immediate predecessor.
- When the watch fires (predecessor deleted), repeat from step 2.
Why watch the predecessor, not the parent. Watching the parent fires every waiter on every release (thundering herd). Watching only the predecessor wakes exactly one waiter per release.
Fencing via zxid. ZooKeeper’s transaction ID is a monotonically increasing 64-bit value (epoch << 32 | counter). The creation zxid of the lock znode is available via Stat.getCzxid() and works as a fencing token.
```java
import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;
import java.util.List;
import java.util.Collections;

public class ZkLock {
    private final ZooKeeper zk;
    private final String lockPath;
    private String myNode;

    public long acquireLock(String resource) throws Exception {
        // Create ephemeral sequential node
        myNode = zk.create(
            "/locks/" + resource + "/lock-",
            new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL
        );

        while (true) {
            List<String> children = zk.getChildren("/locks/" + resource, false);
            Collections.sort(children);
            String smallest = children.get(0);

            if (myNode.endsWith(smallest)) {
                // We have the lock - return zxid as fencing token
                Stat stat = zk.exists(myNode, false);
                return stat.getCzxid();
            }

            // Find predecessor and watch it
            int myIndex = children.indexOf(myNode.substring(myNode.lastIndexOf('/') + 1));
            String predecessor = children.get(myIndex - 1);

            // Registers the default watcher on the predecessor node
            Stat stat = zk.exists("/locks/" + resource + "/" + predecessor, true);
            if (stat != null) {
                // Block until the watcher (predecessor deleted) calls notify()
                synchronized (this) {
                    wait();
                }
            }
        }
    }
}
```

Trade-offs:
| Advantage | Disadvantage |
|---|---|
| Strong consistency (Zab consensus) | Higher latency (2+ RTTs) |
| Automatic cleanup (ephemeral nodes) | Operational complexity |
| Fencing tokens (zxid) | Session management overhead |
| No clock assumptions | Quorum unavailable = no locks |
Path 4: etcd
When to choose:
- Kubernetes-native environment
- Prefer gRPC over custom protocols
- Need distributed KV store beyond just locking
Lease-based locking:
etcd provides first-class lease primitives. A lease is a token with a TTL; keys can be attached to leases and are automatically deleted when the lease expires.
```go
package main

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func acquireLock(client *clientv3.Client, resource string) (*concurrency.Mutex, error) {
	// Create session with 30s TTL
	session, err := concurrency.NewSession(client, concurrency.WithTTL(30))
	if err != nil {
		return nil, err
	}

	// Create mutex and acquire
	mutex := concurrency.NewMutex(session, "/locks/"+resource)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if err := mutex.Lock(ctx); err != nil {
		return nil, err
	}

	// Use mutex.Header().Revision as fencing token
	return mutex, nil
}
```

Fencing via revision: etcd assigns a globally unique, monotonically increasing revision to every modification. Use mutex.Header().Revision as your fencing token.
Critical limitation (Jepsen finding). Kyle Kingsbury’s Jepsen analysis of etcd 3.4.3 found that etcd locks do not provide mutual exclusion. Using etcd mutexes to protect concurrent updates to an in-memory set, with two-second lease TTLs and process pauses every five seconds, “[reliably induced] the loss of ~18% of acknowledged updates.” Mutex violations were observed even in healthy clusters — exacerbated by an etcd bug (etcd-io/etcd#11456) where the server did not re-check lease validity after a contended lock was released.
“etcd locks (like all distributed locks) do not provide mutual exclusion. Multiple processes can hold an etcd lock concurrently, even in healthy clusters with perfectly synchronized clocks.” — Kyle Kingsbury, Jepsen analysis of etcd 3.4.3 (2020)
The recommended mitigation is exactly what Kleppmann argued for: treat the etcd revision of the lock key as a fencing token and have the protected resource reject operations that present a lower revision than it has already accepted.
Trade-offs:
| Advantage | Disadvantage |
|---|---|
| Raft consensus (strong consistency) | Jepsen found safety violations |
| Native lease support | Higher latency than Redis |
| Kubernetes integration | Operational complexity |
| Revision-based fencing | Quorum unavailable = no locks |
Path 5: Database Advisory Locks (PostgreSQL)
When to choose:
- Already using PostgreSQL
- Lock scope is single database
- Don’t want external dependencies
Session-level advisory locks:
```sql
-- Acquire lock (blocks until available)
SELECT pg_advisory_lock(hashtext('resource-1'));

-- Try acquire (returns immediately)
SELECT pg_try_advisory_lock(hashtext('resource-1'));

-- Release
SELECT pg_advisory_unlock(hashtext('resource-1'));
```

Transaction-level advisory locks:
```sql
-- Automatically released at transaction end
SELECT pg_advisory_xact_lock(hashtext('resource-1'));
-- Then do your work within the transaction
UPDATE resources SET ... WHERE id = 'resource-1';
```

Lock ID generation: Advisory locks take a 64-bit integer key. Use hashtext() for string-based resource IDs, or encode your own scheme.
Connection pooling danger: Session-level locks are tied to the database connection. With connection pooling (PgBouncer), your “session” may be reused by another client, leaking locks. Use transaction-level locks with connection pooling.
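A pool-safe sketch of the transaction-level pattern (the structural `QueryClient` type and the function name are mine so the example does not depend on a specific driver; a node-postgres `PoolClient` satisfies the interface):

```typescript
// Minimal structural type; a pg PoolClient satisfies it.
interface QueryClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: any[] }>
}

// Transaction-level advisory lock: safe with connection pooling because the
// lock cannot outlive the transaction, regardless of who reuses the session.
async function withXactAdvisoryLock(
  client: QueryClient,
  resourceId: string,
  work: () => Promise<void>,
): Promise<boolean> {
  await client.query("BEGIN")
  // pg_try_advisory_xact_lock returns immediately; true if acquired
  const { rows } = await client.query(
    "SELECT pg_try_advisory_xact_lock(hashtext($1)) AS acquired",
    [resourceId],
  )
  if (!rows[0].acquired) {
    await client.query("ROLLBACK")
    return false // contended - caller decides whether to retry
  }
  try {
    await work()
    await client.query("COMMIT") // lock released with the transaction
    return true
  } catch (err) {
    await client.query("ROLLBACK")
    throw err
  }
}
```

Because release is tied to COMMIT/ROLLBACK rather than the session, a pooled connection handed to another client carries no leftover lock.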
Trade-offs:
| Advantage | Disadvantage |
|---|---|
| No external dependencies | Single database scope |
| ACID guarantees | Connection pooling issues |
| Already have PostgreSQL | Not for multi-database |
| Automatic transaction cleanup | Lock ID collisions possible |
Path 6: Consul Sessions and KV Locks
When to choose:
- Already running Consul for service discovery / config.
- You want session-bound locks tied to node and health-check liveness.
- You need a built-in cool-down window after lock loss (the Consul `LockDelay`).
Mechanism. A Consul session binds together a node, its health checks, and an optional TTL. The acquire=<session-id> parameter on PUT /v1/kv/<key> is a check-and-set op: it succeeds only if the key is currently unlocked or already held by the same session. release=<session-id> is the symmetric CAS. Sessions are invalidated on node deregistration, health-check failure, explicit destroy, or TTL expiry; the Behavior field controls whether held locks are released (default) or the keys deleted.
```shell
# Create a session with a 15s TTL, default LockDelay = 15s
curl -X PUT http://consul:8500/v1/session/create \
  -d '{"TTL": "15s", "LockDelay": "15s", "Behavior": "release"}'
# -> {"ID": "<sid>"}

# Acquire (CAS): succeeds iff key is unlocked or already held by <sid>
curl -X PUT "http://consul:8500/v1/kv/locks/order-svc/leader?acquire=<sid>" \
  -d '{"holder": "node-7"}'
# -> true
```

Fencing via LockIndex. Each successful acquire increments the key’s LockIndex and stamps the holding Session. Consul’s docs explicitly cast the tuple (Key, LockIndex, Session) as a sequencer: a client that intends to mutate a downstream resource can pass LockIndex along, and the resource can reject any operation whose LockIndex is lower than the highest it has already accepted. This gives Consul the same fencing-token shape as ZooKeeper’s czxid and etcd’s revision.
LockDelay as a safety buffer. When a session is invalidated, Consul refuses to grant the same lock for a default 15 s (configurable 0–60 s). The window gives a previously-leading client time to notice it lost the lock and stop performing protected operations before any new holder can race ahead — an explicit, advisory mitigation for the slow-client / stale-write problem rather than a substitute for fencing.
Warning
Setting LockDelay: 0 removes the safety buffer entirely. Pair it with strict LockIndex fencing on the protected resource, or you have a Redlock-class race built into Consul.
Trade-offs:
| Advantage | Disadvantage |
|---|---|
| Session-bound + health-check-bound | Advisory only; resource must check |
| LockDelay reduces stale-write race | TTL bounded to 10 s–24 h |
| LockIndex works as fencing token | Tied to Consul cluster availability |
| Reuses existing Consul infra | Many subtle session/healthcheck pitfalls |
Comparison Matrix
| Factor | Redis Single | Redlock | ZooKeeper | etcd | Consul | PostgreSQL |
|---|---|---|---|---|---|---|
| Fault tolerance | None | N/2 failures | N/2 failures | N/2 failures | N/2 failures | Database HA |
| Fencing tokens | No | No | Yes (zxid) | Yes (revision) | Yes (LockIndex) | No |
| Latency (acquire) | ~1 ms | ~5–10 ms | ~10–50 ms | ~10–50 ms | ~10–50 ms | ~1–5 ms |
| Clock assumptions | Yes | Yes (all nodes) | No | No | TTL only | No |
| Correctness guarantee | No | No | Yes | Partial (Jepsen) | Advisory + LockIndex | Yes (single DB) |
| Operational complexity | Low | Medium | High | Medium | Medium | Low |
Decision Framework
Fencing Tokens
The Problem They Solve
Leases expire. When they do, a “stale” lock holder may still be executing its critical section. Without fencing, this corrupts the protected resource.
Example failure scenario: client 1 acquires the lease, then pauses (GC, page fault) past the TTL; the lock service expires the lease and grants it to client 2; client 1 wakes and writes to the resource, unaware it no longer holds the lock. Without fencing, that stale write silently clobbers client 2’s work.
How Fencing Tokens Work
The fencing-token contract has four pieces, and the protected resource is the part that does the actual work — a lock service alone cannot fence:
- Lock service issues a strictly monotonically increasing token with each grant. Without a total order, “stale” is undefined.
- Client includes the token with every mutating operation against the protected resource (write, RPC, transaction).
- Resource tracks the highest token it has ever accepted for that resource key.
- Resource rejects any operation whose token is lower than the highest it has seen, atomically with the operation it would otherwise perform.
The ordering must come from somewhere — Kleppmann’s punchline is that “it’s likely that you would need a consensus algorithm just to generate the fencing tokens”1. That is exactly what ZooKeeper’s zxid, etcd’s revision, and Chubby’s lock-generation number give you for free: a totally-ordered identifier minted from the same replicated log that decided who holds the lock.
Implementation Pattern
Lock service side:
```typescript
interface LockGrant {
  token: bigint // Monotonically increasing
  expiresAt: number
}

class FencingLockService {
  private nextToken: bigint = 1n
  private locks: Map<string, { holder: string; token: bigint; expiresAt: number }> = new Map()

  acquire(resource: string, clientId: string, ttlMs: number): LockGrant | null {
    const existing = this.locks.get(resource)
    if (existing && existing.expiresAt > Date.now()) {
      return null // Lock held
    }

    const token = this.nextToken++
    const expiresAt = Date.now() + ttlMs
    this.locks.set(resource, { holder: clientId, token, expiresAt })
    return { token, expiresAt }
  }
}
```

Resource side:
```typescript
interface FencedWrite {
  token: bigint
  data: unknown
}

class FencedStorage {
  private highestToken: Map<string, bigint> = new Map()
  private data: Map<string, unknown> = new Map()

  write(resource: string, write: FencedWrite): boolean {
    const highest = this.highestToken.get(resource) ?? 0n
    if (write.token < highest) {
      // Stale token - reject
      return false
    }

    // Accept write, update highest seen
    this.highestToken.set(resource, write.token)
    this.data.set(resource, write.data)
    return true
  }
}
```

Why Random Values Don’t Work
Redlock uses 20 random bytes from /dev/urandom, not ordered tokens. A resource cannot determine whether abc123 is “newer” than xyz789. The values lack the ordering property required to reject stale operations. As Kleppmann put it: “the algorithm does not produce any number that is guaranteed to increase every time a client acquires a lock … the unique random value it uses does not provide the required monotonicity. … It’s likely that you would need a consensus algorithm just to generate the fencing tokens.”1
ZooKeeper zxid as Fencing Token
ZooKeeper’s transaction ID (zxid) is perfect for fencing:
- Monotonically increasing: Every ZK transaction increments it
- Globally ordered: All clients see same ordering
- Available at lock time:
Stat.getCzxid()returns creation zxid
```java
// When acquiring lock
Stat stat = zk.exists(myLockNode, false);
long fencingToken = stat.getCzxid();

// When accessing resource
resource.write(data, fencingToken);
```

The Redlock Controversy
Kleppmann’s Critique (2016)
Martin Kleppmann’s core argument is that Redlock is “neither fish nor fowl”: too heavy and complex for efficiency locks, not safe enough for correctness locks. Three concrete problems:
1. Timing assumptions violated by real systems. Redlock works correctly only if the system is “partially synchronous”: bounded network delay, bounded process pauses, bounded clock drift. Real systems violate all three:
- Network packets can be delayed arbitrarily — including GitHub’s 2012 incident with ~90 s of network delay.
- Stop-the-world GC pauses can exceed any reasonable lease TTL (HBase observed multi-minute pauses).
- Clock skew can be large under adversarial NTP conditions or operator error.
2. No fencing capability. Even if every step worked perfectly, Redlock generates random values, not monotonic tokens — so the protected resource has no way to reject stale operations.
3. Clock-jump scenario (paraphrased from the post):
- Client 1 acquires the lock on nodes A, B, C; D and E are unreachable.
- Clock on node C jumps forward; C’s copy of the lease expires.
- Client 2 acquires the lock on C, D, E.
- Both clients now hold a quorum.
Antirez’s Response
Salvatore Sanfilippo’s rebuttal argues:
- A unique random token is enough for many uses. With check-and-set against the protected resource, only the holder of the matching token can mutate it. Antirez’s point: most practical workflows do not in fact need monotonic ordering.
- Post-acquisition time check. Redlock’s spec instructs the client to record `T_start`, attempt all instances, then re-read the clock and refuse the lock if elapsed time has consumed the TTL. He argues this neutralises the GC-during-acquire scenario, but Kleppmann’s reply is that the pause can happen after acquisition, between the validity check and the resource access.
- Monotonic clocks. Antirez agreed Kleppmann was right that Redis and Redlock should use the monotonic clock API to eliminate the clock-jump variant of the attack.
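Antirez's check-and-set argument, sketched as resource-side logic (class and method names are illustrative): the resource stamps itself with the holder's random value and refuses mutations presenting any other value. No ordering is needed, but note the caveat: the resource itself must store and compare the token.

```typescript
// Even a random (non-monotonic) lock value protects the resource IF the
// resource does check-and-set against that value itself.
class CasGuardedResource {
  private ownerToken: string | null = null
  data: unknown = null

  // Stamp the resource with the holder's random token (only if unowned).
  claim(token: string): boolean {
    if (this.ownerToken !== null) return false
    this.ownerToken = token
    return true
  }

  // Mutations carry the token; any caller presenting a different value
  // (e.g. a stale ex-holder after re-acquisition) is refused.
  write(token: string, data: unknown): boolean {
    if (token !== this.ownerToken) return false
    this.data = data
    return true
  }

  release(token: string): void {
    if (this.ownerToken === token) this.ownerToken = null
  }
}

const res = new CasGuardedResource()
res.claim("abc123")
console.log(res.write("abc123", 1)) // true: matching token
console.log(res.write("xyz789", 2)) // false: wrong token, write refused
```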
The Verdict
Neither argument is fully satisfying:
| Kleppmann’s points | Antirez’s counter-points | Reality |
|---|---|---|
| GC pauses violate timing | Post-acquisition check helps | Pauses can happen during resource access, not just during acquisition |
| No fencing possible | Random token + CAS works | CAS works only when the resource itself stores the token |
| Clock jumps break safety | Use monotonic clocks | A cross-machine monotonic clock does not exist |
Practical guidance:
- Efficiency locks: Redlock is fine. Occasional double-execution is annoying, not catastrophic.
- Correctness locks: use a consensus-based system (ZooKeeper, Chubby, or etcd with revision-based fencing) and enforce fencing on every protected operation. Redlock’s random values cannot fence.
Production Implementations
Google Chubby: The Original
Context. Internal distributed lock service powering GFS, BigTable, and other Google infrastructure. The Chubby paper directly inspired ZooKeeper.
Architecture (Burrows, OSDI 2006):
- A Chubby cell is typically 5 replicas; replicas elect a master via Paxos and the master holds a multi-second master lease.
- The master serves all client traffic; writes are committed via Paxos to a quorum of replicas.
- Client sessions hold a session lease (default ~12 s); on lease loss, the client enters a “jeopardy” state and waits a default 45 s grace period before declaring the session dead and releasing locks.
- The data model is a small filesystem; locks are advisory and attached to files.
Key design decisions:
- Coarse-grained locks — designed for locks held seconds to hours (leader election, configuration), not millisecond-scale critical sections.
- Advisory by default — files do not refuse reads or writes from non-holders; correctness depends on holders honouring the protocol or using sequencers.
- Master lease renewal — brief network blips do not cause leadership churn.
- Client grace period — sessions survive transient outages without dropping every lock the client holds.
Fencing via sequencers. Chubby exposes GetSequencer() / SetSequencer() / CheckSequencer(). A sequencer is an opaque byte-string carrying the lock name, mode, and lock generation number. The protected service can either pass it back to Chubby for validation or compare generation numbers locally. The Chubby paper notes: if a sequencer is no longer valid (the lock was lost or re-acquired), the validation call fails. This is the same fencing-token pattern Kleppmann argues for, predating his blog by a decade.
Scale. Chubby is optimised for reliability and small numbers of long-lived locks, not lock throughput. Heavy fan-in workloads use it for leader election and route the actual lock traffic through purpose-built systems (BigTable’s metadata, for example).
Industry Patterns
Public engineering write-ups about distributed lock usage tend to be short on first-party detail, but the same patterns recur across companies:
- Per-key efficiency locks for assignment / matching. Ride-hailing, ad-serving, and inventory systems hold a millisecond-scale lock on the contended entity (driver ID, ad slot, SKU) so only one of N concurrent matchers attempts the write. Failures are caught downstream — the booking system rejects a duplicate driver assignment, the inventory system rejects a double-decrement — so the lock can use the cheapest possible backend (single-node Redis or Redlock).
- Per-event deduplication around side effects. Event-driven pipelines acquire `job:{event_id}` for the expected job duration so a re-delivered event does not re-fire the side effect. Locks deduplicate; idempotency keys at the downstream API are the safety net; monitoring catches the drift between the two.
- Layered defence. No production system relies on a single layer. The pattern is lock-for-deduplication + idempotency-for-correctness + reconciliation-for-eventual-fix. Treat any “we use Redis locks for correctness” claim with suspicion until you find the layer that actually enforces it.
Implementation Comparison
| Aspect | Google Chubby | Efficiency lock (typical) | Job-dedup lock (typical) |
|---|---|---|---|
| Lock type | Correctness | Efficiency | Efficiency |
| Duration | Seconds–hours | Milliseconds | Seconds–minutes |
| Backend | Paxos (custom) | Redis (single-node or Redlock) | Redis or ZK + idempotency keys |
| Fencing | Sequencers | None (downstream rejects duplicates) | None (idempotent downstream) |
| Scale | Low frequency, high durability | High frequency, occasional loss OK | High frequency, occasional loss OK |
Lock-Free Alternatives
When to Avoid Locks Entirely
Distributed locks add complexity and failure modes. Before reaching for a lock, consider:
1. Idempotent operations:
If your operation can safely execute multiple times with the same result, you don’t need a lock.
```typescript
// Bad: non-idempotent
async function incrementCounter(id: string) {
  const current = await db.get(id)
  await db.set(id, current + 1)
}

// Good: idempotent with versioning
async function setCounterIfMatch(id: string, expectedVersion: number, newValue: number) {
  await db
    .update(id)
    .where("version", expectedVersion)
    .set({ value: newValue, version: expectedVersion + 1 })
}
```

2. Compare-and-Swap (CAS):
Many databases support atomic CAS. Use it instead of external locks.
```sql
-- CAS-based update
UPDATE resources
SET value = 'new-value', version = version + 1
WHERE id = 'resource-1' AND version = 42;
-- Check rows affected - if 0, retry with fresh version
```

3. Optimistic concurrency:
Assume no conflicts; detect and retry on collision.
```typescript
interface VersionedResource {
  data: unknown
  version: number
}

async function optimisticUpdate(id: string, transform: (data: unknown) => unknown) {
  while (true) {
    const resource = await db.get(id)
    const newData = transform(resource.data)

    const updated = await db.update(id, {
      data: newData,
      version: resource.version + 1,
      _where: { version: resource.version },
    })

    if (updated) return // Success
    // Version conflict - retry
  }
}
```

4. Queue-based serialization:
Route all operations for a resource to a single queue or partition. This is how Kafka, Kinesis, and most modern command-bus designs avoid locks entirely.
Concurrent access is eliminated by design — at the cost of latency (queue depth) and operational complexity (partition rebalancing, hot partitions).
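The same idea in miniature, in-process (a per-key promise chain standing in for a queue partition; `KeyedSerializer` is an illustrative name, not a library API):

```typescript
// Per-key serialization without locks: all operations for a key run on one
// chain, one at a time, so they can never interleave.
class KeyedSerializer {
  private tails = new Map<string, Promise<void>>()

  run<T>(key: string, op: () => Promise<T>): Promise<T> {
    const tail = this.tails.get(key) ?? Promise.resolve()
    const result = tail.then(op, op) // start op only after the previous one settles
    // Advance the tail regardless of outcome so one failure doesn't wedge the key.
    this.tails.set(key, result.then(() => undefined, () => undefined))
    return result
  }
}

// Read-modify-write cycles submitted concurrently for the same key execute
// one after another, so no update is lost.
const serializer = new KeyedSerializer()
let counter = 0
await Promise.all(
  Array.from({ length: 10 }, () =>
    serializer.run("counter", async () => {
      const c = counter // read
      await new Promise((r) => setTimeout(r, 1)) // yield mid-operation
      counter = c + 1 // write - lost-update prone without serialization
    }),
  ),
)
console.log(counter) // 10
```

Kafka-style partition consumers are the distributed version of the same chain: operations for a key always land in the same partition, and one consumer drains each partition.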
Decision: Lock vs Lock-Free
| Factor | Use Distributed Lock | Use Lock-Free |
|---|---|---|
| Operation complexity | Multi-step, non-decomposable | Single atomic operation |
| Conflict frequency | Rare | Frequent (CAS retries expensive) |
| Side effects | External (can’t retry) | Local (can retry) |
| Existing infrastructure | Lock service available | Database has CAS |
| Team expertise | Lock patterns understood | Lock-free patterns understood |
Common Pitfalls
1. Holding Locks Across Async Boundaries
The mistake: Acquiring lock, then making RPC calls or doing I/O while holding it.
```typescript
// Dangerous: lock held during external call
const lock = await acquireLock(resource)
const data = await externalService.fetch() // Network call!
await db.update(resource, data)
await releaseLock(lock)
```

What goes wrong:
- External call takes 10s; lock TTL is 5s
- Lock expires while you’re still working
- Another client acquires and corrupts data
Solution: Minimize lock scope. Fetch data first, then lock-update-unlock quickly.
```typescript
// Better: minimize lock duration
const data = await externalService.fetch()
const lock = await acquireLock(resource)
await db.update(resource, data)
await releaseLock(lock)
```

2. Ignoring Lock Acquisition Failure
The mistake: Assuming lock acquisition always succeeds.
```typescript
// Dangerous: no failure handling
await acquireLock(resource)
await criticalOperation()
await releaseLock(resource)
```

What goes wrong:
- Lock service unavailable → operation proceeds without lock
- Lock contention → silent failure, concurrent access
Solution: Always check acquisition result and handle failure.
```typescript
const acquired = await acquireLock(resource)
if (!acquired) {
  throw new Error("Failed to acquire lock - cannot proceed")
}
try {
  await criticalOperation()
} finally {
  await releaseLock(resource)
}
```

3. Lock-Release Race with TTL
The mistake: Releasing a lock you no longer own (it expired and was re-acquired).
```typescript
// Dangerous: release without ownership check
await lock.release() // May delete another client's lock!
```

What goes wrong:
- Your lock expires due to slow operation
- Another client acquires the lock
- Your `release()` deletes their lock
- Third client acquires, now two clients think they have it
Solution: Atomic release that checks ownership (shown in Redis Lua script earlier).
4. Thundering Herd on Lock Release
The mistake: All waiting clients wake simultaneously when lock releases.
What goes wrong with ZooKeeper naive implementation:
- 1000 clients watch the `/locks/resource` parent node
- Lock releases, all 1000 receive watch notification
- All 1000 call `getChildren()` simultaneously
- ZooKeeper overloaded, lock acquisition stalls
Solution: Watch predecessor only (shown in ZooKeeper recipe earlier). Only one client wakes per release.
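The shape of the predecessor-watch recipe can be shown with an in-memory simulation. This is a sketch, not a ZooKeeper client: the class and method names are invented, and a real implementation would use ephemeral sequential nodes and deletion watches. The key property it demonstrates is that a release wakes exactly one waiter.

```typescript
// In-memory simulation of the predecessor-watch pattern: each waiter
// registers a wake callback tied to the client directly ahead of it,
// so a release wakes one successor instead of the whole herd.
class PredecessorWatchLock {
  private nextSeq = 0;
  private queue: Array<{ seq: number; wake: () => void }> = [];

  async acquire(): Promise<number> {
    const seq = this.nextSeq++;
    if (this.queue.length === 0) {
      // No predecessor: lock acquired immediately.
      this.queue.push({ seq, wake: () => {} });
      return seq;
    }
    await new Promise<void>((resolve) => {
      // Wait only for our predecessor to go away, not for every release.
      this.queue.push({ seq, wake: resolve });
    });
    return seq;
  }

  release(): void {
    this.queue.shift();
    // Wake only the single successor, if any.
    this.queue[0]?.wake();
  }
}
```

In the real recipe, `seq` corresponds to the sequence number of the client's ephemeral node, and `wake` corresponds to the deletion watch on the predecessor's node.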
5. Missing Fencing on Correctness-Critical Locks
The mistake: Using Redlock (or any lease-based lock) without fencing for correctness-critical operations.
```typescript
// Dangerous: no fencing
const lock = await redlock.acquire(resource)
await storage.write(data) // Stale lock holder can overwrite!
await redlock.release(lock)
```

Solution: Either use a lock service with fencing tokens (ZooKeeper) or accept that this lock is efficiency-only.
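To make the fencing check concrete, here is a minimal sketch of the resource side, assuming the lock service issues a monotonically increasing token with each grant (the `FencedStore` name and its methods are illustrative, not a real API):

```typescript
// The resource remembers the highest fencing token it has accepted and
// refuses anything older, so a stale lease holder cannot clobber a write
// made under a newer lease.
class FencedStore {
  private highestToken = -1;
  private value: unknown = null;

  write(token: number, data: unknown): void {
    if (token < this.highestToken) {
      throw new Error(`stale fencing token ${token} < ${this.highestToken}`);
    }
    this.highestToken = token;
    this.value = data;
  }

  read(): unknown {
    return this.value;
  }
}
```

This is exactly what Redlock's random lock values cannot provide: two random values have no ordering, so the store has no way to tell the stale holder from the current one.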
6. Session-Level Locks with Connection Pooling
The mistake: Using PostgreSQL session-level advisory locks with PgBouncer.
```sql
-- Acquired by connection in pool
SELECT pg_advisory_lock(12345);
-- Connection returned to pool
-- Other client reuses connection
-- Lock is still held by "other" client!
```

Solution: Use transaction-level locks with pooling.
```sql
BEGIN;
SELECT pg_advisory_xact_lock(12345);
-- Do work
COMMIT; -- Lock automatically released
```

Conclusion
Distributed locking is a coordination primitive that requires careful consideration of failure modes, timing assumptions, and fencing requirements.
Key decisions:
- Efficiency vs correctness: Most locks are for efficiency (preventing duplicate work). These can use simpler implementations with known failure modes. Correctness-critical locks require consensus protocols and fencing.
- Fencing is non-negotiable for correctness: Without fencing tokens, lease expiration during long operations corrupts data. Random lock values (Redlock) cannot fence.
- Timing assumptions are dangerous: Redlock’s safety depends on bounded network delays, process pauses, and clock drift. Real systems violate all three.
- Consider lock-free alternatives: Idempotent operations, CAS, optimistic concurrency, and queue-based serialization often work better than distributed locks.
Start simple: Single-node Redis locks work for most efficiency scenarios. Graduate to ZooKeeper with fencing only when correctness is critical and you understand the operational cost.
Appendix
Prerequisites
- Distributed systems fundamentals (network partitions, consensus)
- CAP theorem and consistency models
- Basic understanding of lease-based coordination
Terminology
| Term | Definition |
|---|---|
| Lease | Time-bounded lock that expires automatically |
| Fencing token | Monotonically increasing identifier that resources use to reject stale operations |
| Sequencer | Chubby’s term for a fencing token: opaque byte-string carrying lock name, mode, and lock generation number |
| TTL | Time-To-Live; duration before lease expires |
| Quorum | Majority of nodes (N/2 + 1) required for consensus |
| Split-brain | Network partition where multiple partitions believe they are authoritative |
| zxid | ZooKeeper transaction ID; monotonically increasing, usable as fencing token |
| Lock delay | Consul’s cool-down window after session invalidation during which the lock cannot be re-acquired |
| Advisory lock | Lock that does not prevent access — just signals intention; correctness depends on holders honoring the protocol |
| Ephemeral node | ZooKeeper node that is automatically deleted when the client session ends |
Summary
- Distributed locks are harder than they appear—network partitions, clock drift, and process pauses all cause multiple clients to believe they hold the same lock
- Leases (auto-expiring locks) prevent deadlock but introduce the lease-expiration-during-work problem
- Fencing tokens solve this by having the resource reject operations from stale lock holders
- Redlock provides fault-tolerant efficiency locks but cannot fence (random values lack ordering)
- ZooKeeper/etcd provide fencing tokens (zxid/revision) but add operational complexity
- Lock-free alternatives (CAS, idempotency, queues) often work better than distributed locks
- For correctness-critical locks: use consensus + fencing; for efficiency locks: Redis single-node is often sufficient
References
Foundational:
- How to do distributed locking — Martin Kleppmann’s analysis of Redlock (2016).
- Is Redlock safe? — Salvatore Sanfilippo’s rebuttal (2016).
- The Chubby Lock Service for Loosely-Coupled Distributed Systems — Mike Burrows, OSDI 2006.
- Consensus in the Presence of Partial Synchrony — Dwork, Lynch & Stockmeyer, JACM 1988. The model that Redlock implicitly assumes.
- Impossibility of Distributed Consensus with One Faulty Process — Fischer, Lynch & Paterson, JACM 1985 (FLP).
Implementation documentation:
- Redis Distributed Locks — official Redis distributed-lock pattern (the Redlock spec).
- ZooKeeper Recipes and Solutions — canonical lock recipe with predecessor-watch.
- etcd Concurrency API — etcd lease, mutex, and election APIs.
- Consul Sessions and Distributed Locks — session lifecycle, `LockDelay`, `LockIndex` sequencer.
- Consul KV HTTP API — `acquire`/`release` CAS semantics on KV keys.
- PostgreSQL Advisory Locks — session- and transaction-scoped advisory locks.
Testing and analysis:
- Jepsen: etcd 3.4.3 — Kyle Kingsbury’s analysis finding mutex violations in etcd locks even in healthy clusters.
- etcd Jepsen results blog — etcd team’s response and patch reference.
- Designing Data-Intensive Applications — Martin Kleppmann. Chapter 8 covers distributed coordination and locking.
Libraries:
- Redisson — Redis Java client with distributed locks (implements Redlock and a single-node variant).
- node-redlock — Redlock implementation for Node.js.
- Apache Curator — ZooKeeper recipes, including `InterProcessMutex` and `LeaderLatch`.
Footnotes
1. Martin Kleppmann. “How to do distributed locking”. 2016.