Core Distributed Patterns
20 min read

Distributed Locking

Distributed locks coordinate access to shared resources across multiple processes or nodes. Unlike single-process mutexes, they must handle network partitions, clock drift, process pauses, and partial failures—all while providing mutual exclusion guarantees that range from “best effort” to “correctness critical.”

This article covers lock implementations (Redis, ZooKeeper, etcd, Chubby), the Redlock controversy, fencing tokens, lease-based expiration, and when to avoid locks entirely.

Failure Modes

The Core Problem

[Diagram: Client 1 and Client 2 each send Acquire to the Lock Service; one is Granted, the other Denied or made to wait. The granted client accesses the Resource, which is corrupted if multiple clients gain access. Annotated failure modes:]

  • Network Partition: Client thinks it has lock, server disagrees
  • Clock Drift: Lease expires early/late
  • Process Pause: GC stall outlives lease
  • Split-Brain: Multiple clients believe they hold lock

Distributed locks must handle failures that single-process locks never face—network partitions, clock drift, and process pauses can all cause multiple clients to believe they hold the same lock simultaneously.

Distributed locking is fundamentally harder than it appears. The safety property—at most one client holds the lock at any time—requires either consensus protocols (ZooKeeper, etcd) or careful timing assumptions that can fail under realistic conditions (Redlock).

Core mental model:

  • Efficiency locks: Prevent duplicate work. Occasional double-execution is tolerable. Redis single-node or Redlock works.
  • Correctness locks: Protect invariants. Double-execution corrupts data. Requires consensus + fencing tokens.

Key insight: Most lock implementations provide leases (auto-expiring locks) rather than indefinite locks. Leases prevent deadlock from crashed clients but introduce the fundamental problem: what if the lease expires while the client is still working?

Fencing tokens solve this: the lock service issues a monotonically increasing token with each lock grant. The protected resource rejects operations with tokens lower than the highest it has seen. This transforms lease expiration from a safety violation into a detected-and-rejected stale operation.

Decision framework:

Requirement | Implementation | Trade-off
--- | --- | ---
Best-effort deduplication | Redis single-node | Single point of failure
Efficiency with fault tolerance | Redlock (5 nodes) | No fencing, timing assumptions
Correctness critical | ZooKeeper/etcd + fencing | Operational complexity
Already using PostgreSQL | Advisory locks | Limited to single database

Approach 1: File-based locks across NFS

// Naive NFS lock - seems simple
import { promises as fs } from "fs"

async function acquireLock(path: string): Promise<boolean> {
  try {
    // O_EXCL-style exclusive create: fails if the file already exists
    await fs.writeFile(path, String(process.pid), { flag: "wx" })
    return true
  } catch {
    return false // file exists
  }
}

Fails because:

  • NFS semantics vary: O_EXCL isn’t atomic on all NFS implementations
  • No expiration: If the process crashes, lock file persists forever
  • No fencing: Stale lock holders can still access the resource

Approach 2: Database row locks

-- Lock by inserting a row
INSERT INTO locks (resource_id, holder, acquired_at)
VALUES ('resource-1', 'client-a', NOW())
ON CONFLICT DO NOTHING;

Fails because:

  • No automatic expiration: Crashed clients leave orphan locks
  • Clock drift: acquired_at timestamps unreliable across nodes
  • Single point of failure: Database becomes bottleneck

Approach 3: Redis SETNX without TTL

SETNX resource:lock client-id

Fails because:

  • No expiration: Crashed client locks resource forever
  • Race on release: Client must check-then-delete atomically

The fundamental tension: distributed systems are asynchronous—there are no bounded delays on message delivery, no bounded process pauses, and no bounded clock drift.

Distributed locks exist to provide mutual exclusion across this asynchronous environment. The challenge: you cannot distinguish a slow client from a crashed client, and you cannot trust clocks.

“Distributed locks are not just a scaling challenge—they’re a correctness challenge. The algorithm must be correct even when clocks are wrong, networks are partitioned, and processes pause unexpectedly.” — Martin Kleppmann, “How to do distributed locking” (2016)

All practical distributed locks use leases—time-bounded locks that expire automatically. This prevents indefinite lock holding by crashed clients.

[Sequence: the client acquires a lock with TTL=30s; the lock service grants it (expires at T+30s); the client accesses the resource while working; the client then releases the lock or lets it expire, and the lock service marks it released.]

After acquiring a lease, the client should compute how long it can safely assume it still holds the lock:
MIN_VALIDITY = TTL - (T_acquire - T_start) - CLOCK_DRIFT

Where:

  • TTL: Initial lease duration
  • T_acquire - T_start: Time elapsed acquiring the lock
  • CLOCK_DRIFT: Maximum expected clock skew between client and server
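In code, the check is straightforward. Below is a minimal sketch of that calculation (the 100ms drift bound and the 5s safety margin are illustrative assumptions, not values from the article):

// Assumed bound on clock skew between client and lock server
const CLOCK_DRIFT_MS = 100

function minValidity(ttlMs: number, tStart: number, tAcquire: number): number {
  // Time spent acquiring the lock counts against the lease
  return ttlMs - (tAcquire - tStart) - CLOCK_DRIFT_MS
}

// Usage: only enter the critical section if enough validity remains
const tStart = Date.now()
// ... acquire the lease (TTL = 30s) ...
const tAcquire = Date.now()
if (minValidity(30_000, tStart, tAcquire) < 5_000) {
  // Too little time left - release and retry rather than risk expiry mid-work
}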

Practical guidance:

  • JVM applications: TTL ≥ 60s (stop-the-world GC can pause for seconds)
  • Go/Rust applications: TTL ≥ 30s (less GC concern, but network issues)
  • General rule: TTL should be 10x your expected operation duration

Wall-clock danger: Redis uses wall-clock time for TTL expiration. If the server’s clock jumps forward (NTP adjustment, manual change), leases expire prematurely.

Example failure scenario:

  1. Client acquires lock with TTL=30s at server time T
  2. NTP adjusts server clock forward by 20s
  3. Lock expires at “T+30s” = actual T+10s
  4. Client still working; another client acquires lock
  5. Two clients now hold the “same” lock

Mitigation: Use monotonic clocks where possible. Linux clock_gettime(CLOCK_MONOTONIC) measures elapsed time without wall-clock adjustments.
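As a local illustration (Node.js), process.hrtime.bigint() reads the monotonic clock while Date.now() reads the wall clock; only the former is safe for measuring elapsed lease time on a single machine:

const wallStart = Date.now() // wall clock: can jump after an NTP step
const monoStart = process.hrtime.bigint() // monotonic: only moves forward

// ... do work ...

const elapsedWallMs = Date.now() - wallStart
const elapsedMonoMs = Number(process.hrtime.bigint() - monoStart) / 1e6
// elapsedWallMs can be wildly wrong after a clock adjustment; elapsedMonoMs cannot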

Prior to Redis 7.0: TTL expiration relied entirely on wall-clock time. Redis 7.0+ uses monotonic clocks internally for some operations, but the fundamental issue remains for distributed Redlock scenarios where multiple independent clocks are involved.

Redis Single-Node

When to choose:

  • Lock is for efficiency (prevent duplicate work), not correctness
  • Single point of failure is acceptable
  • Lowest latency requirement

Implementation:

import { Redis } from "ioredis"

async function acquireLock(redis: Redis, resource: string, clientId: string, ttlMs: number): Promise<boolean> {
  // SET with PX (millisecond expiry) and NX (only set if the key does not already exist)
  const result = await redis.set(resource, clientId, "PX", ttlMs, "NX")
  return result === "OK"
}

async function releaseLock(redis: Redis, resource: string, clientId: string): Promise<boolean> {
  // Lua script: atomic check-and-delete
  // Only delete if we still own the lock
  const script = `
    if redis.call("get", KEYS[1]) == ARGV[1] then
      return redis.call("del", KEYS[1])
    else
      return 0
    end
  `
  const result = await redis.eval(script, 1, resource, clientId)
  return result === 1
}

Why the Lua script for release: Without atomic check-and-delete, this race exists:

  1. Client A’s lock expires
  2. Client B acquires lock
  3. Client A (still thinking it has lock) calls DEL
  4. Client A deletes Client B’s lock

Trade-offs:

Advantage | Disadvantage
--- | ---
Simple implementation | Single point of failure
Low latency (~1ms) | No automatic failover
Well-understood semantics | Lost locks on master crash

Real-world: This approach works well for rate limiting, cache stampede prevention, and other scenarios where occasional double-execution is tolerable.
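For example, a cache-stampede guard can be sketched with the acquireLock/releaseLock helpers above (the recompute callback and the 10s/5-minute TTLs are illustrative assumptions):

async function getWithStampedeProtection(
  redis: Redis,
  key: string,
  recompute: () => Promise<string>, // the expensive computation being protected
): Promise<string> {
  const cached = await redis.get(key)
  if (cached !== null) return cached

  const clientId = Math.random().toString(36).slice(2)
  const lockKey = `lock:${key}`

  // Only one client recomputes; double execution here is wasteful, not dangerous
  if (await acquireLock(redis, lockKey, clientId, 10_000)) {
    try {
      const fresh = await recompute()
      await redis.set(key, fresh, "EX", 300)
      return fresh
    } finally {
      await releaseLock(redis, lockKey, clientId)
    }
  }

  // Lost the race: wait briefly, then read what the winner cached
  await new Promise((resolve) => setTimeout(resolve, 200))
  return getWithStampedeProtection(redis, key, recompute)
}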

Redlock (Multi-Node Redis)

When to choose:

  • Need fault tolerance for efficiency locks
  • Can tolerate timing assumptions
  • Want Redis ecosystem (Lua scripts, familiar API)

Algorithm (N=5 independent Redis instances):

  1. Get current time in milliseconds
  2. Try to acquire lock on ALL N instances sequentially, with small timeout per instance
  3. Lock is acquired if: majority (N/2 + 1) succeeded AND total elapsed time < TTL
  4. Validity time = TTL - elapsed time
  5. If failed, release lock on ALL instances (even those that succeeded)
import { Redis } from "ioredis"
import { randomBytes } from "crypto"

interface RedlockResult {
  acquired: boolean
  validity: number
  value: string
}

async function redlockAcquire(instances: Redis[], resource: string, ttlMs: number): Promise<RedlockResult> {
  const value = randomBytes(20).toString("hex")
  const startTime = Date.now()
  const quorum = Math.floor(instances.length / 2) + 1
  let acquired = 0

  for (const redis of instances) {
    try {
      const result = await redis.set(resource, value, "PX", ttlMs, "NX")
      if (result === "OK") acquired++
    } catch {
      // Instance unavailable, continue
    }
  }

  const elapsed = Date.now() - startTime
  const validity = ttlMs - elapsed
  if (acquired >= quorum && validity > 0) {
    return { acquired: true, validity, value }
  }

  // Failed - release on ALL instances (even those that succeeded),
  // reusing the atomic check-and-delete releaseLock from the single-node example
  await Promise.all(instances.map((r) => releaseLock(r, resource, value)))
  return { acquired: false, validity: 0, value }
}

Critical limitation: Redlock generates random values (20 bytes from /dev/urandom), not monotonically increasing tokens. You cannot use Redlock values for fencing because resources cannot determine which token is “newer.”

Trade-offs vs single-node:

Aspect | Single-Node | Redlock (N=5)
--- | --- | ---
Fault tolerance | None | Survives N/2 failures
Latency | ~1ms | ~5ms (sequential attempts)
Complexity | Low | Medium
Fencing support | No | No
Clock assumptions | Server only | All N servers + client

ZooKeeper

When to choose:

  • Correctness-critical locks (fencing required)
  • Already running ZooKeeper (Kafka, HBase ecosystem)
  • Can tolerate higher latency for stronger guarantees

Ephemeral sequential node recipe:

[Diagram: under /locks/resource-1, sequential child nodes lock-0000000001 (Client A, holds the lock), lock-0000000002 (Client B, watches 0000000001), and lock-0000000003 (Client C, watches 0000000002).]

Algorithm:

  1. Client creates ephemeral sequential node under /locks/resource
  2. Client lists all children, sorts by sequence number
  3. If client’s node has lowest sequence: lock acquired
  4. Otherwise: set watch on the node with next-lowest sequence number
  5. When watch fires: repeat step 2

Why watch predecessor, not parent:

  • Watching parent causes thundering herd: all N clients wake when lock releases
  • Watching predecessor: only next client wakes

Fencing via zxid: ZooKeeper’s transaction ID (zxid) is a monotonically increasing 64-bit number. Use the zxid of your lock node as a fencing token.

import org.apache.zookeeper.*;
import org.apache.zookeeper.data.Stat;
import java.util.Collections;
import java.util.List;

public class ZkLock {
    private final ZooKeeper zk;
    private final String lockPath;
    private String myNode;

    public ZkLock(ZooKeeper zk, String lockPath) {
        this.zk = zk;
        this.lockPath = lockPath;
    }

    public long acquireLock(String resource) throws Exception {
        // Create ephemeral sequential node
        myNode = zk.create(
            "/locks/" + resource + "/lock-",
            new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL
        );
        while (true) {
            List<String> children = zk.getChildren("/locks/" + resource, false);
            Collections.sort(children);
            String smallest = children.get(0);
            if (myNode.endsWith(smallest)) {
                // We have the lock - return zxid as fencing token
                Stat stat = zk.exists(myNode, false);
                return stat.getCzxid();
            }
            // Find our predecessor and watch only it (avoids thundering herd)
            int myIndex = children.indexOf(myNode.substring(myNode.lastIndexOf('/') + 1));
            String predecessor = children.get(myIndex - 1);
            // The watch fires the session's default watcher, which must call notify() on this object
            Stat stat = zk.exists("/locks/" + resource + "/" + predecessor, true);
            if (stat != null) {
                // Block until the watch notification arrives, then re-check
                synchronized (this) { wait(); }
            }
            // If stat == null the predecessor is already gone - loop and re-check immediately
        }
    }
}

Trade-offs:

Advantage | Disadvantage
--- | ---
Strong consistency (Zab consensus) | Higher latency (2+ RTTs)
Automatic cleanup (ephemeral nodes) | Operational complexity
Fencing tokens (zxid) | Session management overhead
No clock assumptions | Quorum unavailable = no locks

etcd

When to choose:

  • Kubernetes-native environment
  • Prefer gRPC over custom protocols
  • Need distributed KV store beyond just locking

Lease-based locking:

etcd provides first-class lease primitives. A lease is a token with a TTL; keys can be attached to leases and are automatically deleted when the lease expires.

package main

import (
    "context"
    "time"

    clientv3 "go.etcd.io/etcd/client/v3"
    "go.etcd.io/etcd/client/v3/concurrency"
)

func acquireLock(client *clientv3.Client, resource string) (*concurrency.Mutex, error) {
    // Create session with 30s TTL; the session keeps the lease alive in the background
    session, err := concurrency.NewSession(client, concurrency.WithTTL(30))
    if err != nil {
        return nil, err
    }
    // Create mutex and acquire with a bounded wait
    mutex := concurrency.NewMutex(session, "/locks/"+resource)
    ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    defer cancel()
    if err := mutex.Lock(ctx); err != nil {
        session.Close() // avoid leaking the lease if acquisition fails
        return nil, err
    }
    // Use mutex.Header().Revision as fencing token
    return mutex, nil
}

Fencing via revision: etcd assigns a globally unique, monotonically increasing revision to every modification. Use mutex.Header().Revision as your fencing token.

Critical limitation (Jepsen finding): Under network partitions, etcd locks can fail to provide mutual exclusion. Jepsen testing found ~18% loss of acknowledged updates when locks protected concurrent modifications. The root cause is not specific to etcd: in an asynchronous system, a lease-based lock cannot by itself guarantee mutual exclusion over an external resource; the resource must also check fencing tokens.

“etcd’s lock is not safe. It is possible for two processes to simultaneously hold the same lock, even in healthy clusters.” — Kyle Kingsbury, Jepsen analysis of etcd 3.4.3 (2020)

Trade-offs:

Advantage | Disadvantage
--- | ---
Raft consensus (strong consistency) | Jepsen found safety violations
Native lease support | Higher latency than Redis
Kubernetes integration | Operational complexity
Revision-based fencing | Quorum unavailable = no locks

PostgreSQL Advisory Locks

When to choose:

  • Already using PostgreSQL
  • Lock scope is single database
  • Don’t want external dependencies

Session-level advisory locks:

-- Acquire lock (blocks until available)
SELECT pg_advisory_lock(hashtext('resource-1'));
-- Try acquire (returns immediately)
SELECT pg_try_advisory_lock(hashtext('resource-1'));
-- Release
SELECT pg_advisory_unlock(hashtext('resource-1'));

Transaction-level advisory locks:

-- Automatically released at transaction end
SELECT pg_advisory_xact_lock(hashtext('resource-1'));
-- Then do your work within the transaction
UPDATE resources SET ... WHERE id = 'resource-1';

Lock ID generation: Advisory locks take a 64-bit integer key. Use hashtext() for string-based resource IDs, or encode your own scheme.

Connection pooling danger: Session-level locks are tied to the database connection. With connection pooling (PgBouncer), your “session” may be reused by another client, leaking locks. Use transaction-level locks with connection pooling.
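A sketch of the transaction-scoped pattern with node-postgres (the pool configuration and the doWork callback are assumptions for illustration):

import { Pool, PoolClient } from "pg"

const pool = new Pool() // connection settings come from PG* environment variables

async function withAdvisoryLock<T>(resource: string, doWork: (client: PoolClient) => Promise<T>): Promise<T> {
  const client = await pool.connect()
  try {
    await client.query("BEGIN")
    // Transaction-level lock: released automatically at COMMIT/ROLLBACK,
    // so a pooled connection cannot leak it to its next borrower
    await client.query("SELECT pg_advisory_xact_lock(hashtext($1))", [resource])
    const result = await doWork(client)
    await client.query("COMMIT")
    return result
  } catch (err) {
    await client.query("ROLLBACK")
    throw err
  } finally {
    client.release()
  }
}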

Trade-offs:

Advantage | Disadvantage
--- | ---
No external dependencies | Single database scope
ACID guarantees | Connection pooling issues
Already have PostgreSQL | Not for multi-database
Automatic transaction cleanup | Lock ID collisions possible

Comparison across implementations:

Factor | Redis Single | Redlock | ZooKeeper | etcd | PostgreSQL
--- | --- | --- | --- | --- | ---
Fault tolerance | None | N/2 failures | N/2 failures | N/2 failures | Database HA
Fencing tokens | No | No | Yes (zxid) | Yes (revision) | No
Latency (acquire) | ~1ms | ~5-10ms | ~10-50ms | ~10-50ms | ~1-5ms
Clock assumptions | Yes | Yes (all nodes) | No | No | No
Correctness guarantee | No | No | Yes | Partial (Jepsen) | Yes (single DB)
Operational complexity | Low | Medium | High | Medium | Low

[Decision tree, starting from "Need distributed lock":]

  • Correctness critical? No (efficiency only) → Fault tolerance needed?
    • No → Redis single-node
    • Yes → Redlock
  • Correctness critical? Yes (data corruption risk) → Already have ZooKeeper?
    • Yes → ZooKeeper + fencing
    • No → Single database scope?
      • Yes → PostgreSQL advisory
      • No → ZooKeeper or etcd + fencing

Leases expire. When they do, a “stale” lock holder may still be executing its critical section. Without fencing, this corrupts the protected resource.

Example failure scenario:

  1. Client 1 acquires the lock and is granted a 30s lease; work begins
  2. Client 1 enters a GC pause lasting 45s
  3. The lease expires at 30s while Client 1 is still paused
  4. Client 2 acquires the lock and writes "value-B" to storage
  5. Client 1's GC pause ends and it writes "value-A" with its stale lock
  6. Without fencing, storage accepts the stale write and the data is corrupted

How fencing tokens prevent this:
  1. Lock service issues monotonically increasing token with each grant
  2. Client includes token with every operation on protected resource
  3. Resource tracks highest token ever seen
  4. Resource rejects operations with token < highest seen
With fencing, the same scenario is safe:

  1. Client 1 acquires the lock and is granted token=33
  2. Client 1 pauses (GC) and its lease expires
  3. Client 2 acquires the lock and is granted token=34
  4. Client 2 writes with token=34; storage records highest=34
  5. Client 1 resumes and writes with token=33; storage rejects it (33 < 34) and the data is protected

Lock service side:

interface LockGrant {
  token: bigint // Monotonically increasing
  expiresAt: number
}

class FencingLockService {
  private nextToken: bigint = 1n
  private locks: Map<string, { holder: string; token: bigint; expiresAt: number }> = new Map()

  acquire(resource: string, clientId: string, ttlMs: number): LockGrant | null {
    const existing = this.locks.get(resource)
    if (existing && existing.expiresAt > Date.now()) {
      return null // Lock held
    }
    const token = this.nextToken++
    const expiresAt = Date.now() + ttlMs
    this.locks.set(resource, { holder: clientId, token, expiresAt })
    return { token, expiresAt }
  }
}

Resource side:

interface FencedWrite {
  token: bigint
  data: unknown
}

class FencedStorage {
  private highestToken: Map<string, bigint> = new Map()
  private data: Map<string, unknown> = new Map()

  write(resource: string, write: FencedWrite): boolean {
    const highest = this.highestToken.get(resource) ?? 0n
    if (write.token < highest) {
      // Stale token - reject
      return false
    }
    // Accept write, update highest seen
    this.highestToken.set(resource, write.token)
    this.data.set(resource, write.data)
    return true
  }
}

Redlock uses random values (20 bytes), not ordered tokens. A resource cannot determine if abc123 is “newer” than xyz789. This is why Redlock cannot provide fencing—the values lack the ordering property required to reject stale operations.

“To make the lock safe with fencing, you need not just a random token, but a monotonically increasing token. And the only way to generate a monotonically increasing token is to use a consensus protocol.” — Martin Kleppmann

ZooKeeper’s transaction ID (zxid) is perfect for fencing:

  • Monotonically increasing: Every ZK transaction increments it
  • Globally ordered: All clients see same ordering
  • Available at lock time: Stat.getCzxid() returns creation zxid
// When acquiring lock
Stat stat = zk.exists(myLockNode, false);
long fencingToken = stat.getCzxid();
// When accessing resource
resource.write(data, fencingToken);

Martin Kleppmann identified fundamental problems with Redlock:

1. Timing assumptions violated by real systems:

Redlock assumes bounded network delay, bounded process pauses, and bounded clock drift. Real systems violate all three:

  • Network packets can be delayed arbitrarily (TCP retransmits, routing changes)
  • GC pauses can exceed lease TTL (observed: 1+ minutes in production JVMs)
  • Clock skew can be seconds under adversarial NTP conditions

2. No fencing capability:

Even if Redlock worked perfectly, it generates random values, not monotonic tokens. Resources cannot reject stale operations.

3. Clock jump scenario:

  1. Client acquires lock on 3 of 5 Redis instances
  2. Clock on one instance jumps forward (NTP sync)
  3. Lock expires prematurely on that instance
  4. Another client acquires on 3 instances (the jumped one + 2 others)
  5. Two clients now hold majority

Salvatore Sanfilippo (Redis creator) responded:

1. Random values + CAS = sufficient:

“The token is a random string. If you use check-and-set (CAS), you can use the random string to ensure that only the lock owner can modify the resource.”

2. Post-acquisition time check:

Redlock spec includes checking elapsed time after acquisition. If elapsed > TTL, the lock is considered invalid. This allegedly handles delayed responses.

3. Monotonic clocks:

Proposed using CLOCK_MONOTONIC instead of wall clocks to eliminate clock jump issues.

Neither argument is fully satisfying:

Kleppmann's points | Antirez's counterpoints | Reality
--- | --- | ---
GC pauses violate timing | Post-acquisition check helps | Pauses can happen during resource access, not just during acquire
No fencing possible | Random + CAS works | CAS requires resource to store lock value; not always feasible
Clock jumps break safety | Use monotonic clocks | Cross-machine monotonic clocks don't exist

Practical guidance:

  • Efficiency locks: Redlock is acceptable. Double-execution is annoying but not catastrophic.
  • Correctness locks: Use consensus-based systems (ZooKeeper) with fencing tokens. Redlock’s random values cannot fence.

Google Chubby

Context: Internal distributed lock service powering GFS, BigTable, and other Google infrastructure. Chubby was never open-sourced, but its published design inspired ZooKeeper.

Architecture:

  • 5-replica Paxos cluster
  • Replicas elect master using Paxos; master lease is several seconds
  • Client sessions with grace periods (45s default)
  • Files + locks (locks are files with special semantics)

Key design decisions:

  • Coarse-grained locks: Designed for locks held minutes to hours, not milliseconds
  • Advisory locks by default: Files don’t prevent access without explicit lock checks
  • Master lease renewal: Master doesn’t lose leadership on brief network blips
  • Client grace period: On leader change, clients have 45s to reconnect before session (and locks) invalidate

Fencing mechanism: Chubby supports sequencers (fencing tokens). The lock service hands out sequencers; resources can verify them with Chubby before accepting writes.

“If a process’s lease has expired, the lock server will refuse to validate the sequencer.” — Mike Burrows, “The Chubby Lock Service” (2006)

Scale: Chubby is not designed for high-frequency locking. It’s optimized for reliability of infrequent operations, not throughput.

Uber

Context: When a rider requests a cab, multiple nearby drivers could be assigned. Exactly one driver must be assigned per ride.

Problem:

  • Multiple matching service instances receive the same request
  • Race condition: both try to assign the same driver
  • Result: driver assigned to multiple rides, customer experience failure

Solution:

  • Distributed lock on driver:{driver_id} before assignment
  • Lock held only during assignment operation (~10-100ms)
  • Redis-based (likely Redlock or single-node with replication)

Why it works: This is an efficiency lock. If two services somehow both assign the same driver (lock failure), the booking system downstream rejects the duplicate. Occasional failures are detected and handled.
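A sketch of this per-driver lock using the single-node acquireLock/releaseLock helpers from earlier (assignDriver and the 5s TTL are illustrative assumptions):

async function assignDriverExclusively(
  redis: Redis,
  driverId: string,
  rideId: string,
  assignDriver: (driverId: string, rideId: string) => Promise<void>, // hypothetical booking call
): Promise<boolean> {
  const lockKey = `driver:${driverId}`
  const clientId = `matcher-${process.pid}-${Date.now()}`

  // Assignment takes ~10-100ms, so a short TTL is ample
  if (!(await acquireLock(redis, lockKey, clientId, 5_000))) {
    return false // another matcher instance is assigning this driver
  }
  try {
    await assignDriver(driverId, rideId)
    return true
  } finally {
    await releaseLock(redis, lockKey, clientId)
  }
}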

Netflix

Context: Millions of distributed jobs, some triggered by events that may arrive multiple times.

Problem:

  • Event arrives at multiple consumer instances
  • Same job should execute exactly once
  • Idempotency alone doesn’t help if job has side effects

Solution approach:

  • Acquire lock before processing event
  • Lock key: job:{event_id}:{job_type}
  • TTL: Expected job duration + buffer
  • Combined with idempotency keys in downstream services

Insight: Netflix uses a layered approach—locks provide first-line deduplication, idempotent operations provide safety net, and monitoring detects drift.
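One way to sketch that layering, reusing the acquireLock/releaseLock helpers from earlier, is a lock as the first filter plus a completion marker as the idempotency safety net (the key names, TTLs, and runJob callback are illustrative assumptions, not Netflix's implementation):

async function processEventOnce(
  redis: Redis,
  eventId: string,
  jobType: string,
  runJob: () => Promise<void>, // hypothetical job body with side effects
): Promise<void> {
  const lockKey = `job:${eventId}:${jobType}`
  const doneKey = `done:${eventId}:${jobType}`
  const clientId = `worker-${process.pid}`

  // Layer 1: lock-based deduplication (best effort)
  if (!(await acquireLock(redis, lockKey, clientId, 60_000))) return

  try {
    // Layer 2: idempotency check in case an earlier holder already completed the job
    if (await redis.get(doneKey)) return
    await runJob()
    await redis.set(doneKey, "1", "EX", 86_400) // remember completion for 24h
  } finally {
    await releaseLock(redis, lockKey, clientId)
  }
}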

Aspect | Google Chubby | Uber | Netflix
--- | --- | --- | ---
Lock type | Correctness | Efficiency | Efficiency
Duration | Minutes-hours | Milliseconds | Seconds
Backend | Paxos (custom) | Redis | Redis/ZK hybrid
Fencing | Sequencers | Downstream checks | Idempotency keys
Scale | Low freq, high reliability | High freq, acceptable loss | High freq, acceptable loss

Distributed locks add complexity and failure modes. Before reaching for a lock, consider:

1. Idempotent operations:

If your operation can safely execute multiple times with the same result, you don’t need a lock.

// Bad: non-idempotent
async function incrementCounter(id: string) {
  const current = await db.get(id)
  await db.set(id, current + 1)
}

// Good: idempotent with versioning
async function setCounterIfMatch(id: string, expectedVersion: number, newValue: number) {
  await db
    .update(id)
    .where("version", expectedVersion)
    .set({ value: newValue, version: expectedVersion + 1 })
}

2. Compare-and-Swap (CAS):

Many databases support atomic CAS. Use it instead of external locks.

-- CAS-based update
UPDATE resources
SET value = 'new-value', version = version + 1
WHERE id = 'resource-1' AND version = 42;
-- Check rows affected - if 0, retry with fresh version

3. Optimistic concurrency:

Assume no conflicts; detect and retry on collision.

interface VersionedResource {
  data: unknown
  version: number
}

async function optimisticUpdate(id: string, transform: (data: unknown) => unknown) {
  while (true) {
    const resource = await db.get(id)
    const newData = transform(resource.data)
    const updated = await db.update(id, {
      data: newData,
      version: resource.version + 1,
      _where: { version: resource.version },
    })
    if (updated) return // Success
    // Version conflict - retry
  }
}

4. Queue-based serialization:

Route all operations for a resource to a single queue/partition.

[Diagram: Clients 1 and 2 both send operations for Resource 1 to Queue R1; Client 3 sends operations for Resource 2 to Queue R2. One worker consumes each queue and applies its operations sequentially to the corresponding resource.]

This eliminates concurrent access by design.
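One common way to implement the routing is to hash the resource ID to a fixed partition, so every operation on a given resource lands on the same queue (the partition count and sendToQueue transport are assumptions for illustration):

import { createHash } from "crypto"

const PARTITION_COUNT = 16 // assumed number of queue partitions

declare function sendToQueue(queue: string, message: unknown): Promise<void> // assumed transport

// Same resource ID always maps to the same partition, whose single consumer
// applies operations one at a time
function partitionFor(resourceId: string): number {
  const digest = createHash("sha256").update(resourceId).digest()
  return digest.readUInt32BE(0) % PARTITION_COUNT
}

async function enqueueOperation(resourceId: string, op: unknown): Promise<void> {
  const partition = partitionFor(resourceId)
  // With Kafka, resourceId would typically be the message key, which determines the partition
  await sendToQueue(`ops-partition-${partition}`, { resourceId, op })
}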

Factor | Use Distributed Lock | Use Lock-Free
--- | --- | ---
Operation complexity | Multi-step, non-decomposable | Single atomic operation
Conflict frequency | Rare | Frequent (CAS retries expensive)
Side effects | External (can't retry) | Local (can retry)
Existing infrastructure | Lock service available | Database has CAS
Team expertise | Lock patterns understood | Lock-free patterns understood

The mistake: Acquiring lock, then making RPC calls or doing I/O while holding it.

// Dangerous: lock held during external call
const lock = await acquireLock(resource)
const data = await externalService.fetch() // Network call!
await db.update(resource, data)
await releaseLock(lock)

What goes wrong:

  • External call takes 10s; lock TTL is 5s
  • Lock expires while you’re still working
  • Another client acquires and corrupts data

Solution: Minimize lock scope. Fetch data first, then lock-update-unlock quickly.

// Better: minimize lock duration
const data = await externalService.fetch()
const lock = await acquireLock(resource)
await db.update(resource, data)
await releaseLock(lock)

The mistake: Assuming lock acquisition always succeeds.

// Dangerous: no failure handling
await acquireLock(resource)
await criticalOperation()
await releaseLock(resource)

What goes wrong:

  • Lock service unavailable → operation proceeds without lock
  • Lock contention → silent failure, concurrent access

Solution: Always check acquisition result and handle failure.

const acquired = await acquireLock(resource)
if (!acquired) {
  throw new Error("Failed to acquire lock - cannot proceed")
}
try {
  await criticalOperation()
} finally {
  await releaseLock(resource)
}

The mistake: Releasing a lock you no longer own (it expired and was re-acquired).

// Dangerous: release without ownership check
await lock.release() // May delete another client's lock!

What goes wrong:

  1. Your lock expires due to slow operation
  2. Another client acquires the lock
  3. Your release() deletes their lock
  4. Third client acquires, now two clients think they have it

Solution: Atomic release that checks ownership (shown in Redis Lua script earlier).

The mistake: All waiting clients wake simultaneously when lock releases.

What goes wrong with ZooKeeper naive implementation:

  • 1000 clients watch /locks/resource parent node
  • Lock releases, all 1000 receive watch notification
  • All 1000 call getChildren() simultaneously
  • ZooKeeper overloaded, lock acquisition stalls

Solution: Watch predecessor only (shown in ZooKeeper recipe earlier). Only one client wakes per release.

The mistake: Using Redlock (or any lease-based lock) without fencing for correctness-critical operations.

// Dangerous: no fencing
const lock = await redlock.acquire(resource)
await storage.write(data) // Stale lock holder can overwrite!
await redlock.release(lock)

Solution: Either use a lock service with fencing tokens (ZooKeeper) or accept that this lock is efficiency-only.

The mistake: Using PostgreSQL session-level advisory locks with PgBouncer.

-- Acquired by connection in pool
SELECT pg_advisory_lock(12345);
-- Connection returned to pool
-- Other client reuses connection
-- Lock is still held by "other" client!

Solution: Use transaction-level locks with pooling.

BEGIN;
SELECT pg_advisory_xact_lock(12345);
-- Do work
COMMIT; -- Lock automatically released

Distributed locking is a coordination primitive that requires careful consideration of failure modes, timing assumptions, and fencing requirements.

Key decisions:

  1. Efficiency vs correctness: Most locks are for efficiency (preventing duplicate work). These can use simpler implementations with known failure modes. Correctness-critical locks require consensus protocols and fencing.

  2. Fencing is non-negotiable for correctness: Without fencing tokens, lease expiration during long operations corrupts data. Random lock values (Redlock) cannot fence.

  3. Timing assumptions are dangerous: Redlock’s safety depends on bounded network delays, process pauses, and clock drift. Real systems violate all three.

  4. Consider lock-free alternatives: Idempotent operations, CAS, optimistic concurrency, and queue-based serialization often work better than distributed locks.

Start simple: Single-node Redis locks work for most efficiency scenarios. Graduate to ZooKeeper with fencing only when correctness is critical and you understand the operational cost.

  • Distributed systems fundamentals (network partitions, consensus)
  • CAP theorem and consistency models
  • Basic understanding of lease-based coordination
Term | Definition
--- | ---
Lease | Time-bounded lock that expires automatically
Fencing token | Monotonically increasing identifier that resources use to reject stale operations
TTL | Time-To-Live; duration before lease expires
Quorum | Majority of nodes (N/2 + 1) required for consensus
Split-brain | Network partition where multiple partitions believe they are authoritative
zxid | ZooKeeper transaction ID; monotonically increasing, usable as fencing token
Advisory lock | Lock that doesn't prevent access—just signals intention
Ephemeral node | ZooKeeper node that is automatically deleted when the client session ends
  • Distributed locks are harder than they appear—network partitions, clock drift, and process pauses all cause multiple clients to believe they hold the same lock
  • Leases (auto-expiring locks) prevent deadlock but introduce the lease-expiration-during-work problem
  • Fencing tokens solve this by having the resource reject operations from stale lock holders
  • Redlock provides fault-tolerant efficiency locks but cannot fence (random values lack ordering)
  • ZooKeeper/etcd provide fencing tokens (zxid/revision) but add operational complexity
  • Lock-free alternatives (CAS, idempotency, queues) often work better than distributed locks
  • For correctness-critical locks: use consensus + fencing; for efficiency locks: Redis single-node is often sufficient

Libraries:

  • Redisson - Redis Java client with distributed locks.
  • node-redlock - Redlock implementation for Node.js.
  • Curator - ZooKeeper recipes including distributed locks.
