
Image Processing Service Design: CDN, Transforms, and APIs

This document presents the architectural design for a cloud-agnostic, multi-tenant image processing platform that provides on-the-fly transformations with enterprise-grade security, performance, and cost optimization. The platform supports hierarchical multi-tenancy (Organization → Tenant → Space), public and private image delivery, and deployment across AWS, GCP, Azure, or on-premise infrastructure. Key capabilities include deterministic transformation caching to ensure sub-second delivery, HMAC-SHA256 signed URLs for secure private access, CDN (Content Delivery Network) integration for global edge caching, and a “transform-once-serve-forever” approach that minimizes processing costs while guaranteeing HTTP 200 responses even for first-time transformation requests.

[Architecture diagram: Client Application → (HTTPS) → Edge Cache / CDN → (cache miss) → Image Gateway, which orchestrates the Transform Engine and is backed by Object Storage, a Redis Cache, and PostgreSQL.]
High-level architecture: Clients request images through CDN, with cache misses handled by the Image Gateway which orchestrates transformation, caching, and storage

Image processing at scale requires balancing three competing concerns: latency (users expect sub-second delivery), cost (processing and storage grow with traffic), and correctness (transformations must be deterministic and secure). This architecture resolves these tensions through a layered caching strategy with content-addressed storage.

[Cache-layer diagram: Request → CDN Edge (95% hit rate) → 5% miss → Redis (serves 80% of misses) → 20% miss → DB Index (90% of the remainder) → 10% miss → Transform (< 5% of requests), whose output is stored back into the layers above.]
Multi-layer caching eliminates 99.9% of redundant processing—only the first request for each unique transformation hits the Transform Engine.

Core mental model:

  1. Content-addressed storage: Hash(original + operations) → unique derived asset. The same inputs always produce the same output, enabling infinite caching (a minimal sketch follows this list).
  2. Synchronous-first with async fallback: transform inline for images under 5MB (< 800ms); queue larger images and return 202 with a polling URL.
  3. Efficiency locks, not safety locks: Redlock prevents duplicate processing but does not guarantee mutual exclusion. If two transforms race, both succeed; we simply store one.
  4. Hierarchical policies: Organization → Tenant → Space inheritance. Override at any level; enforce at every layer (API, database, CDN).
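Item 1 is the heart of the design and fits in one function. A minimal sketch (it mirrors the generateOpsHash helper in the Transform Engine code later in this document; the function name here is illustrative):

import crypto from "crypto"

// Same original bytes + same canonical operations + same output format => same key, forever cacheable
function derivedAssetKey(originalContentHash, canonicalOps, outputFormat) {
  return crypto
    .createHash("sha256")
    .update(`${canonicalOps};${originalContentHash};fmt=${outputFormat}`)
    .digest("hex")
}

// derivedAssetKey("9f2c…", "w_800-h_600-f_cover-q_85", "webp")
// => stable 64-character hex key used for storage paths and cache lookups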

Technology selection rationale:

Component | Choice | Why
Image processor | Sharp 0.34+ (libvips) | 26x faster than jimp, 4-5x faster than ImageMagick, ~50MB memory per worker
Distributed lock | Redlock | Sufficient for efficiency (not correctness); simpler than etcd/ZooKeeper
Formats | AVIF → WebP → JPEG | AVIF: 94.89% browser support, 50% smaller than JPEG. WebP: 95.93% support, 25-34% savings
Database | PostgreSQL + JSONB | Row-level security, flexible policy storage, proven at scale
  1. Multi-Tenancy Hierarchy

    • Organization: Top-level tenant boundary
    • Tenant: Logical partition within organization (brands, environments)
    • Space: Project workspace containing assets
  2. Image Access Models

    • Public Images: Direct URL access with CDN caching
    • Private Images: Cryptographically signed URLs with expiration
  3. On-the-Fly Processing

    • Real-time transformations (resize, crop, format, quality, effects)
    • Named presets for common transformation patterns
    • Automatic format optimization (WebP, AVIF)
    • Guaranteed 200 response even on first transform request
  4. Cloud-Agnostic Design

    • Deployment to AWS, GCP, Azure, or on-premise
    • Storage abstraction layer for portability
    • Kubernetes-based orchestration
  5. Performance & Cost Optimization

    • Multi-layer caching (CDN → Redis → Database → Storage)
    • Transform deduplication with content-addressed storage
    • Lazy preset generation
    • Storage lifecycle management

Component | Name | Purpose
Entry point | Image Gateway | API gateway, routing, authentication
Transform service | Transform Engine | On-demand image processing
Upload handler | Asset Ingestion Service | Image upload and validation
Admin API | Control Plane API | Tenant management, configuration
Background jobs | Transform Workers | Async preset generation
Metadata store | Registry Service | Asset and transformation metadata
Storage layer | Object Store Adapter | Cloud-agnostic storage interface
CDN layer | Edge Cache | Global content delivery
URL signing | Signature Service | Private URL cryptographic signing

Entity | Name | Description
Uploaded file | Asset | Original uploaded image
Processed variant | Derived Asset | Transformed image
Named transform | Preset | Reusable transformation template
Transform result | Variant | Cached transformation output

Portability:

  • Storage Abstraction: Unified interface for S3, GCS, Azure Blob, MinIO
  • Queue Abstraction: Support for SQS, Pub/Sub, Service Bus, RabbitMQ
  • Kubernetes Native: Deploy consistently across clouds
  • No Vendor Lock-in: Use open standards where possible

Performance targets:

  • Edge Hit: < 50ms (CDN cache)
  • Origin Hit: < 200ms (application cache)
  • First Transform: < 800ms (sync processing for images < 5MB)
  • Always Return 200: never return 202 or a redirect for images that can be processed inline (< 5MB); larger images fall back to the async 202-with-polling path described in the core mental model above

Reliability and idempotency:

  • Content-addressed transformation storage
  • Idempotent processing with distributed locking
  • Permanent caching with invalidation API
  • Deduplication across requests

Security:

  • Signed URLs for private content
  • Row-level tenancy isolation
  • Encryption at rest and in transit
  • Comprehensive audit logging

Cost optimization:

  • Multi-layer caching to reduce processing
  • Storage lifecycle automation
  • Format optimization (WebP/AVIF)
  • Rate limiting and resource quotas

Technology | Pros | Cons | Recommendation
Sharp (libvips) | 26x faster than jimp, low memory (~50MB), modern formats | Linux-focused build | Recommended
ImageMagick | Feature-rich, mature | 4-5x slower than Sharp | Use for complex operations
Jimp | Pure JavaScript, portable | Very slow, limited formats | Development only

Choice: Sharp 0.34+ (latest: 0.34.5, November 2025) with libvips 8.18 for primary processing.

Why Sharp over alternatives:

  • Performance: 64.42 ops/sec for JPEG processing on x64, 49.20 ops/sec on ARM64 (benchmarked without libvips caching; production performance higher)
  • Memory efficiency: Uses streaming and memory-mapped I/O; set MALLOC_ARENA_MAX="2" on Linux to reduce glibc fragmentation
  • Modern format support: AVIF, WebP, and as of libvips 8.18: UltraHDR (for HDR displays), Camera RAW via libraw, Oklab colorspace (CSS4-standard, faster than CIELAB)

libvips 8.18 (December 2025) notable additions:

  • UltraHDR: Single image file displays optimally on both SDR and HDR screens via Google’s libultrahdr
  • Camera RAW: 28% faster and 40% less memory than ImageMagick when resizing CR2/NEF files
  • BigTIFF output: Support for TIFF files > 4GB
npm install sharp
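A minimal usage sketch (file names and option values are illustrative; rotate() with no arguments auto-orients the image from its EXIF data):

import sharp from "sharp"

// Resize to an 800px-wide WebP; withoutEnlargement avoids upscaling small originals
const webpBuffer = await sharp("input.jpg")
  .rotate() // honour EXIF orientation
  .resize({ width: 800, fit: "cover", withoutEnlargement: true })
  .webp({ quality: 80, effort: 6 })
  .toBuffer()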
Technology | Use Case | Pros | Cons | Recommendation
Redis | Application cache, locks | Fast, pub/sub, clustering | Memory cost | Primary cache
Memcached | Simple KV cache | Faster for simple gets | No persistence, limited data types | Skip
Hazelcast | Distributed cache | Java ecosystem, compute | Complexity | Skip for Node.js

Choice: Redis (6+ with Redis Cluster for HA)

npm install ioredis
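The Transform Engine shown later calls cache.get(key) and cache.set(key, value, ttlSeconds); a minimal JSON wrapper over ioredis that satisfies that interface might look like this (the class and method names are assumptions, not an existing library):

import Redis from "ioredis"

class RedisCache {
  constructor(url) {
    this.redis = new Redis(url)
  }
  async get(key) {
    const raw = await this.redis.get(key)
    return raw ? JSON.parse(raw) : null
  }
  async set(key, value, ttlSeconds) {
    // EX sets the expiry (in seconds) in the same command
    await this.redis.set(key, JSON.stringify(value), "EX", ttlSeconds)
  }
}

export default RedisCache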
Provider | Library | Notes
AWS S3 | @aws-sdk/client-s3 | Official v3 SDK
Google Cloud Storage | @google-cloud/storage | Official SDK
Azure Blob | @azure/storage-blob | Official SDK
MinIO (on-prem) | minio or S3 SDK | S3-compatible
npm install @aws-sdk/client-s3 @google-cloud/storage @azure/storage-blob minio
Provider | Library | Use Case
AWS SQS | @aws-sdk/client-sqs | AWS deployments
GCP Pub/Sub | @google-cloud/pubsub | GCP deployments
Azure Service Bus | @azure/service-bus | Azure deployments
RabbitMQ | amqplib | On-premise, multi-cloud

Choice: Provider-specific for cloud, RabbitMQ for on-premise

npm install amqplib
Framework | Pros | Cons | Recommendation
Fastify | Fast, low overhead, TypeScript support | Less mature ecosystem | Recommended
Express | Mature, large ecosystem | Slower, callback-based | Acceptable
Koa | Modern, async/await | Smaller ecosystem | Acceptable

Choice: Fastify for performance

npm install fastify @fastify/multipart @fastify/cors
Technology | Pros | Cons | Recommendation
PostgreSQL | JSONB, full-text search, reliability | Complex clustering | Recommended
MySQL | Mature, simple | Limited JSON support | Acceptable
MongoDB | Flexible schema | Tenancy complexity | Not recommended

Choice: PostgreSQL 15+ with JSONB for policies

npm install pg
Library | Algorithm | Recommendation
Node crypto (built-in) | HMAC-SHA256 | Recommended
jsonwebtoken | JWT (HMAC/RSA) | Use for JWT tokens
tweetnacl | Ed25519 | Use for EdDSA

Choice: Built-in crypto module for HMAC-SHA256 signatures

import crypto from "crypto"
Technology | Pros | Cons | Recommendation
Redlock (Redis) | Simple, Redis-based | No fencing tokens, clock skew risk | For efficiency only
etcd | Linearizable, fencing tokens | Separate service, higher latency | Safety-critical use
ZooKeeper | Strong consistency, mature | Complex operations, JVM dependency | Safety-critical use
Database locks | Simple, transactional | Contention, less scalable | Development only

Choice: Redlock with Redis for transform deduplication (efficiency), not for safety-critical mutual exclusion.

Why Redlock is sufficient here:

The image service uses locks to prevent duplicate work, not to prevent data corruption. If two workers race past the lock:

  1. Both fetch the original image
  2. Both apply the same transformation (deterministic)
  3. Both attempt to store the result
  4. One wins (upsert semantics), the other’s write is a no-op

This is inefficient (wasted compute) but not incorrect. The content-addressed storage ensures idempotency.
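Concretely, the "one wins" write can be expressed as an idempotent insert against the derived_assets table defined in the schema later in this document. A sketch assuming node-postgres (function and parameter names are illustrative):

async function recordDerivedAsset(pool, row) {
  const inserted = await pool.query(
    `INSERT INTO derived_assets
       (asset_id, operations_canonical, operations_hash, output_format,
        storage_provider, storage_key, size_bytes, content_hash, width, height, processing_time_ms)
     VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11)
     ON CONFLICT DO NOTHING
     RETURNING id`,
    [row.assetId, row.operationsCanonical, row.operationsHash, row.outputFormat,
     row.storageProvider, row.storageKey, row.sizeBytes, row.contentHash,
     row.width, row.height, row.processingTimeMs],
  )
  if (inserted.rows.length > 0) return inserted.rows[0].id

  // Another worker won the race; its row is identical by construction, so return it
  const winner = await pool.query(
    "SELECT id FROM derived_assets WHERE asset_id = $1 AND operations_hash = $2",
    [row.assetId, row.operationsHash],
  )
  return winner.rows[0].id
}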

Why Redlock is insufficient for safety-critical scenarios (per Martin Kleppmann’s analysis):

  1. No fencing tokens: Cannot generate monotonically increasing tokens to detect stale lock holders after process pauses/GC stops
  2. Timing assumptions: Depends on bounded network delays and clock accuracy that frequently break in practice
  3. Clock vulnerabilities: Uses gettimeofday() (not monotonic); NTP adjustments can cause time jumps

Redis’s current recommendation (from official docs): Use N=5 Redis masters with majority voting, implement fencing tokens separately if correctness matters, monitor clock drift.

npm install redlock

[Component diagram of the Image Service Platform: the Client Application reaches the Edge Cache (CloudFlare/CloudFront) over HTTPS; cache misses go through a Load Balancer to the Image Gateway (routing & auth). The gateway fronts the Transform Engine (image processing), Asset Ingestion (upload handler), the Control Plane API (tenant management), and the Signature Service (URL signing). The data layer comprises the Registry Service (PostgreSQL), a Redis Cluster (application cache), and a Message Queue (RabbitMQ/SQS) feeding a pool of Transform Workers. All processing services reach AWS S3, Google Cloud Storage, Azure Blob, or MinIO (on-premise) through the Object Store Adapter, and export metrics to Prometheus/Grafana for monitoring.]

[Sequence diagram: public image request. On a CDN cache hit, the Edge Cache answers directly with 200 OK in under 50ms. On a miss, the Image Gateway parses and validates the URL, then follows one of three paths: the transform is in the Redis cache (fetch the derived asset, 200 OK in < 200ms); the derived asset exists in the Registry DB (fetch from the Object Store, update the Redis cache, 200 OK in < 300ms); or this is the first transform (fetch the original, process inline, store the derived asset, save metadata, cache the result, 200 OK in < 800ms). Every path returns 200 OK with cache headers.]

[Sequence diagram: private image access. Step 1: the client POSTs /v1/sign; the Signature Service computes HMAC-SHA256(secret, payload) and returns the URL with signature and expiry. Step 2: the client requests the signed URL (GET /priv/.../img?sig=xxx&exp=yyy). With edge auth, the CDN validates the signature and normalizes the cache key; without it, the Image Gateway verifies the signature at the origin. Invalid or expired signatures receive 401 Unauthorized; valid ones follow the same flow as public requests.]

Modern CDNs support signature validation at the edge, eliminating origin round-trips for private content. This section covers three deployment patterns with different security/complexity tradeoffs.

Pattern 1: Origin-based validation (simplest)

All requests hit the origin, which validates signatures. The CDN caches responses keyed by the full URL including signature parameters.

  • Pros: Simple deployment, no edge configuration
  • Cons: Every unique signed URL generates a cache miss, origin must handle all validation
  • When to use: Low traffic, simple deployments, or when CDN doesn’t support edge compute

Pattern 2: Edge signature validation with normalized cache keys

The edge validates the signature, then strips signature parameters before checking the cache. This allows multiple signed URLs for the same content to share a single cache entry.

cloudflare-worker-auth.js
// Cloudflare Worker for edge signature validation
// Workers run on V8 isolates: ~1/10th memory of Node.js, <5ms cold start
export default {
async fetch(request, env) {
const url = new URL(request.url)
// Extract and validate signature
const sig = url.searchParams.get("sig")
const exp = url.searchParams.get("exp")
const kid = url.searchParams.get("kid")
if (!sig || !exp || !kid) {
return new Response("Missing signature", { status: 401 })
}
// Check expiration
if (Date.now() / 1000 > parseInt(exp)) {
return new Response("Signature expired", { status: 401 })
}
// Validate HMAC (key fetched from Workers KV or secrets)
const key = await env.SIGNING_KEYS.get(kid)
if (!key) {
return new Response("Invalid key", { status: 401 })
}
// Reconstruct canonical string and verify
const canonical = createCanonicalString(url.pathname, exp, url.hostname)
const expected = await computeHmac(key, canonical)
if (!timingSafeEqual(sig, expected)) {
return new Response("Invalid signature", { status: 401 })
}
// Strip signature params for cache key normalization
url.searchParams.delete("sig")
url.searchParams.delete("exp")
url.searchParams.delete("kid")
// Fetch from origin/cache with normalized URL
return fetch(url.toString(), {
cf: { cacheKey: url.toString() }, // Normalized cache key
})
},
}
  • Pros: High cache efficiency, reduced origin load, sub-5ms auth latency
  • Cons: Requires edge compute (CloudFlare Workers, CloudFront Functions, Fastly Compute)
  • When to use: High-traffic private content, latency-sensitive applications
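The worker above relies on three helpers that are not shown in the listing. Minimal sketches using the Web Crypto API available in edge runtimes (the canonical string mirrors the Signature Service shown later, with the tenant ID omitted for brevity):

function createCanonicalString(pathname, expiresAt, hostname) {
  return ["GET", pathname, expiresAt, hostname].join("\n")
}

async function computeHmac(secret, message) {
  const key = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  )
  const mac = await crypto.subtle.sign("HMAC", key, new TextEncoder().encode(message))
  // base64url-encode to match the `sig` query parameter produced at signing time
  return btoa(String.fromCharCode(...new Uint8Array(mac)))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "")
}

function timingSafeEqual(a, b) {
  if (a.length !== b.length) return false
  let diff = 0
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i)
  return diff === 0
}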

Pattern 3: JWT tokens with edge validation

Use JWT (JSON Web Token) instead of HMAC signatures. The edge can decode and validate JWTs without origin contact, and the token can carry claims (user ID, tenant ID, allowed operations).

  • Pros: Self-contained tokens with embedded claims, standard format
  • Cons: Larger URLs, no revocation without short expiry or edge-stored blocklist
  • When to use: When tokens need to carry user context, or when integrating with existing JWT infrastructure
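A minimal sketch of HS256 JWT validation in an edge worker using the Web Crypto API (helper and claim names are illustrative; a production deployment would more likely use a vetted JWT library plus key IDs):

function b64urlToBytes(s) {
  const padded = s.replace(/-/g, "+").replace(/_/g, "/")
  return Uint8Array.from(atob(padded), (c) => c.charCodeAt(0))
}

async function verifyJwtHS256(token, secret) {
  const [header, payload, signature] = token.split(".")
  if (!header || !payload || !signature) return null

  const key = await crypto.subtle.importKey(
    "raw",
    new TextEncoder().encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["verify"],
  )
  const ok = await crypto.subtle.verify(
    "HMAC",
    key,
    b64urlToBytes(signature),
    new TextEncoder().encode(`${header}.${payload}`),
  )
  if (!ok) return null

  const claims = JSON.parse(new TextDecoder().decode(b64urlToBytes(payload)))
  // Reject expired tokens; claims such as tenant ID or allowed operations then drive authorization
  if (claims.exp && Date.now() / 1000 > claims.exp) return null
  return claims
}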

CDN-specific implementation notes:

CDN | Edge Auth Capability | Cache Key Normalization
CloudFlare | Workers (full JS), Rules (limited) | cf.cacheKey in Workers
CloudFront | Functions (limited JS), Lambda@Edge (full Node.js) | cache-policy with query keys
Fastly | Compute@Edge (Rust/JS/Go), VCL | req.hash manipulation in VCL
Akamai | EdgeWorkers (JS), Property Manager | Cache ID modification

schema.sql
-- Organizations (Top-level tenants)
CREATE TABLE organizations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
slug VARCHAR(100) UNIQUE NOT NULL,
name VARCHAR(255) NOT NULL,
status VARCHAR(20) DEFAULT 'active',
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
deleted_at TIMESTAMPTZ NULL
);
-- Tenants (Optional subdivision within org)
CREATE TABLE tenants (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
slug VARCHAR(100) NOT NULL,
name VARCHAR(255) NOT NULL,
status VARCHAR(20) DEFAULT 'active',
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
deleted_at TIMESTAMPTZ NULL,
UNIQUE(organization_id, slug)
);
-- Spaces (Projects within tenant)
CREATE TABLE spaces (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
slug VARCHAR(100) NOT NULL,
name VARCHAR(255) NOT NULL,
-- Default policies (inherit from tenant/org if NULL)
default_access VARCHAR(20) DEFAULT 'private', -- 'public' or 'private'
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
deleted_at TIMESTAMPTZ NULL,
UNIQUE(tenant_id, slug),
CONSTRAINT valid_access CHECK (default_access IN ('public', 'private'))
);
-- Policies (Hierarchical configuration)
CREATE TABLE policies (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-- Scope (org, tenant, or space)
scope_type VARCHAR(20) NOT NULL, -- 'organization', 'tenant', 'space'
scope_id UUID NOT NULL,
-- Policy data
key VARCHAR(100) NOT NULL,
value JSONB NOT NULL,
-- Metadata
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(scope_type, scope_id, key),
CONSTRAINT valid_scope_type CHECK (scope_type IN ('organization', 'tenant', 'space'))
);
-- API Keys for authentication
CREATE TABLE api_keys (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE,
-- Key identity
key_id VARCHAR(50) UNIQUE NOT NULL, -- kid for rotation
name VARCHAR(255) NOT NULL,
secret_hash VARCHAR(255) NOT NULL, -- bcrypt/argon2
-- Permissions
scopes TEXT[] DEFAULT ARRAY['image:read']::TEXT[],
-- Status
status VARCHAR(20) DEFAULT 'active',
expires_at TIMESTAMPTZ NULL,
last_used_at TIMESTAMPTZ NULL,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
rotated_at TIMESTAMPTZ NULL
);
-- Assets (Original uploaded images)
CREATE TABLE assets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
space_id UUID NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,
-- Versioning
version INTEGER NOT NULL DEFAULT 1,
-- File info
filename VARCHAR(500) NOT NULL,
original_filename VARCHAR(500) NOT NULL,
mime_type VARCHAR(100) NOT NULL,
-- Storage
storage_provider VARCHAR(50) NOT NULL, -- 'aws', 'gcp', 'azure', 'minio'
storage_key VARCHAR(1000) NOT NULL UNIQUE,
-- Content
size_bytes BIGINT NOT NULL,
content_hash VARCHAR(64) NOT NULL, -- SHA-256 for deduplication
-- Image metadata
width INTEGER,
height INTEGER,
format VARCHAR(10),
color_space VARCHAR(20),
has_alpha BOOLEAN,
-- Organization
tags TEXT[] DEFAULT ARRAY[]::TEXT[],
folder VARCHAR(1000) DEFAULT '/',
-- Access control
access_policy VARCHAR(20) NOT NULL DEFAULT 'private',
-- EXIF and metadata
exif JSONB,
-- Upload info
uploaded_by UUID, -- Reference to user
uploaded_at TIMESTAMPTZ DEFAULT NOW(),
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
deleted_at TIMESTAMPTZ NULL,
CONSTRAINT valid_access_policy CHECK (access_policy IN ('public', 'private'))
);
-- Transformation Presets (Named transformation templates)
CREATE TABLE presets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE,
space_id UUID REFERENCES spaces(id) ON DELETE CASCADE,
-- Preset identity
name VARCHAR(100) NOT NULL,
slug VARCHAR(100) NOT NULL,
description TEXT,
-- Transformation definition
operations JSONB NOT NULL,
/*
Example:
{
"resize": {"width": 800, "height": 600, "fit": "cover"},
"format": "webp",
"quality": 85,
"sharpen": 1
}
*/
-- Auto-generation rules
auto_generate BOOLEAN DEFAULT false,
match_tags TEXT[] DEFAULT NULL,
match_folders TEXT[] DEFAULT NULL,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(organization_id, tenant_id, space_id, slug)
);
-- Derived Assets (Transformed images)
CREATE TABLE derived_assets (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,
-- Transformation identity
operations_canonical VARCHAR(500) NOT NULL, -- Canonical string representation
operations_hash VARCHAR(64) NOT NULL, -- SHA-256 of (canonical_ops + asset.content_hash)
-- Output
output_format VARCHAR(10) NOT NULL,
-- Storage
storage_provider VARCHAR(50) NOT NULL,
storage_key VARCHAR(1000) NOT NULL UNIQUE,
-- Content
size_bytes BIGINT NOT NULL,
content_hash VARCHAR(64) NOT NULL,
-- Image metadata
width INTEGER,
height INTEGER,
-- Performance tracking
processing_time_ms INTEGER,
access_count BIGINT DEFAULT 0,
last_accessed_at TIMESTAMPTZ,
-- Cache tier for lifecycle
cache_tier VARCHAR(20) DEFAULT 'hot', -- 'hot', 'warm', 'cold'
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE(asset_id, operations_hash)
);
-- Transform Cache (Fast lookup for existing transforms)
CREATE TABLE transform_cache (
asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,
operations_hash VARCHAR(64) NOT NULL,
derived_asset_id UUID NOT NULL REFERENCES derived_assets(id) ON DELETE CASCADE,
-- Metadata
created_at TIMESTAMPTZ DEFAULT NOW(),
PRIMARY KEY(asset_id, operations_hash)
);
-- Usage tracking (for cost and analytics)
CREATE TABLE usage_metrics (
id BIGSERIAL PRIMARY KEY,
date DATE NOT NULL,
organization_id UUID NOT NULL,
tenant_id UUID NOT NULL,
space_id UUID NOT NULL,
-- Metrics
request_count BIGINT DEFAULT 0,
bandwidth_bytes BIGINT DEFAULT 0,
storage_bytes BIGINT DEFAULT 0,
transform_count BIGINT DEFAULT 0,
transform_cpu_ms BIGINT DEFAULT 0,
UNIQUE(date, organization_id, tenant_id, space_id)
);
-- Audit logs
CREATE TABLE audit_logs (
id BIGSERIAL PRIMARY KEY,
organization_id UUID NOT NULL,
tenant_id UUID,
-- Actor
actor_type VARCHAR(20) NOT NULL, -- 'user', 'api_key', 'system'
actor_id UUID NOT NULL,
-- Action
action VARCHAR(100) NOT NULL, -- 'asset.upload', 'asset.delete', etc.
resource_type VARCHAR(50) NOT NULL,
resource_id UUID,
-- Context
metadata JSONB,
ip_address INET,
user_agent TEXT,
-- Timestamp
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for performance
CREATE INDEX idx_tenants_org ON tenants(organization_id);
CREATE INDEX idx_spaces_tenant ON spaces(tenant_id);
CREATE INDEX idx_spaces_org ON spaces(organization_id);
CREATE INDEX idx_policies_scope ON policies(scope_type, scope_id);
CREATE INDEX idx_assets_space ON assets(space_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_assets_org ON assets(organization_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_assets_hash ON assets(content_hash);
CREATE INDEX idx_assets_tags ON assets USING GIN(tags);
CREATE INDEX idx_assets_folder ON assets(folder);
CREATE INDEX idx_derived_asset ON derived_assets(asset_id);
CREATE INDEX idx_derived_hash ON derived_assets(operations_hash);
CREATE INDEX idx_derived_tier ON derived_assets(cache_tier);
CREATE INDEX idx_derived_access ON derived_assets(last_accessed_at);
CREATE INDEX idx_usage_date_org ON usage_metrics(date, organization_id);
CREATE INDEX idx_audit_org_time ON audit_logs(organization_id, created_at);
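The hierarchical policy model (Organization → Tenant → Space, override at any level) resolves to a single lookup against the policies table above. A minimal sketch assuming node-postgres (the function name and the "most specific scope wins" precedence are this sketch's assumptions):

import { Pool } from "pg"

const pool = new Pool()

async function getEffectivePolicy(orgId, tenantId, spaceId, key) {
  const { rows } = await pool.query(
    `SELECT scope_type, value
       FROM policies
      WHERE key = $4
        AND ((scope_type = 'space'        AND scope_id = $3)
          OR (scope_type = 'tenant'       AND scope_id = $2)
          OR (scope_type = 'organization' AND scope_id = $1))`,
    [orgId, tenantId, spaceId, key],
  )
  const byScope = Object.fromEntries(rows.map((r) => [r.scope_type, r.value]))
  // Most specific scope wins; fall through to broader scopes when unset
  return byScope.space ?? byScope.tenant ?? byScope.organization ?? null
}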

URLs should be:

  1. Self-describing: Clearly indicate access mode (public vs private)
  2. Cacheable: CDN-friendly with stable cache keys
  3. Deterministic: Same transformation = same URL
  4. Human-readable: Easy to understand and debug
Format:
https://{cdn-domain}/v1/pub/{org}/{tenant}/{space}/img/{asset-id}/v{version}/{operations}.{ext}
Examples:
- Original:
https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/original.jpg
- Resized:
https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/w_800-h_600-f_cover.webp
- With preset:
https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/preset_thumbnail.webp
- Format auto-negotiation:
https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/w_1200-f_auto-q_auto.jpg
Format:
https://{cdn-domain}/v1/priv/{org}/{tenant}/{space}/img/{asset-id}/v{version}/{operations}.{ext}
Example:
https://img.example.com/v1/priv/acme/internal/confidential/img/01JBXYZ.../v1/w_800-h_600.jpg
Format:
{base-url}?sig={signature}&exp={unix-timestamp}&kid={key-id}
Example:
https://img.example.com/v1/priv/acme/internal/confidential/img/01JBXYZ.../v1/w_800-h_600.jpg?sig=dGVzdHNpZ25hdHVyZQ&exp=1731427200&kid=key_123
Components:
- sig: Base64URL-encoded HMAC-SHA256 signature
- exp: Unix timestamp (seconds) when URL expires
- kid: Key ID for signature rotation support

Operations are encoded as hyphen-separated key-value pairs:

Parameter Format: {key}_{value}
Supported Parameters:
- w_{pixels} : Width (e.g., w_800)
- h_{pixels} : Height (e.g., h_600)
- f_{mode} : Fit mode - cover, contain, fill, inside, outside, pad
- q_{quality} : Quality 1-100 or 'auto' (e.g., q_85)
- fmt_{format} : Format - jpg, png, webp, avif, gif, 'auto'
- r_{degrees} : Rotation - 90, 180, 270
- g_{gravity} : Crop gravity - center, north, south, east, west, etc.
- b_{color} : Background color for pad (e.g., b_ffffff)
- blur_{radius} : Blur radius 0.3-1000 (e.g., blur_5)
- sharpen_{amount} : Sharpen amount 0-10 (e.g., sharpen_2)
- bw : Convert to black & white (grayscale)
- flip : Flip vertically (mirror top-to-bottom)
- flop : Flip horizontally (mirror left-to-right)
- preset_{name} : Apply named preset
Examples:
- w_800-h_600-f_cover-q_85
- w_400-h_400-f_contain-fmt_webp
- preset_thumbnail
- w_1200-sharpen_2-fmt_webp-q_90
- w_800-h_600-f_pad-b_ffffff
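The grammar above maps to a small parser. A minimal sketch of the parseOperations helper that the canonicalization code below relies on (numeric values are coerced, and value-less flags such as bw, flip, and flop become booleans):

// Parses "w_800-h_600-f_cover-q_85" into { w: 800, h: 600, f: "cover", q: 85 }
function parseOperations(opsString) {
  const ops = {}
  for (const part of opsString.split("-")) {
    const idx = part.indexOf("_")
    if (idx === -1) {
      ops[part] = true // bw, flip, flop
      continue
    }
    const key = part.slice(0, idx)
    const value = part.slice(idx + 1)
    // Colour values (b_) stay strings even when purely numeric, e.g. b_000000
    ops[key] = key !== "b" && /^\d+(\.\d+)?$/.test(value) ? Number(value) : value
  }
  return ops
}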

To ensure cache hit consistency, operations must be canonicalized:

canonicalize-operations.js
/**
* Canonicalizes transformation operations to ensure consistent cache keys.
* parseOperations (shown earlier) returns the short URL keys,
* e.g. { w: 800, h: 600, f: "cover", q: 85, fmt: "webp" }.
*/
function canonicalizeOperations(opsString) {
const ops = parseOperations(opsString)
// Apply defaults
if (!ops.q && ops.fmt !== "png") ops.q = 85
if (!ops.f && (ops.w || ops.h)) ops.f = "cover"
// Normalize values
if (typeof ops.q === "number") ops.q = Math.max(1, Math.min(100, ops.q))
if (ops.w) ops.w = Math.floor(ops.w)
if (ops.h) ops.h = Math.floor(ops.h)
// Canonical order: fmt, w, h, f, g, b, q, r, sharpen, blur, bw, flip, flop
const order = ["fmt", "w", "h", "f", "g", "b", "q", "r", "sharpen", "blur", "bw", "flip", "flop"]
return order
.filter((key) => ops[key] !== undefined)
.map((key) => (ops[key] === true ? key : `${key}_${ops[key]}`))
.join("-")
}

[Sequence diagram: asset upload. The client POSTs /v1/assets (multipart); the Gateway authenticates and forwards to Asset Ingestion, which validates the file (type, size), computes its SHA-256 hash, and checks the Registry DB for a duplicate. A duplicate returns 200 OK (deduplicated); otherwise the original is stored in the Object Store, an asset record is created, applicable presets are queried, a transform job is enqueued per preset, and 201 Created is returned with URLs. Transform Workers later dequeue the jobs, process the transformations, store the derived assets, and update the transform cache.]

[Sequence diagram: first transform with distributed locking. On a CDN miss, the Gateway parses and canonicalizes the operations and validates them against policies; the Transform Engine checks the Redis transform cache (MISS) and the Registry DB (NOT FOUND), then acquires a lock on (asset_id + ops_hash). After acquiring the lock it double-checks the registry: if another request already created the derived asset, the lock is released and that result is returned. Otherwise it fetches the original, applies the transformations with libvips/Sharp, stores the derived asset, saves the metadata, caches the result, releases the lock, and returns 200 OK with Cache-Control headers (< 800ms); the Edge Cache stores the response for one year.]
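The ingestion path has no dedicated code listing in this document. A minimal sketch of its deduplication step against the assets table from the schema, using a storage adapter as defined later (function, variable, and key-layout names are illustrative):

import crypto from "crypto"

async function ingestUpload(pool, storage, spaceCtx, filename, mimeType, buffer) {
  const contentHash = crypto.createHash("sha256").update(buffer).digest("hex")

  // Reuse an existing asset in the same space when the bytes are identical
  const existing = await pool.query(
    "SELECT id FROM assets WHERE space_id = $1 AND content_hash = $2 AND deleted_at IS NULL LIMIT 1",
    [spaceCtx.spaceId, contentHash],
  )
  if (existing.rows.length > 0) return { assetId: existing.rows[0].id, deduplicated: true }

  const assetId = crypto.randomUUID()
  const storageKey = `originals/${spaceCtx.orgId}/${spaceCtx.tenantId}/${spaceCtx.spaceId}/${assetId}/v1/${filename}`
  await storage.put(storageKey, buffer, mimeType)
  await pool.query(
    `INSERT INTO assets (id, organization_id, tenant_id, space_id, filename, original_filename,
                         mime_type, storage_provider, storage_key, size_bytes, content_hash)
     VALUES ($1,$2,$3,$4,$5,$5,$6,$7,$8,$9,$10)`,
    [assetId, spaceCtx.orgId, spaceCtx.tenantId, spaceCtx.spaceId, filename, mimeType,
     storage.provider, storageKey, buffer.length, contentHash],
  )
  return { assetId, deduplicated: false }
}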
transform-engine.js
import sharp from "sharp"
import crypto from "crypto"
/**
* Transform Engine - Core image processing service
*/
class TransformEngine {
constructor(storage, registry, cache, lockManager) {
this.storage = storage
this.registry = registry
this.cache = cache
this.lockManager = lockManager
}
/**
* Process image transformation with deduplication
*/
async transform(assetId, operations, acceptHeader) {
// 1. Canonicalize operations
const canonicalOps = this.canonicalizeOps(operations)
const outputFormat = this.determineFormat(operations.format, acceptHeader)
// 2. Generate transformation hash (content-addressed)
const asset = await this.registry.getAsset(assetId)
const opsHash = this.generateOpsHash(canonicalOps, asset.contentHash, outputFormat)
// 3. Check multi-layer cache
const cacheKey = `transform:${assetId}:${opsHash}`
// Layer 1: Redis cache
const cached = await this.cache.get(cacheKey)
if (cached) {
return {
buffer: Buffer.from(cached.buffer, "base64"),
contentType: cached.contentType,
fromCache: "redis",
}
}
// Layer 2: Database + Storage
const derived = await this.registry.getDerivedAsset(assetId, opsHash)
if (derived) {
const buffer = await this.storage.get(derived.storageKey)
// Populate Redis cache
await this.cache.set(
cacheKey,
{
buffer: buffer.toString("base64"),
contentType: `image/${derived.outputFormat}`,
},
3600,
) // 1 hour TTL
// Update access metrics
await this.registry.incrementAccessCount(derived.id)
return {
buffer,
contentType: `image/${derived.outputFormat}`,
fromCache: "storage",
}
}
// Layer 3: Process new transformation (with distributed locking)
const lockKey = `lock:transform:${assetId}:${opsHash}`
const lock = await this.lockManager.acquire(lockKey, 60000) // 60s TTL
try {
// Double-check after acquiring lock
const doubleCheck = await this.registry.getDerivedAsset(assetId, opsHash)
if (doubleCheck) {
const buffer = await this.storage.get(doubleCheck.storageKey)
return {
buffer,
contentType: `image/${doubleCheck.outputFormat}`,
fromCache: "concurrent",
}
}
// Process transformation
const startTime = Date.now()
// Fetch original
const originalBuffer = await this.storage.get(asset.storageKey)
// Apply transformations
const processedBuffer = await this.applyTransformations(originalBuffer, canonicalOps, outputFormat)
const processingTime = Date.now() - startTime
// Get metadata of processed image
const metadata = await sharp(processedBuffer).metadata()
// Generate storage key
const storageKey = `derived/${asset.organizationId}/${asset.tenantId}/${asset.spaceId}/${assetId}/v${asset.version}/${opsHash}.${outputFormat}`
// Store processed image
await this.storage.put(storageKey, processedBuffer, `image/${outputFormat}`)
// Compute content hash
const contentHash = crypto.createHash("sha256").update(processedBuffer).digest("hex")
// Save to database
const derivedAsset = await this.registry.createDerivedAsset({
assetId,
operationsCanonical: canonicalOps,
operationsHash: opsHash,
outputFormat,
storageProvider: this.storage.provider,
storageKey,
sizeBytes: processedBuffer.length,
contentHash,
width: metadata.width,
height: metadata.height,
processingTimeMs: processingTime,
})
// Update transform cache index
await this.registry.cacheTransform(assetId, opsHash, derivedAsset.id)
// Populate Redis cache
await this.cache.set(
cacheKey,
{
buffer: processedBuffer.toString("base64"),
contentType: `image/${outputFormat}`,
},
3600,
)
return {
buffer: processedBuffer,
contentType: `image/${outputFormat}`,
fromCache: "none",
processingTime,
}
} finally {
await lock.release()
}
}
/**
* Apply transformations using Sharp
*/
async applyTransformations(inputBuffer, operations, outputFormat) {
let pipeline = sharp(inputBuffer)
// Rotation
if (operations.rotation) {
pipeline = pipeline.rotate(operations.rotation)
}
// Flip/Flop
if (operations.flip) {
pipeline = pipeline.flip()
}
if (operations.flop) {
pipeline = pipeline.flop()
}
// Resize
if (operations.width || operations.height) {
const resizeOptions = {
width: operations.width,
height: operations.height,
fit: operations.fit === "pad" ? "contain" : operations.fit || "cover", // sharp has no "pad" fit; map it to "contain"
position: operations.gravity || "centre",
withoutEnlargement: true,
}
// Background for 'pad' fit
if (operations.fit === "pad" && operations.background) {
resizeOptions.background = this.parseColor(operations.background)
}
pipeline = pipeline.resize(resizeOptions)
}
// Effects
if (operations.blur) {
pipeline = pipeline.blur(operations.blur)
}
if (operations.sharpen) {
pipeline = pipeline.sharpen(operations.sharpen)
}
if (operations.grayscale) {
pipeline = pipeline.grayscale()
}
// Format conversion and quality
const quality = operations.quality === "auto" ? this.getAutoQuality(outputFormat) : operations.quality || 85
switch (outputFormat) {
case "jpg":
case "jpeg":
pipeline = pipeline.jpeg({
quality,
mozjpeg: true, // Better compression
})
break
case "png":
pipeline = pipeline.png({
quality,
compressionLevel: 9,
adaptiveFiltering: true,
})
break
case "webp":
pipeline = pipeline.webp({
quality,
effort: 6, // Compression effort (0-6)
})
break
case "avif":
pipeline = pipeline.avif({
quality,
effort: 6,
})
break
case "gif":
pipeline = pipeline.gif()
break
}
return await pipeline.toBuffer()
}
/**
* Determine output format based on operations and Accept header
*
* Format selection priority (as of 2025):
* - AVIF: 94.89% browser support, ~50% smaller than JPEG, 20-25% smaller than WebP
* - WebP: 95.93% browser support, 25-34% smaller than JPEG
* - JPEG: Universal fallback
*
* Note: JPEG XL gained Chrome support in Jan 2026 but adoption is still emerging.
* Consider adding once browser support exceeds 80%.
*/
determineFormat(requestedFormat, acceptHeader) {
if (requestedFormat && requestedFormat !== "auto") {
return requestedFormat
}
// Format negotiation based on Accept header
const accept = (acceptHeader || "").toLowerCase()
if (accept.includes("image/avif")) {
return "avif" // Best compression: ~50% smaller than JPEG
}
if (accept.includes("image/webp")) {
return "webp" // Good compression: 25-34% smaller than JPEG, slightly wider support
}
return "jpg" // Fallback
}
/**
* Get automatic quality based on format
*
* Quality values are calibrated to produce visually similar output across formats.
* AVIF and WebP compress more efficiently, so they need lower quality values
* to achieve similar file sizes with equivalent visual quality.
*
* Real-world example (2000×2000 product photo):
* - JPEG q=80: ~540 KB
* - WebP q=85: ~350 KB (35% smaller)
* - AVIF q=75 (CQ 28): ~210 KB (61% smaller)
*/
getAutoQuality(format) {
const qualityMap = {
avif: 75, // AVIF compresses very well; q=75 ≈ JPEG q=85 visually
webp: 80, // WebP compresses well; q=80 ≈ JPEG q=85 visually
jpg: 85, // JPEG baseline quality
jpeg: 85,
png: 90, // PNG quality affects compression, not visual fidelity (lossless)
}
return qualityMap[format] || 85
}
/**
* Generate deterministic hash for transformation
*/
generateOpsHash(canonicalOps, assetContentHash, outputFormat) {
const payload = `${canonicalOps};${assetContentHash};fmt=${outputFormat}`
return crypto.createHash("sha256").update(payload).digest("hex")
}
/**
* Parse color hex string to RGB object
*/
parseColor(hex) {
hex = hex.replace("#", "")
return {
r: parseInt(hex.substr(0, 2), 16),
g: parseInt(hex.substr(2, 2), 16),
b: parseInt(hex.substr(4, 2), 16),
}
}
/**
* Canonicalize operations
*/
canonicalizeOps(ops) {
// Implementation details...
// Return canonical string like "w_800-h_600-f_cover-q_85-fmt_webp"
}
}
export default TransformEngine
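A sketch of how the Gateway might hand requests to the engine from a Fastify route (the route shape follows the URL design above; transformEngine and parseOperations are assumed to be constructed/defined elsewhere):

import Fastify from "fastify"

const app = Fastify({ logger: true })

app.get("/v1/pub/:org/:tenant/:space/img/:assetId/:version/:ops", async (request, reply) => {
  const { assetId, ops } = request.params
  const operations = parseOperations(ops.replace(/\.\w+$/, "")) // strip the extension
  const result = await transformEngine.transform(assetId, operations, request.headers.accept)

  reply
    .header("Cache-Control", "public, max-age=31536000, immutable")
    .header("X-Cache", result.fromCache)
    .type(result.contentType)
  return result.buffer
})

await app.listen({ port: 3000, host: "0.0.0.0" })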
lock-manager.js
import Redlock from "redlock"
import Redis from "ioredis"
/**
* Distributed lock manager using Redlock algorithm
*
* IMPORTANT: This lock manager is designed for EFFICIENCY optimization, not
* CORRECTNESS guarantees. Redlock cannot provide fencing tokens, so:
*
* - SAFE: Preventing duplicate transforms (if lock fails, we waste compute but don't corrupt data)
* - UNSAFE: Protecting financial transactions, inventory updates, or any operation where
* concurrent execution could cause data inconsistency
*
* For safety-critical mutual exclusion, use etcd (Raft consensus) or ZooKeeper (ZAB protocol).
* See: https://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
*/
class LockManager {
constructor(redisClients) {
// Initialize Redlock with multiple Redis instances (N=5 recommended for production)
this.redlock = new Redlock(redisClients, {
driftFactor: 0.01,
retryCount: 10,
retryDelay: 200,
retryJitter: 200,
automaticExtensionThreshold: 500,
})
}
/**
* Acquire distributed lock
*/
async acquire(key, ttl = 30000) {
try {
const lock = await this.redlock.acquire([`lock:${key}`], ttl)
return lock
} catch (error) {
throw new Error(`Failed to acquire lock for ${key}: ${error.message}`)
}
}
/**
* Try to acquire lock without waiting
*/
async tryAcquire(key, ttl = 30000) {
try {
return await this.redlock.acquire([`lock:${key}`], ttl)
} catch (error) {
return null // Lock not acquired
}
}
}
// Usage
const redis1 = new Redis({ host: "redis-1" })
const redis2 = new Redis({ host: "redis-2" })
const redis3 = new Redis({ host: "redis-3" })
const lockManager = new LockManager([redis1, redis2, redis3])
export default LockManager

signature-service.js
import crypto from "crypto"
/**
* Signature Service - Generate and verify signed URLs
*/
class SignatureService {
constructor(registry) {
this.registry = registry
}
/**
* Generate signed URL for private images
*/
async generateSignedUrl(baseUrl, orgId, tenantId, ttl = null) {
// Get signing key for tenant/org
const apiKey = await this.registry.getSigningKey(orgId, tenantId)
// Get effective policy for TTL
const policy = await this.registry.getEffectivePolicy(orgId, tenantId)
const defaultTtl = policy.signed_url_ttl_default_seconds || 3600
const maxTtl = policy.signed_url_ttl_max_seconds || 86400
// Calculate expiry
const requestedTtl = ttl || defaultTtl
const effectiveTtl = Math.min(requestedTtl, maxTtl)
const expiresAt = Math.floor(Date.now() / 1000) + effectiveTtl
// Create canonical string for signing
const url = new URL(baseUrl)
const canonicalString = this.createCanonicalString(url.pathname, expiresAt, url.hostname, tenantId)
// Generate HMAC-SHA256 signature
const signature = crypto.createHmac("sha256", apiKey.secret).update(canonicalString).digest("base64url") // URL-safe base64
// Append signature, expiry, and key ID to URL
url.searchParams.set("sig", signature)
url.searchParams.set("exp", expiresAt.toString())
url.searchParams.set("kid", apiKey.keyId)
return {
url: url.toString(),
expiresAt: new Date(expiresAt * 1000),
expiresIn: effectiveTtl,
}
}
/**
* Verify signed URL
*/
async verifySignedUrl(signedUrl, orgId, tenantId) {
const url = new URL(signedUrl)
// Extract signature components
const signature = url.searchParams.get("sig")
const expiresAt = parseInt(url.searchParams.get("exp"))
const keyId = url.searchParams.get("kid")
if (!signature || !expiresAt || !keyId) {
return {
valid: false,
error: "Missing signature components",
}
}
// Check expiration
const now = Math.floor(Date.now() / 1000)
if (now > expiresAt) {
return {
valid: false,
expired: true,
error: "Signature expired",
}
}
// Get signing key
const apiKey = await this.registry.getApiKeyById(keyId)
if (!apiKey || apiKey.status !== "active") {
return {
valid: false,
error: "Invalid key ID",
}
}
// Verify tenant/org ownership
if (apiKey.organizationId !== orgId || apiKey.tenantId !== tenantId) {
return {
valid: false,
error: "Key does not match tenant",
}
}
// Reconstruct canonical string
url.searchParams.delete("sig")
url.searchParams.delete("exp")
url.searchParams.delete("kid")
const canonicalString = this.createCanonicalString(url.pathname, expiresAt, url.hostname, tenantId)
// Compute expected signature
const expectedSignature = crypto.createHmac("sha256", apiKey.secret).update(canonicalString).digest("base64url")
// Constant-time comparison to prevent timing attacks
// (timingSafeEqual throws on length mismatch, so treat differing lengths as invalid)
const sigBuf = Buffer.from(signature)
const expectedBuf = Buffer.from(expectedSignature)
const valid = sigBuf.length === expectedBuf.length && crypto.timingSafeEqual(sigBuf, expectedBuf)
return {
valid,
error: valid ? null : "Invalid signature",
}
}
/**
* Create canonical string for signing
*/
createCanonicalString(pathname, expiresAt, hostname, tenantId) {
return ["GET", pathname, expiresAt, hostname, tenantId].join("\n")
}
/**
* Rotate signing keys
*/
async rotateSigningKey(orgId, tenantId) {
// Generate new secret
const newSecret = crypto.randomBytes(32).toString("hex")
const newKeyId = `key_${Date.now()}_${crypto.randomBytes(8).toString("hex")}`
// Create new key
const newKey = await this.registry.createApiKey({
organizationId: orgId,
tenantId,
keyId: newKeyId,
name: `Signing Key (rotated ${new Date().toISOString()})`,
secret: newSecret,
scopes: ["signing"],
})
// Mark old keys for deprecation (keep valid for grace period)
await this.registry.deprecateOldSigningKeys(orgId, tenantId, newKey.id)
return newKey
}
}
export default SignatureService
auth-middleware.js
import crypto from "crypto"
/**
* Authentication middleware for Fastify
*/
class AuthMiddleware {
constructor(registry) {
this.registry = registry
}
/**
* API Key authentication
*/
async authenticateApiKey(request, reply) {
const apiKey = request.headers["x-api-key"]
if (!apiKey) {
return reply.code(401).send({
error: "Unauthorized",
message: "API key required",
})
}
// Hash the API key
const keyHash = crypto.createHash("sha256").update(apiKey).digest("hex")
// Look up in database
const keyRecord = await this.registry.getApiKeyByHash(keyHash)
if (!keyRecord) {
return reply.code(401).send({
error: "Unauthorized",
message: "Invalid API key",
})
}
// Check status and expiration
if (keyRecord.status !== "active") {
return reply.code(401).send({
error: "Unauthorized",
message: "API key is inactive",
})
}
if (keyRecord.expiresAt && new Date(keyRecord.expiresAt) < new Date()) {
return reply.code(401).send({
error: "Unauthorized",
message: "API key has expired",
})
}
// Update last used timestamp (async, don't wait)
this.registry.updateApiKeyLastUsed(keyRecord.id).catch(console.error)
// Attach to request context
request.auth = {
organizationId: keyRecord.organizationId,
tenantId: keyRecord.tenantId,
scopes: keyRecord.scopes,
keyId: keyRecord.id,
}
}
/**
* Scope-based authorization
*/
requireScope(scope) {
return async (request, reply) => {
if (!request.auth) {
return reply.code(401).send({
error: "Unauthorized",
message: "Authentication required",
})
}
if (!request.auth.scopes.includes(scope)) {
return reply.code(403).send({
error: "Forbidden",
message: `Required scope: ${scope}`,
})
}
}
}
/**
* Tenant boundary check
*/
async checkTenantAccess(request, reply, orgId, tenantId, spaceId) {
if (!request.auth) {
return reply.code(401).send({
error: "Unauthorized",
})
}
// Check organization match
if (request.auth.organizationId !== orgId) {
return reply.code(403).send({
error: "Forbidden",
message: "Access denied to this organization",
})
}
// Check tenant match (if key is tenant-scoped)
if (request.auth.tenantId && request.auth.tenantId !== tenantId) {
return reply.code(403).send({
error: "Forbidden",
message: "Access denied to this tenant",
})
}
return true
}
}
export default AuthMiddleware
rate-limiter.js
import Redis from "ioredis"
/**
* Rate limiter using sliding window algorithm
*/
class RateLimiter {
constructor(redis) {
this.redis = redis
}
/**
* Check and enforce rate limit
*/
async checkLimit(identifier, limit, windowSeconds) {
const key = `ratelimit:${identifier}`
const now = Date.now()
const windowStart = now - windowSeconds * 1000
// Use Redis pipeline for atomicity
const pipeline = this.redis.pipeline()
// Remove old entries outside the window
pipeline.zremrangebyscore(key, "-inf", windowStart)
// Count requests in current window
pipeline.zcard(key)
// Add current request
const requestId = `${now}:${Math.random()}`
pipeline.zadd(key, now, requestId)
// Set expiry on key
pipeline.expire(key, windowSeconds)
const results = await pipeline.exec()
const count = results[1][1] // Result of ZCARD
const allowed = count < limit
const remaining = Math.max(0, limit - count - 1)
// Calculate reset time
const oldestEntry = await this.redis.zrange(key, 0, 0, "WITHSCORES")
const resetAt =
oldestEntry.length > 0
? new Date(parseInt(oldestEntry[1]) + windowSeconds * 1000)
: new Date(now + windowSeconds * 1000)
return {
allowed,
limit,
remaining,
resetAt,
}
}
/**
* Rate limiting middleware for Fastify
*/
middleware(getLimitConfig) {
return async (request, reply) => {
// Get limit configuration based on request context
const { identifier, limit, window } = getLimitConfig(request)
const result = await this.checkLimit(identifier, limit, window)
// Set rate limit headers
reply.header("X-RateLimit-Limit", result.limit)
reply.header("X-RateLimit-Remaining", result.remaining)
reply.header("X-RateLimit-Reset", result.resetAt.toISOString())
if (!result.allowed) {
return reply.code(429).send({
error: "Too Many Requests",
message: `Rate limit exceeded. Try again after ${result.resetAt.toISOString()}`,
retryAfter: Math.ceil((result.resetAt.getTime() - Date.now()) / 1000),
})
}
}
}
}
// Usage example
const redis = new Redis()
const rateLimiter = new RateLimiter(redis)
// Apply to route
app.get(
"/v1/pub/*",
{
preHandler: rateLimiter.middleware((request) => ({
identifier: `org:${request.params.org}`,
limit: 1000, // requests
window: 60, // seconds
})),
},
handler,
)
export default RateLimiter

[Deployment diagram: clients reach a Cloud Load Balancer (AWS ALB / GCP GLB / Azure LB) and an Nginx Ingress Controller in front of the Kubernetes cluster. Services and replica ranges: Image Gateway 3-10, Transform Engine 5-20, Asset Ingestion 3-10, Control Plane API 2-5, Transform Workers 5-50. Data tier: Redis Cluster (3 masters + 3 replicas), PostgreSQL (primary + 2 replicas), RabbitMQ cluster (3 nodes). External services: CDN (CloudFront/Cloudflare) handling HTTPS traffic and cache misses, and Object Storage (S3/GCS/Azure Blob).]

storage-adapter.js
/**
* Abstract storage interface
*/
class StorageAdapter {
async put(key, buffer, contentType, metadata = {}) {
throw new Error("Not implemented")
}
async get(key) {
throw new Error("Not implemented")
}
async delete(key) {
throw new Error("Not implemented")
}
async exists(key) {
throw new Error("Not implemented")
}
async getSignedUrl(key, ttl) {
throw new Error("Not implemented")
}
get provider() {
throw new Error("Not implemented")
}
}
// AWS S3 Implementation (imports collapsed)
// import { S3Client, PutObjectCommand, GetObjectCommand, ... } from "@aws-sdk/client-s3"
// import { getSignedUrl } from "@aws-sdk/s3-request-presigner"
class S3StorageAdapter extends StorageAdapter {
constructor(config) {
super()
this.client = new S3Client({
region: config.region,
credentials: config.credentials,
})
this.bucket = config.bucket
}
async put(key, buffer, contentType, metadata = {}) {
const command = new PutObjectCommand({
Bucket: this.bucket,
Key: key,
Body: buffer,
ContentType: contentType,
Metadata: metadata,
ServerSideEncryption: "AES256",
})
await this.client.send(command)
}
async get(key) {
const command = new GetObjectCommand({
Bucket: this.bucket,
Key: key,
})
const response = await this.client.send(command)
const chunks = []
for await (const chunk of response.Body) {
chunks.push(chunk)
}
return Buffer.concat(chunks)
}
async delete(key) {
const command = new DeleteObjectCommand({
Bucket: this.bucket,
Key: key,
})
await this.client.send(command)
}
async exists(key) {
try {
const command = new HeadObjectCommand({
Bucket: this.bucket,
Key: key,
})
await this.client.send(command)
return true
} catch (error) {
if (error.name === "NotFound") {
return false
}
throw error
}
}
async getSignedUrl(key, ttl = 3600) {
const command = new GetObjectCommand({
Bucket: this.bucket,
Key: key,
})
return await getSignedUrl(this.client, command, { expiresIn: ttl })
}
get provider() {
return "aws"
}
}
// Google Cloud Storage Implementation (imports collapsed)
// import { Storage } from "@google-cloud/storage"
class GCSStorageAdapter extends StorageAdapter {
constructor(config) {
super()
this.storage = new Storage({
projectId: config.projectId,
credentials: config.credentials,
})
this.bucket = this.storage.bucket(config.bucket)
}
async put(key, buffer, contentType, metadata = {}) {
const file = this.bucket.file(key)
await file.save(buffer, {
contentType,
metadata,
resumable: false,
})
}
async get(key) {
const file = this.bucket.file(key)
const [contents] = await file.download()
return contents
}
async delete(key) {
const file = this.bucket.file(key)
await file.delete()
}
async exists(key) {
const file = this.bucket.file(key)
const [exists] = await file.exists()
return exists
}
async getSignedUrl(key, ttl = 3600) {
const file = this.bucket.file(key)
const [url] = await file.getSignedUrl({
action: "read",
expires: Date.now() + ttl * 1000,
})
return url
}
get provider() {
return "gcp"
}
}
// Azure Blob Storage Implementation (imports collapsed)
// import { BlobServiceClient, BlobSASPermissions } from "@azure/storage-blob"
class AzureBlobStorageAdapter extends StorageAdapter {
constructor(config) {
super()
this.blobServiceClient = BlobServiceClient.fromConnectionString(config.connectionString)
this.containerClient = this.blobServiceClient.getContainerClient(config.containerName)
}
async put(key, buffer, contentType, metadata = {}) {
const blockBlobClient = this.containerClient.getBlockBlobClient(key)
await blockBlobClient.upload(buffer, buffer.length, {
blobHTTPHeaders: { blobContentType: contentType },
metadata,
})
}
async get(key) {
const blobClient = this.containerClient.getBlobClient(key)
const downloadResponse = await blobClient.download()
return await this.streamToBuffer(downloadResponse.readableStreamBody)
}
async delete(key) {
const blobClient = this.containerClient.getBlobClient(key)
await blobClient.delete()
}
async exists(key) {
const blobClient = this.containerClient.getBlobClient(key)
return await blobClient.exists()
}
async getSignedUrl(key, ttl = 3600) {
const blobClient = this.containerClient.getBlobClient(key)
const expiresOn = new Date(Date.now() + ttl * 1000)
// generateSasUrl expects a BlobSASPermissions object rather than a raw string
return await blobClient.generateSasUrl({
permissions: BlobSASPermissions.parse("r"),
expiresOn,
})
}
async streamToBuffer(readableStream) {
return new Promise((resolve, reject) => {
const chunks = []
readableStream.on("data", (chunk) => chunks.push(chunk))
readableStream.on("end", () => resolve(Buffer.concat(chunks)))
readableStream.on("error", reject)
})
}
get provider() {
return "azure"
}
}
// MinIO Implementation (S3-compatible for on-premise, imports collapsed)
// import * as Minio from "minio"
class MinIOStorageAdapter extends StorageAdapter {
constructor(config) {
super()
this.client = new Minio.Client({
endPoint: config.endPoint,
port: config.port || 9000,
useSSL: config.useSSL !== false,
accessKey: config.accessKey,
secretKey: config.secretKey,
})
this.bucket = config.bucket
}
async put(key, buffer, contentType, metadata = {}) {
await this.client.putObject(this.bucket, key, buffer, buffer.length, {
"Content-Type": contentType,
...metadata,
})
}
async get(key) {
const stream = await this.client.getObject(this.bucket, key)
return new Promise((resolve, reject) => {
const chunks = []
stream.on("data", (chunk) => chunks.push(chunk))
stream.on("end", () => resolve(Buffer.concat(chunks)))
stream.on("error", reject)
})
}
async delete(key) {
await this.client.removeObject(this.bucket, key)
}
async exists(key) {
try {
await this.client.statObject(this.bucket, key)
return true
} catch (error) {
if (error.code === "NotFound") {
return false
}
throw error
}
}
async getSignedUrl(key, ttl = 3600) {
return await this.client.presignedGetObject(this.bucket, key, ttl)
}
get provider() {
return "minio"
}
}
/**
* Storage Factory
*/
class StorageFactory {
static create(config) {
switch (config.provider) {
case "aws":
case "s3":
return new S3StorageAdapter(config)
case "gcp":
case "gcs":
return new GCSStorageAdapter(config)
case "azure":
return new AzureBlobStorageAdapter(config)
case "minio":
case "onprem":
return new MinIOStorageAdapter(config)
default:
throw new Error(`Unsupported storage provider: ${config.provider}`)
}
}
}
export { StorageAdapter, StorageFactory }
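Selecting the adapter then becomes a configuration concern; a minimal usage sketch (environment variable names are illustrative):

const storage = StorageFactory.create({
  provider: process.env.STORAGE_PROVIDER, // "aws" | "gcp" | "azure" | "minio"
  region: process.env.AWS_REGION,
  bucket: process.env.STORAGE_BUCKET,
  endPoint: process.env.MINIO_ENDPOINT,
  accessKey: process.env.MINIO_ACCESS_KEY,
  secretKey: process.env.MINIO_SECRET_KEY,
})

// Every service talks to the same interface regardless of cloud;
// `buffer` here stands in for bytes produced by the Transform Engine
await storage.put("derived/acme/website/marketing/example.webp", buffer, "image/webp")
const exists = await storage.exists("derived/acme/website/marketing/example.webp")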
docker-compose.yml
# docker-compose.yml for local development
version: "3.8"
services:
# API Gateway
gateway:
build: ./services/gateway
ports:
- "3000:3000"
environment:
NODE_ENV: development
DATABASE_URL: postgresql://postgres:password@postgres:5432/imageservice
REDIS_URL: redis://redis:6379
STORAGE_PROVIDER: minio
MINIO_ENDPOINT: minio
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
depends_on:
- postgres
- redis
- minio
# Transform Engine
transform:
build: ./services/transform
deploy:
replicas: 3
environment:
DATABASE_URL: postgresql://postgres:password@postgres:5432/imageservice
REDIS_URL: redis://redis:6379
STORAGE_PROVIDER: minio
MINIO_ENDPOINT: minio
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
depends_on:
- postgres
- redis
- minio
# Transform Workers
worker:
build: ./services/worker
deploy:
replicas: 3
environment:
DATABASE_URL: postgresql://postgres:password@postgres:5432/imageservice
RABBITMQ_URL: amqp://rabbitmq:5672
STORAGE_PROVIDER: minio
MINIO_ENDPOINT: minio
MINIO_ACCESS_KEY: minioadmin
MINIO_SECRET_KEY: minioadmin
depends_on:
- postgres
- rabbitmq
- minio
# PostgreSQL
postgres:
image: postgres:15-alpine
environment:
POSTGRES_DB: imageservice
POSTGRES_USER: postgres
POSTGRES_PASSWORD: password
volumes:
- postgres-data:/var/lib/postgresql/data
ports:
- "5432:5432"
# Redis
redis:
image: redis:7-alpine
command: redis-server --appendonly yes
volumes:
- redis-data:/data
ports:
- "6379:6379"
# RabbitMQ
rabbitmq:
image: rabbitmq:3-management-alpine
environment:
RABBITMQ_DEFAULT_USER: admin
RABBITMQ_DEFAULT_PASS: password
ports:
- "5672:5672"
- "15672:15672"
volumes:
- rabbitmq-data:/var/lib/rabbitmq
# MinIO (S3-compatible storage)
minio:
image: minio/minio:latest
command: server /data --console-address ":9001"
environment:
MINIO_ROOT_USER: minioadmin
MINIO_ROOT_PASSWORD: minioadmin
ports:
- "9000:9000"
- "9001:9001"
volumes:
- minio-data:/data
volumes:
postgres-data:
redis-data:
rabbitmq-data:
minio-data:

[Cache hierarchy diagram: Client Request → CDN Edge Cache (95% hit rate, ~$0.02/GB) → 5% miss → Redis Cache (80% hit rate, 1 hour TTL) → 20% miss → Database Index (90% hit rate) backed by Object Storage (S3/GCS/Azure) → 10% miss → Process New (< 5% of requests).]

lifecycle-manager.js
/**
* Storage lifecycle manager
*/
class LifecycleManager {
constructor(registry, storage) {
this.registry = registry
this.storage = storage
}
/**
* Move derived assets to cold tier based on access patterns
*/
async moveToColdTier() {
const coldThresholdDays = 30
const warmThresholdDays = 7
// Find candidates for tiering
const candidates = await this.registry.query(`
SELECT id, storage_key, cache_tier, last_accessed_at, size_bytes
FROM derived_assets
WHERE cache_tier = 'hot'
AND last_accessed_at < NOW() - INTERVAL '${coldThresholdDays} days'
AND deleted_at IS NULL
ORDER BY last_accessed_at ASC
LIMIT 1000
`)
for (const asset of candidates.rows) {
try {
// Move to cold storage tier (Glacier Instant Retrieval, Coldline, etc.)
await this.storage.moveToTier(asset.storageKey, "cold")
// Update database
await this.registry.updateCacheTier(asset.id, "cold")
console.log(`Moved asset ${asset.id} to cold tier`)
} catch (error) {
console.error(`Failed to move asset ${asset.id}:`, error)
}
}
// Similar logic for warm tier
const warmCandidates = await this.registry.query(`
SELECT id, storage_key, cache_tier
FROM derived_assets
WHERE cache_tier = 'hot'
AND last_accessed_at < NOW() - INTERVAL '${warmThresholdDays} days'
AND last_accessed_at >= NOW() - INTERVAL '${coldThresholdDays} days'
LIMIT 1000
`)
for (const asset of warmCandidates.rows) {
await this.storage.moveToTier(asset.storageKey, "warm")
await this.registry.updateCacheTier(asset.id, "warm")
}
}
/**
* Delete unused derived assets
*/
async pruneUnused() {
const pruneThresholdDays = 90
const unused = await this.registry.query(`
SELECT id, storage_key
FROM derived_assets
WHERE access_count = 0
AND created_at < NOW() - INTERVAL '${pruneThresholdDays} days'
LIMIT 1000
`)
for (const asset of unused.rows) {
try {
await this.storage.delete(asset.storageKey)
await this.registry.deleteDerivedAsset(asset.id)
console.log(`Pruned unused asset ${asset.id}`)
} catch (error) {
console.error(`Failed to prune asset ${asset.id}:`, error)
}
}
}
}
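Both jobs are good candidates for a scheduler; a sketch using node-cron (node-cron, and the registry and storage instances, are assumptions of this sketch rather than dependencies named elsewhere in this document):

import cron from "node-cron"

const lifecycle = new LifecycleManager(registry, storage)

// Tier rarely-accessed derived assets nightly; prune never-accessed ones weekly
cron.schedule("0 3 * * *", () => lifecycle.moveToColdTier().catch(console.error))
cron.schedule("0 4 * * 0", () => lifecycle.pruneUnused().catch(console.error))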

For a service serving 10 million requests/month:

Component | Without Optimization | With Optimization | Savings
Processing | 1M transforms × $0.001 | 50K transforms × $0.001 | 95%
Storage | 100TB × $0.023 | 100TB × $0.013 (tiered) | 43%
Bandwidth | 100TB × $0.09 (origin) | 100TB × $0.02 (CDN) | 78%
CDN | n/a | 100TB × $0.02 | n/a
Total | $12,300/month | $5,400/month | 56%

Key optimizations:

  • 95% CDN hit rate reduces origin bandwidth
  • Transform deduplication prevents reprocessing
  • Storage tiering moves cold data to cheaper tiers
  • Smart caching minimizes processing costs

metrics-registry.js
import prometheus from "prom-client"

/**
 * Metrics registry
 */
class MetricsRegistry {
  constructor() {
    this.register = new prometheus.Registry()

    // Default metrics (CPU, memory, etc.)
    prometheus.collectDefaultMetrics({ register: this.register })

    // HTTP metrics
    this.httpRequestDuration = new prometheus.Histogram({
      name: "http_request_duration_seconds",
      help: "HTTP request duration in seconds",
      labelNames: ["method", "route", "status"],
      buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
    })
    this.httpRequestTotal = new prometheus.Counter({
      name: "http_requests_total",
      help: "Total HTTP requests",
      labelNames: ["method", "route", "status"],
    })

    // Transform metrics
    this.transformDuration = new prometheus.Histogram({
      name: "transform_duration_seconds",
      help: "Image transformation duration in seconds",
      labelNames: ["org", "format", "cached"],
      buckets: [0.1, 0.2, 0.5, 1, 2, 5, 10],
    })
    this.transformTotal = new prometheus.Counter({
      name: "transforms_total",
      help: "Total image transformations",
      labelNames: ["org", "format", "cached"],
    })
    this.transformErrors = new prometheus.Counter({
      name: "transform_errors_total",
      help: "Total transformation errors",
      labelNames: ["org", "error_type"],
    })

    // Cache metrics
    this.cacheHits = new prometheus.Counter({
      name: "cache_hits_total",
      help: "Total cache hits",
      labelNames: ["layer"], // cdn, redis, database
    })
    this.cacheMisses = new prometheus.Counter({
      name: "cache_misses_total",
      help: "Total cache misses",
      labelNames: ["layer"],
    })

    // Storage metrics
    this.storageOperations = new prometheus.Counter({
      name: "storage_operations_total",
      help: "Total storage operations",
      labelNames: ["provider", "operation"], // put, get, delete
    })
    this.storageBytesTransferred = new prometheus.Counter({
      name: "storage_bytes_transferred_total",
      help: "Total bytes transferred to/from storage",
      labelNames: ["provider", "direction"], // upload, download
    })

    // Business metrics
    this.assetsUploaded = new prometheus.Counter({
      name: "assets_uploaded_total",
      help: "Total assets uploaded",
      labelNames: ["org", "format"],
    })
    this.bandwidthServed = new prometheus.Counter({
      name: "bandwidth_served_bytes_total",
      help: "Total bandwidth served",
      labelNames: ["org", "space"],
    })

    // Register all metrics
    this.register.registerMetric(this.httpRequestDuration)
    this.register.registerMetric(this.httpRequestTotal)
    this.register.registerMetric(this.transformDuration)
    this.register.registerMetric(this.transformTotal)
    this.register.registerMetric(this.transformErrors)
    this.register.registerMetric(this.cacheHits)
    this.register.registerMetric(this.cacheMisses)
    this.register.registerMetric(this.storageOperations)
    this.register.registerMetric(this.storageBytesTransferred)
    this.register.registerMetric(this.assetsUploaded)
    this.register.registerMetric(this.bandwidthServed)
  }

  /**
   * Get metrics in Prometheus format
   */
  async getMetrics() {
    return await this.register.metrics()
  }
}

// Singleton instance
const metricsRegistry = new MetricsRegistry()
export default metricsRegistry
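
For the alert rules below to have data, the registry needs to be fed by request instrumentation and exposed on a scrape endpoint. A minimal sketch assuming an Express gateway; the port and route handling are illustrative, not part of the design above:

metrics-endpoint.js (sketch)
import express from "express"
import metricsRegistry from "./metrics-registry.js"

const app = express()

// Record duration and count for every response.
// NOTE: req.path is used for simplicity; a real gateway would normalize
// parameterized routes to keep label cardinality bounded.
app.use((req, res, next) => {
  const end = metricsRegistry.httpRequestDuration.startTimer()
  res.on("finish", () => {
    const labels = { method: req.method, route: req.path, status: String(res.statusCode) }
    end(labels)
    metricsRegistry.httpRequestTotal.inc(labels)
  })
  next()
})

// Expose the registry for Prometheus to scrape
app.get("/metrics", async (req, res) => {
  res.set("Content-Type", metricsRegistry.register.contentType)
  res.send(await metricsRegistry.getMetrics())
})

app.listen(3000)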
prometheus-alerts.yml
groups:
  - name: image_service_alerts
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
            /
            sum(rate(http_requests_total[5m])) by (service)
          ) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.service }}"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Low cache hit rate
      - alert: LowCacheHitRate
        expr: |
          (
            sum(rate(cache_hits_total{layer="redis"}[10m]))
            /
            (sum(rate(cache_hits_total{layer="redis"}[10m])) + sum(rate(cache_misses_total{layer="redis"}[10m])))
          ) < 0.70
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Low cache hit rate"
          description: "Cache hit rate is {{ $value | humanizePercentage }}, expected > 70%"

      # Slow transformations
      - alert: SlowTransformations
        expr: |
          histogram_quantile(0.95,
            sum(rate(transform_duration_seconds_bucket[5m])) by (le)
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow image transformations"
          description: "P95 transform time is {{ $value }}s, expected < 2s"

      # Queue backup
      - alert: QueueBacklog
        expr: rabbitmq_queue_messages{queue="transforms"} > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Transform queue has backlog"
          description: "Queue depth is {{ $value }}, workers may be overwhelmed"

      # Storage quota warning
      - alert: StorageQuotaWarning
        expr: |
          (
            sum(storage_bytes_used) by (organization_id)
            /
            sum(storage_bytes_quota) by (organization_id)
          ) > 0.80
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Organization {{ $labels.organization_id }} approaching storage quota"
          description: "Usage is {{ $value | humanizePercentage }} of quota"
health-check.js
/**
 * Health check service
 */
class HealthCheckService {
  constructor(dependencies) {
    this.db = dependencies.db
    this.redis = dependencies.redis
    this.storage = dependencies.storage
    this.queue = dependencies.queue
  }

  /**
   * Liveness probe - is the service running?
   */
  async liveness() {
    return {
      status: "ok",
      timestamp: new Date().toISOString(),
      uptime: process.uptime(),
    }
  }

  /**
   * Readiness probe - is the service ready to accept traffic?
   */
  async readiness() {
    const checks = {
      database: false,
      redis: false,
      storage: false,
      queue: false,
    }

    // Check database
    try {
      await this.db.query("SELECT 1")
      checks.database = true
    } catch (error) {
      console.error("Database health check failed:", error)
    }

    // Check Redis
    try {
      await this.redis.ping()
      checks.redis = true
    } catch (error) {
      console.error("Redis health check failed:", error)
    }

    // Check storage (round-trip a tiny object)
    try {
      const testKey = ".health-check"
      const testData = Buffer.from("health")
      await this.storage.put(testKey, testData, "text/plain")
      await this.storage.get(testKey)
      await this.storage.delete(testKey)
      checks.storage = true
    } catch (error) {
      console.error("Storage health check failed:", error)
    }

    // Check queue
    try {
      // Implement queue-specific health check
      checks.queue = true
    } catch (error) {
      console.error("Queue health check failed:", error)
    }

    const allHealthy = Object.values(checks).every((v) => v === true)

    return {
      status: allHealthy ? "ready" : "not ready",
      checks,
      timestamp: new Date().toISOString(),
    }
  }
}

export default HealthCheckService
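
A sketch of exposing these probes over HTTP so an orchestrator such as Kubernetes can call them; the route paths, port, and the hypothetical clients module are assumptions, not part of the design above:

health-routes.js (sketch)
import express from "express"
import HealthCheckService from "./health-check.js"
// Hypothetical wiring module; substitute the real db/redis/storage/queue clients
import { db, redis, storage, queue } from "./clients.js"

const health = new HealthCheckService({ db, redis, storage, queue })
const app = express()

// Liveness: the orchestrator restarts the container if this fails repeatedly
app.get("/healthz", async (req, res) => {
  res.json(await health.liveness())
})

// Readiness: failing this removes the instance from load balancing without a
// restart, which is the right reaction to a dependency outage
app.get("/readyz", async (req, res) => {
  const result = await health.readiness()
  res.status(result.status === "ready" ? 200 : 503).json(result)
})

app.listen(8080)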

This section documents failure scenarios, their detection, and recovery strategies. Understanding these modes is critical for production operations.

Cause: Large images (> 5MB), complex operations (multiple resize + effects), cold storage retrieval, or resource contention.

Detection: transform_duration_seconds histogram exceeds p95 threshold.

Mitigation strategies:

  1. Size-based routing: Queue images > 5MB to async workers, return 202 with polling URL
  2. Operation limits: Cap maximum output dimensions (e.g., 4096×4096), reject excessive blur/sharpen values
  3. Timeout with fallback: Return lower-quality transform or original if timeout approaches
  4. Pre-warm cold storage: Move frequently accessed cold-tier assets back to hot tier proactively
timeout-handling.js
async function transformWithTimeout(assetId, operations, timeoutMs = 750) {
  const controller = new AbortController()
  const timeout = setTimeout(() => controller.abort(), timeoutMs)

  try {
    return await transform(assetId, operations, { signal: controller.signal })
  } catch (error) {
    if (error.name === "AbortError") {
      // Return degraded response or queue for async processing
      metrics.transformTimeouts.inc({ org: assetId.split("/")[0] })

      // Option 1: Return original (fastest fallback)
      return { fallback: "original", reason: "timeout" }

      // Option 2: Queue and return 202
      // await queue.publish('transforms', { assetId, operations })
      // return { status: 202, pollUrl: `/v1/transforms/${jobId}` }
    }
    throw error
  } finally {
    clearTimeout(timeout)
  }
}

Retrieval from archive-class cold storage (e.g., S3 Glacier Flexible Retrieval or Azure Archive) can take anywhere from minutes to roughly 12 hours. This breaks the synchronous transform guarantee.

Mitigation:

  1. Tier tracking in database: derived_assets.cache_tier column indicates current storage tier
  2. Proactive restoration: Cron job restores cold assets with recent last_accessed_at updates
  3. Graceful degradation: For cold original assets, return 202 and trigger async restoration
cold-restoration-query.sql
-- Find cold assets accessed recently that should be restored
SELECT id, storage_key, cache_tier
FROM derived_assets
WHERE cache_tier = 'cold'
AND last_accessed_at > NOW() - INTERVAL '24 hours'
ORDER BY access_count DESC
LIMIT 100;
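
A sketch of the restoration job that would consume this query, reusing the registry and storage abstractions from lifecycle-manager.js; the function name and scheduling are illustrative:

cold-restoration-job.js (sketch)
async function restoreRecentlyAccessed(registry, storage) {
  // Same query as above: cold assets that were requested in the last 24 hours
  const candidates = await registry.query(`
    SELECT id, storage_key, cache_tier
    FROM derived_assets
    WHERE cache_tier = 'cold'
      AND last_accessed_at > NOW() - INTERVAL '24 hours'
    ORDER BY access_count DESC
    LIMIT 100
  `)

  for (const asset of candidates.rows) {
    try {
      // Move back to the hot tier; provider-specific restore latency still applies
      await storage.moveToTier(asset.storage_key, "hot")
      await registry.updateCacheTier(asset.id, "hot")
    } catch (error) {
      console.error(`Failed to restore asset ${asset.id}:`, error)
    }
  }
}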

Scenario: Asset updated, but stale version persists in CDN edge caches.

Root causes:

  • Invalidation API rate limits exceeded
  • Propagation delays (CDNs quote 0-60 seconds, but outliers exist)
  • Wildcard invalidation missed specific paths

Mitigation:

  1. Version in URL: Include asset version (/v{version}/) so updates get new cache keys automatically (see the sketch after this list)
  2. Soft purge with fallback: Use CDN’s stale-while-revalidate to serve stale during revalidation
  3. Invalidation monitoring: Track invalidation success rates and propagation times
  4. Dual-write period: For critical updates, serve from origin for 60 seconds before relying on CDN
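
As an illustration of the versioning mitigation, a minimal sketch of building versioned delivery URLs; the path layout and parameter names are assumptions for illustration, not the service's actual URL scheme:

versioned-urls.js (sketch)
/**
 * Build a delivery URL that embeds the asset version. Because the version is
 * part of the path, every update produces a brand-new CDN cache key, so no
 * explicit invalidation is needed for the common case.
 */
function buildDeliveryUrl({ cdnHost, spaceId, assetId, version, operations }) {
  const ops = encodeURIComponent(operations) // e.g. "w=800,h=600,f=webp"
  return `https://${cdnHost}/v${version}/${spaceId}/${assetId}?ops=${ops}`
}

// On update, bump the stored version and emit new URLs. Old URLs keep serving
// stale bytes until their TTL expires, but nothing references them anymore.
// buildDeliveryUrl({ cdnHost: "img.example.com", spaceId: "sp_1",
//                    assetId: "a_42", version: 3, operations: "w=800,f=webp" })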

Scenario: Multiple workers compete for the same transform lock, causing lock acquisition timeouts.

Detection: redlock_acquisition_failures metric spikes, lock_wait_time increases.

Mitigation:

  1. Lock-free fast path: Check if transform exists before acquiring lock (optimistic check)
  2. Retry with jitter: Exponential backoff with randomized jitter to prevent thundering herd
  3. Lock timeout tuning: Set lock TTL to 2x expected transform time, not a fixed value
  4. Shard by hash prefix: Distribute lock contention across multiple Redis masters
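
A sketch combining the first two mitigations (an optimistic existence check before locking, then jittered retries), assuming a node-redlock v5 style client; the registry, redlock, and performTransform handles are illustrative, not the document's actual API:

lock-contention-mitigation.js (sketch)
async function transformWithLock(opsHash, performTransform, { registry, redlock }) {
  // Lock-free fast path: if the derived asset already exists, skip locking entirely
  const existing = await registry.findDerivedAsset(opsHash)
  if (existing) return existing

  const maxAttempts = 4
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      // TTL set to roughly 2x the expected transform time (e.g. 1.5s -> 3s)
      const lock = await redlock.acquire([`locks:transform:${opsHash}`], 3000)
      try {
        // Re-check after acquiring: another worker may have finished first
        const raced = await registry.findDerivedAsset(opsHash)
        return raced ?? (await performTransform())
      } finally {
        await lock.release()
      }
    } catch (error) {
      // Exponential backoff with full jitter to avoid a thundering herd
      const backoffMs = Math.random() * Math.min(1000, 100 * 2 ** attempt)
      await new Promise((resolve) => setTimeout(resolve, backoffMs))
    }
  }

  // Lock never acquired: content-addressed storage makes duplicates harmless,
  // so proceed without the lock rather than failing the request
  return performTransform()
}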

Scenario: Upload interrupted, storage contains partial file, transform fails with cryptic libvips error.

Detection: Sharp throws VipsError on invalid input; content hash doesn’t match expected.

Mitigation:

  1. Hash verification on upload: Compute SHA-256 during upload, verify before marking complete (see the sketch after the validation code below)
  2. Input validation: Check magic bytes and basic structure before transformation
  3. Graceful error messages: Map libvips errors to user-friendly responses
input-validation.js
import sharp from "sharp"

async function validateImage(buffer) {
  try {
    const metadata = await sharp(buffer).metadata()

    // Check for reasonable dimensions
    if (metadata.width > 50000 || metadata.height > 50000) {
      return { valid: false, error: "Image dimensions exceed maximum (50000×50000)" }
    }

    // Check for minimum size (likely corrupt if too small)
    if (buffer.length < 100) {
      return { valid: false, error: "Image file too small, possibly corrupt" }
    }

    return { valid: true, metadata }
  } catch (error) {
    // sharp/libvips throws for unrecognized or truncated input
    return { valid: false, error: `Invalid image: ${error.message}` }
  }
}
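
Complementing the structural check above, a sketch of the first mitigation: verifying the SHA-256 of the received bytes before marking the upload complete. The declaredSha256 parameter is an assumption about how the client communicates the expected hash:

upload-verification.js (sketch)
import { createHash } from "node:crypto"

/**
 * Verify that the uploaded bytes match the hash the client declared.
 * The upload is only marked complete when the hashes agree.
 */
function verifyUpload(buffer, declaredSha256) {
  const actual = createHash("sha256").update(buffer).digest("hex")
  if (actual !== declaredSha256.toLowerCase()) {
    return { valid: false, error: "Content hash mismatch; upload may be truncated or corrupt" }
  }
  return { valid: true, contentHash: actual }
}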

Scenario: Burst traffic exhausts rate limits, legitimate requests rejected.

Mitigation:

  1. Tiered limits: Higher limits for authenticated requests vs. anonymous
  2. Burst allowance: Sliding window with small burst buffer (e.g., 110% of limit for 10 seconds)
  3. Priority queuing: VIP tenants get separate, higher limits
  4. Graceful 429 responses: Include Retry-After header with exact reset time
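
A sketch of the graceful 429 response, assuming an Express-style response object and a limiter that reports when its window resets (the resetAtMs parameter is an assumption about that limiter):

rate-limit-response.js (sketch)
/**
 * Reject a request that exceeded its rate limit with a Retry-After header,
 * so well-behaved clients know exactly when to retry.
 */
function sendRateLimited(res, resetAtMs) {
  const retryAfterSeconds = Math.max(1, Math.ceil((resetAtMs - Date.now()) / 1000))
  res
    .status(429)
    .set("Retry-After", String(retryAfterSeconds))
    .json({
      error: "rate_limited",
      message: "Request rate limit exceeded",
      retryAfterSeconds,
    })
}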

This architecture provides a production-ready foundation for building a cloud-agnostic image processing platform. The key insight is that image transformation is an ideal candidate for aggressive caching: transformations are pure functions (same inputs → same outputs), making content-addressed storage highly effective.

Critical tradeoffs made in this design:

  1. Synchronous-first over queue-first: We accept higher p99 latency for small images in exchange for simpler client integration (no polling). For large images, we fall back to async.

  2. Efficiency locks over safety locks: Redlock prevents duplicate work but doesn’t guarantee mutual exclusion. This is acceptable because content-addressed storage ensures idempotency—duplicate transforms are wasteful, not dangerous.

  3. Edge authentication over origin-only: Moving signature validation to the edge adds complexity but dramatically improves private content latency and reduces origin load.

  4. Storage tiering over uniform hot storage: Cold storage introduces retrieval latency but reduces costs by 40-60% for infrequently accessed content.

What this architecture does not cover:

  • Video transcoding (different latency characteristics, requires different chunking strategies)
  • Real-time image editing (collaborative features, operational transforms)
  • AI/ML-based transformations such as background removal and upscaling (these require GPU infrastructure)
  • Geographic data residency requirements (beyond standard CDN region configuration)

Background the reader is assumed to have:

  • Familiarity with distributed systems concepts (caching, consistency, partitioning)
  • Understanding of HTTP caching semantics (Cache-Control, ETags, CDN behavior)
  • Basic knowledge of image formats and compression (JPEG, WebP, AVIF characteristics)
  • Experience with at least one cloud provider’s storage and CDN offerings
| Term | Definition |
| --- | --- |
| Asset | An original uploaded image, stored with its content hash |
| Derived Asset | A transformed version of an asset, identified by the hash of (original + operations) |
| Content-Addressed | Storage keyed by content hash rather than arbitrary ID; same content → same key |
| Fencing Token | Monotonically increasing token used to detect stale lock holders |
| Operations Hash | SHA-256 of (canonical operation string + original content hash + output format) |
| Signed URL | URL with cryptographic signature proving authorization; includes expiration timestamp |
| Storage Tier | Access latency class: hot (ms), warm (seconds), cold (minutes to hours) |
| Transform Canonicalization | Normalizing operation parameters to ensure equivalent transforms produce identical cache keys |
  • Multi-layer caching (CDN → Redis → Database → Storage) eliminates 99.9% of redundant processing
  • Content-addressed storage with deterministic hashing ensures transform idempotency
  • Sharp (libvips) provides roughly 26x the throughput of pure-JavaScript alternatives with a ~50MB memory footprint per worker
  • AVIF (94.89% browser support) offers 50% compression improvement over JPEG; WebP (95.93%) offers 25-34%
  • Redlock is appropriate for efficiency optimization but not safety-critical mutual exclusion
  • Edge authentication with normalized cache keys maximizes CDN hit rates for private content
  • Hierarchical policies (Organization → Tenant → Space) enable flexible multi-tenant isolation
