# Building a Multi-Tenant Image Service Platform
This document presents the architectural design for a cloud-agnostic, multi-tenant image processing platform that provides on-the-fly transformations with enterprise-grade security, performance, and cost optimization. The platform supports hierarchical multi-tenancy (Organization → Tenant → Space), public and private image delivery, and deployment across AWS, GCP, Azure, or on-premise infrastructure. Key capabilities include deterministic transformation caching to ensure sub-second delivery, signed URL generation for secure private access, CDN integration for global edge caching, and a “transform-once-serve-forever” approach that minimizes processing costs while guaranteeing HTTP 200 responses even for first-time transformation requests.
- System Overview
- Component Naming
- Architecture Principles
- Technology Stack
- High-Level Architecture
- Data Models
- URL Design
- Core Request Flows
- Image Processing Pipeline
- Security & Access Control
- Deployment Architecture
- Cost Optimization
- Monitoring & Operations
## System Overview

### Core Capabilities
- **Multi-Tenancy Hierarchy**
  - Organization: Top-level tenant boundary
  - Tenant: Logical partition within organization (brands, environments)
  - Space: Project workspace containing assets
- **Image Access Models**
  - Public Images: Direct URL access with CDN caching
  - Private Images: Cryptographically signed URLs with expiration
- **On-the-Fly Processing**
  - Real-time transformations (resize, crop, format, quality, effects)
  - Named presets for common transformation patterns
  - Automatic format optimization (WebP, AVIF)
  - Guaranteed 200 response even on first transform request
- **Cloud-Agnostic Design**
  - Deployment to AWS, GCP, Azure, or on-premise
  - Storage abstraction layer for portability
  - Kubernetes-based orchestration
- **Performance & Cost Optimization**
  - Multi-layer caching (CDN → Redis → Database → Storage)
  - Transform deduplication with content-addressed storage
  - Lazy preset generation
  - Storage lifecycle management
## Component Naming

### Core Services
| Component | Name | Purpose |
|---|---|---|
| Entry point | Image Gateway | API gateway, routing, authentication |
| Transform service | Transform Engine | On-demand image processing |
| Upload handler | Asset Ingestion Service | Image upload and validation |
| Admin API | Control Plane API | Tenant management, configuration |
| Background jobs | Transform Workers | Async preset generation |
| Metadata store | Registry Service | Asset and transformation metadata |
| Storage layer | Object Store Adapter | Cloud-agnostic storage interface |
| CDN layer | Edge Cache | Global content delivery |
| URL signing | Signature Service | Private URL cryptographic signing |
### Data Entities
| Entity | Name | Description |
|---|---|---|
| Uploaded file | Asset | Original uploaded image |
| Processed variant | Derived Asset | Transformed image |
| Named transform | Preset | Reusable transformation template |
| Transform result | Variant | Cached transformation output |
## Architecture Principles

### 1. Cloud Portability First
- Storage Abstraction: Unified interface for S3, GCS, Azure Blob, MinIO
- Queue Abstraction: Support for SQS, Pub/Sub, Service Bus, RabbitMQ
- Kubernetes Native: Deploy consistently across clouds
- No Vendor Lock-in: Use open standards where possible
### 2. Performance SLA
- Edge Hit: < 50ms (CDN cache)
- Origin Hit: < 200ms (application cache)
- First Transform: < 800ms (sync processing for images < 5MB)
- Always Return 200: Never return 202 or redirect
### 3. Transform Once, Serve Forever

- Content-addressed transformation storage (see the sketch after this list)
- Idempotent processing with distributed locking
- Permanent caching with invalidation API
- Deduplication across requests
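As a sketch of what content addressing means here (this mirrors the hash derivation used by the Transform Engine later in this document):

```js
import crypto from "crypto"

// The cache key for a derived asset is a pure function of the canonical
// operations string, the source image's content hash, and the output format.
// Identical requests therefore collapse onto a single stored variant.
function deriveTransformKey(canonicalOps, assetContentHash, outputFormat) {
  const payload = `${canonicalOps};${assetContentHash};fmt=${outputFormat}`
  return crypto.createHash("sha256").update(payload).digest("hex")
}
```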
### 4. Security by Default
- Signed URLs for private content
- Row-level tenancy isolation
- Encryption at rest and in transit
- Comprehensive audit logging
### 5. Cost Optimization
- Multi-layer caching to reduce processing
- Storage lifecycle automation
- Format optimization (WebP/AVIF)
- Rate limiting and resource quotas
## Technology Stack

### Core Technologies

#### Image Processing Library
| Technology | Pros | Cons | Recommendation |
|---|---|---|---|
| Sharp (libvips) | Fast, low memory, modern formats, Node.js native | Linux-focused build | ✅ Recommended |
| ImageMagick | Feature-rich, mature | Slower, higher memory | Use for complex operations |
| Jimp | Pure JavaScript, portable | Slower, limited formats | Development only |
**Choice:** Sharp for primary processing with ImageMagick fallback for advanced features.

```bash
npm install sharp
```

#### Caching Layer
| Technology | Use Case | Pros | Cons | Recommendation |
|---|---|---|---|---|
| Redis | Application cache, locks | Fast, pub/sub, clustering | Memory cost | ✅ Primary cache |
| Memcached | Simple KV cache | Faster for simple gets | No persistence, limited data types | Skip |
| Hazelcast | Distributed cache | Java ecosystem, compute | Complexity | Skip for Node.js |
**Choice:** Redis (6+, with Redis Cluster for HA)

```bash
npm install ioredis
```

#### Storage Clients
| Provider | Library | Notes |
|---|---|---|
| AWS S3 | @aws-sdk/client-s3 | Official v3 SDK |
| Google Cloud Storage | @google-cloud/storage | Official SDK |
| Azure Blob | @azure/storage-blob | Official SDK |
| MinIO (on-prem) | minio or S3 SDK | S3-compatible |
```bash
npm install @aws-sdk/client-s3 @google-cloud/storage @azure/storage-blob minio
```

#### Message Queue
| Provider | Library | Use Case |
|---|---|---|
| AWS SQS | @aws-sdk/client-sqs | AWS deployments |
| GCP Pub/Sub | @google-cloud/pubsub | GCP deployments |
| Azure Service Bus | @azure/service-bus | Azure deployments |
| RabbitMQ | amqplib | On-premise, multi-cloud |
**Choice:** Provider-specific queues for cloud deployments, RabbitMQ for on-premise

```bash
npm install amqplib
```

#### Web Framework
| Framework | Pros | Cons | Recommendation |
|---|---|---|---|
| Fastify | Fast, low overhead, TypeScript support | Less mature ecosystem | ✅ Recommended |
| Express | Mature, large ecosystem | Slower, callback-based | Acceptable |
| Koa | Modern, async/await | Smaller ecosystem | Acceptable |
**Choice:** Fastify for performance

```bash
npm install fastify @fastify/multipart @fastify/cors
```

#### Database
| Technology | Pros | Cons | Recommendation |
|---|---|---|---|
| PostgreSQL | JSONB, full-text search, reliability | Complex clustering | ✅ Recommended |
| MySQL | Mature, simple | Limited JSON support | Acceptable |
| MongoDB | Flexible schema | Tenancy complexity | Not recommended |
**Choice:** PostgreSQL 15+ with JSONB for policies

```bash
npm install pg
```

#### URL Signing
| Library | Algorithm | Recommendation |
|---|---|---|
| Node crypto (built-in) | HMAC-SHA256 | ✅ Recommended |
| jsonwebtoken | JWT (HMAC/RSA) | Use for JWT tokens |
| tweetnacl | Ed25519 | Use for EdDSA |

**Choice:** Built-in `crypto` module for HMAC-SHA256 signatures

```js
import crypto from "crypto"
```

#### Distributed Locking
| Technology | Pros | Cons | Recommendation |
|---|---|---|---|
| Redlock (Redis) | Simple, Redis-based | Network partitions | ✅ Recommended |
| etcd | Consistent, Kubernetes native | Separate service | Use if already running |
| Database locks | Simple, transactional | Contention, less scalable | Development only |
**Choice:** Redlock with Redis

```bash
npm install redlock
```

## High-Level Architecture
### System Diagram

The system diagram is not reproduced here. In outline: the Edge Cache (CDN) fronts the Image Gateway, which authenticates and routes requests; the gateway calls the Transform Engine, which uses the Registry Service for metadata and the Object Store Adapter for bytes, while Transform Workers consume async preset jobs from the message queue.
### Request Flow: Public Image

In outline: the CDN serves edge hits directly; on a miss, the Image Gateway resolves the asset and the Transform Engine walks its cache layers (Redis, then registry plus object store), transforming synchronously only when no variant exists, so the response is always an HTTP 200 with the image bytes.
### Request Flow: Private Image

In outline: the same as the public flow, except the Signature Service first validates `sig`, `exp`, and `kid` against the tenant's signing key; expired or invalid signatures are rejected before any asset lookup or processing occurs.
## Data Models

### Database Schema
```sql
-- Organizations (Top-level tenants)
CREATE TABLE organizations (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  slug VARCHAR(100) UNIQUE NOT NULL,
  name VARCHAR(255) NOT NULL,
  status VARCHAR(20) DEFAULT 'active',

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ NULL
);

-- Tenants (Optional subdivision within org)
CREATE TABLE tenants (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
  slug VARCHAR(100) NOT NULL,
  name VARCHAR(255) NOT NULL,
  status VARCHAR(20) DEFAULT 'active',

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ NULL,

  UNIQUE(organization_id, slug)
);

-- Spaces (Projects within tenant)
CREATE TABLE spaces (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
  tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
  slug VARCHAR(100) NOT NULL,
  name VARCHAR(255) NOT NULL,

  -- Default policies (inherit from tenant/org if NULL)
  default_access VARCHAR(20) DEFAULT 'private', -- 'public' or 'private'

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ NULL,

  UNIQUE(tenant_id, slug),
  CONSTRAINT valid_access CHECK (default_access IN ('public', 'private'))
);

-- Policies (Hierarchical configuration)
CREATE TABLE policies (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),

  -- Scope (org, tenant, or space)
  scope_type VARCHAR(20) NOT NULL, -- 'organization', 'tenant', 'space'
  scope_id UUID NOT NULL,

  -- Policy data
  key VARCHAR(100) NOT NULL,
  value JSONB NOT NULL,

  -- Metadata
  updated_at TIMESTAMPTZ DEFAULT NOW(),

  UNIQUE(scope_type, scope_id, key),
  CONSTRAINT valid_scope_type CHECK (scope_type IN ('organization', 'tenant', 'space'))
);
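-- Illustrative, not part of the schema: policy inheritance resolves the most
-- specific scope first (space, then tenant, then organization). For example:
--
--   SELECT value FROM policies
--   WHERE key = 'signed_url_ttl_max_seconds'
--     AND (scope_type, scope_id) IN
--         (('space', :space_id), ('tenant', :tenant_id), ('organization', :org_id))
--   ORDER BY CASE scope_type
--     WHEN 'space' THEN 1 WHEN 'tenant' THEN 2 ELSE 3 END
--   LIMIT 1;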
-- API Keys for authentication
CREATE TABLE api_keys (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
  tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE,

  -- Key identity
  key_id VARCHAR(50) UNIQUE NOT NULL, -- kid for rotation
  name VARCHAR(255) NOT NULL,
  secret_hash VARCHAR(255) NOT NULL, -- bcrypt/argon2

  -- Permissions
  scopes TEXT[] DEFAULT ARRAY['image:read']::TEXT[],

  -- Status
  status VARCHAR(20) DEFAULT 'active',
  expires_at TIMESTAMPTZ NULL,
  last_used_at TIMESTAMPTZ NULL,

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),
  rotated_at TIMESTAMPTZ NULL
);

-- Assets (Original uploaded images)
CREATE TABLE assets (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
  tenant_id UUID NOT NULL REFERENCES tenants(id) ON DELETE CASCADE,
  space_id UUID NOT NULL REFERENCES spaces(id) ON DELETE CASCADE,

  -- Versioning
  version INTEGER NOT NULL DEFAULT 1,

  -- File info
  filename VARCHAR(500) NOT NULL,
  original_filename VARCHAR(500) NOT NULL,
  mime_type VARCHAR(100) NOT NULL,

  -- Storage
  storage_provider VARCHAR(50) NOT NULL, -- 'aws', 'gcp', 'azure', 'minio'
  storage_key VARCHAR(1000) NOT NULL UNIQUE,

  -- Content
  size_bytes BIGINT NOT NULL,
  content_hash VARCHAR(64) NOT NULL, -- SHA-256 for deduplication

  -- Image metadata
  width INTEGER,
  height INTEGER,
  format VARCHAR(10),
  color_space VARCHAR(20),
  has_alpha BOOLEAN,

  -- Organization
  tags TEXT[] DEFAULT ARRAY[]::TEXT[],
  folder VARCHAR(1000) DEFAULT '/',

  -- Access control
  access_policy VARCHAR(20) NOT NULL DEFAULT 'private',

  -- EXIF and metadata
  exif JSONB,

  -- Upload info
  uploaded_by UUID, -- Reference to user
  uploaded_at TIMESTAMPTZ DEFAULT NOW(),

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  deleted_at TIMESTAMPTZ NULL,

  CONSTRAINT valid_access_policy CHECK (access_policy IN ('public', 'private'))
);

-- Transformation Presets (Named transformation templates)
CREATE TABLE presets (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  organization_id UUID NOT NULL REFERENCES organizations(id) ON DELETE CASCADE,
  tenant_id UUID REFERENCES tenants(id) ON DELETE CASCADE,
  space_id UUID REFERENCES spaces(id) ON DELETE CASCADE,

  -- Preset identity
  name VARCHAR(100) NOT NULL,
  slug VARCHAR(100) NOT NULL,
  description TEXT,

  -- Transformation definition
  operations JSONB NOT NULL,
  /* Example:
  {
    "resize": {"width": 800, "height": 600, "fit": "cover"},
    "format": "webp",
    "quality": 85,
    "sharpen": 1
  }
  */

  -- Auto-generation rules
  auto_generate BOOLEAN DEFAULT false,
  match_tags TEXT[] DEFAULT NULL,
  match_folders TEXT[] DEFAULT NULL,

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),

  UNIQUE(organization_id, tenant_id, space_id, slug)
);

-- Derived Assets (Transformed images)
CREATE TABLE derived_assets (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,

  -- Transformation identity
  operations_canonical VARCHAR(500) NOT NULL, -- Canonical string representation
  operations_hash VARCHAR(64) NOT NULL, -- SHA-256 of (canonical_ops + asset.content_hash)

  -- Output
  output_format VARCHAR(10) NOT NULL,

  -- Storage
  storage_provider VARCHAR(50) NOT NULL,
  storage_key VARCHAR(1000) NOT NULL UNIQUE,

  -- Content
  size_bytes BIGINT NOT NULL,
  content_hash VARCHAR(64) NOT NULL,

  -- Image metadata
  width INTEGER,
  height INTEGER,

  -- Performance tracking
  processing_time_ms INTEGER,
  access_count BIGINT DEFAULT 0,
  last_accessed_at TIMESTAMPTZ,

  -- Cache tier for lifecycle
  cache_tier VARCHAR(20) DEFAULT 'hot', -- 'hot', 'warm', 'cold'

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),

  UNIQUE(asset_id, operations_hash)
);

-- Transform Cache (Fast lookup for existing transforms)
CREATE TABLE transform_cache (
  asset_id UUID NOT NULL REFERENCES assets(id) ON DELETE CASCADE,
  operations_hash VARCHAR(64) NOT NULL,
  derived_asset_id UUID NOT NULL REFERENCES derived_assets(id) ON DELETE CASCADE,

  -- Metadata
  created_at TIMESTAMPTZ DEFAULT NOW(),

  PRIMARY KEY(asset_id, operations_hash)
);

-- Usage tracking (for cost and analytics)
CREATE TABLE usage_metrics (
  id BIGSERIAL PRIMARY KEY,
  date DATE NOT NULL,
  organization_id UUID NOT NULL,
  tenant_id UUID NOT NULL,
  space_id UUID NOT NULL,

  -- Metrics
  request_count BIGINT DEFAULT 0,
  bandwidth_bytes BIGINT DEFAULT 0,
  storage_bytes BIGINT DEFAULT 0,
  transform_count BIGINT DEFAULT 0,
  transform_cpu_ms BIGINT DEFAULT 0,

  UNIQUE(date, organization_id, tenant_id, space_id)
);

-- Audit logs
CREATE TABLE audit_logs (
  id BIGSERIAL PRIMARY KEY,
  organization_id UUID NOT NULL,
  tenant_id UUID,

  -- Actor
  actor_type VARCHAR(20) NOT NULL, -- 'user', 'api_key', 'system'
  actor_id UUID NOT NULL,

  -- Action
  action VARCHAR(100) NOT NULL, -- 'asset.upload', 'asset.delete', etc.
  resource_type VARCHAR(50) NOT NULL,
  resource_id UUID,

  -- Context
  metadata JSONB,
  ip_address INET,
  user_agent TEXT,

  -- Timestamp
  created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Indexes for performance
CREATE INDEX idx_tenants_org ON tenants(organization_id);
CREATE INDEX idx_spaces_tenant ON spaces(tenant_id);
CREATE INDEX idx_spaces_org ON spaces(organization_id);
CREATE INDEX idx_policies_scope ON policies(scope_type, scope_id);

CREATE INDEX idx_assets_space ON assets(space_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_assets_org ON assets(organization_id) WHERE deleted_at IS NULL;
CREATE INDEX idx_assets_hash ON assets(content_hash);
CREATE INDEX idx_assets_tags ON assets USING GIN(tags);
CREATE INDEX idx_assets_folder ON assets(folder);

CREATE INDEX idx_derived_asset ON derived_assets(asset_id);
CREATE INDEX idx_derived_hash ON derived_assets(operations_hash);
CREATE INDEX idx_derived_tier ON derived_assets(cache_tier);
CREATE INDEX idx_derived_access ON derived_assets(last_accessed_at);

CREATE INDEX idx_usage_date_org ON usage_metrics(date, organization_id);
CREATE INDEX idx_audit_org_time ON audit_logs(organization_id, created_at);
```

## URL Design
### URL Structure Philosophy
URLs should be:
- Self-describing: Clearly indicate access mode (public vs private)
- Cacheable: CDN-friendly with stable cache keys
- Deterministic: Same transformation = same URL
- Human-readable: Easy to understand and debug
### URL Patterns

#### Public Images

Format:

```
https://{cdn-domain}/v1/pub/{org}/{tenant}/{space}/img/{asset-id}/v{version}/{operations}.{ext}
```

Examples:

- Original: `https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/original.jpg`
- Resized: `https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/w_800-h_600-f_cover.webp`
- With preset: `https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/preset_thumbnail.webp`
- Format auto-negotiation: `https://img.example.com/v1/pub/acme/website/marketing/img/01JBXYZ.../v1/w_1200-f_auto-q_auto.jpg`

#### Private Images (Base URL)
Format:

```
https://{cdn-domain}/v1/priv/{org}/{tenant}/{space}/img/{asset-id}/v{version}/{operations}.{ext}
```

Example: `https://img.example.com/v1/priv/acme/internal/confidential/img/01JBXYZ.../v1/w_800-h_600.jpg`

#### Private Images (Signed URL)

Format:

```
{base-url}?sig={signature}&exp={unix-timestamp}&kid={key-id}
```

Example: `https://img.example.com/v1/priv/acme/internal/confidential/img/01JBXYZ.../v1/w_800-h_600.jpg?sig=dGVzdHNpZ25hdHVyZQ&exp=1731427200&kid=key_123`

Components:

- `sig`: Base64URL-encoded HMAC-SHA256 signature
- `exp`: Unix timestamp (seconds) when the URL expires
- `kid`: Key ID for signature rotation support

#### Transformation Parameters
Operations are encoded as hyphen-separated key-value pairs.

Parameter format: `{key}_{value}`

Supported parameters (a parsing sketch follows this list):

- `w_{pixels}`: Width (e.g., `w_800`)
- `h_{pixels}`: Height (e.g., `h_600`)
- `f_{mode}`: Fit mode: `cover`, `contain`, `fill`, `inside`, `outside`, `pad`
- `q_{quality}`: Quality 1-100 or `auto` (e.g., `q_85`)
- `fmt_{format}`: Format: `jpg`, `png`, `webp`, `avif`, `gif`, or `auto`
- `r_{degrees}`: Rotation: 90, 180, or 270
- `g_{gravity}`: Crop gravity: `center`, `north`, `south`, `east`, `west`, etc.
- `b_{color}`: Background color for `pad` (e.g., `b_ffffff`)
- `blur_{radius}`: Blur radius 0.3-1000 (e.g., `blur_5`)
- `sharpen_{amount}`: Sharpen amount 0-10 (e.g., `sharpen_2`)
- `bw`: Convert to black & white (grayscale)
- `flip`: Flip vertically (Sharp's `flip()` mirrors top-to-bottom)
- `flop`: Flip horizontally (Sharp's `flop()` mirrors left-to-right)
- `preset_{name}`: Apply named preset
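A minimal sketch of how these segments might parse into a flat object (the `parseOperations` helper is assumed by the canonicalizer below but is not defined elsewhere in this document):

```js
// Parses "w_800-h_600-f_cover-bw" into { w: 800, h: 600, f: "cover", bw: true }.
// Purely illustrative; flag-style segments without a value become boolean true.
function parseOperations(opsString) {
  const ops = {}
  for (const segment of opsString.split("-")) {
    const idx = segment.indexOf("_")
    if (idx === -1) {
      ops[segment] = true // flag parameter, e.g. "bw", "flip", "flop"
      continue
    }
    const key = segment.slice(0, idx)
    const value = segment.slice(idx + 1)
    ops[key] = /^\d+(\.\d+)?$/.test(value) ? Number(value) : value
  }
  return ops
}
```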
Examples:

- `w_800-h_600-f_cover-q_85`
- `w_400-h_400-f_contain-fmt_webp`
- `preset_thumbnail`
- `w_1200-sharpen_2-fmt_webp-q_90`
- `w_800-h_600-f_pad-b_ffffff`

#### Operation Canonicalization
To ensure cache hit consistency, operations must be canonicalized:
```js
/**
 * Canonicalizes transformation operations to ensure consistent cache keys.
 * Keys here are the short URL parameter names (w, h, f, q, fmt, ...).
 */
function canonicalizeOperations(opsString) {
  const ops = parseOperations(opsString)

  // Apply defaults
  if (!ops.q && ops.fmt !== "png") ops.q = 85
  if (!ops.f && (ops.w || ops.h)) ops.f = "cover"

  // Normalize numeric values (q may also be the string "auto")
  if (typeof ops.q === "number") ops.q = Math.max(1, Math.min(100, ops.q))
  if (typeof ops.w === "number") ops.w = Math.floor(ops.w)
  if (typeof ops.h === "number") ops.h = Math.floor(ops.h)

  // Canonical order: fmt, w, h, f, g, b, q, r, sharpen, blur, bw, flip, flop
  const order = ["fmt", "w", "h", "f", "g", "b", "q", "r", "sharpen", "blur", "bw", "flip", "flop"]

  return order
    .filter((key) => ops[key] !== undefined)
    .map((key) => (ops[key] === true ? key : `${key}_${ops[key]}`))
    .join("-")
}
```

## Core Request Flows
### Upload Flow with Auto-Presets

In outline: the Asset Ingestion Service validates the upload, stores the original via the Object Store Adapter, registers it in the Registry Service, and enqueues a job for each preset whose auto-generation rules (`auto_generate`, `match_tags`, `match_folders`) match the new asset; Transform Workers then pre-generate those variants in the background.
### Synchronous Transform Flow (Guaranteed 200)

In outline: the gateway canonicalizes the requested operations, checks Redis, then the registry and object store; on a complete miss it acquires a distributed lock and processes the transform inline, so even the first request for a variant returns the finished image with HTTP 200 rather than a 202 or a redirect. The Transform Engine below implements this flow.
## Image Processing Pipeline

### Processing Implementation
import sharp from "sharp"import crypto from "crypto"
/** * Transform Engine - Core image processing service */class TransformEngine { constructor(storage, registry, cache, lockManager) { this.storage = storage this.registry = registry this.cache = cache this.lockManager = lockManager }
/** * Process image transformation with deduplication */ async transform(assetId, operations, acceptHeader) { // 1. Canonicalize operations const canonicalOps = this.canonicalizeOps(operations) const outputFormat = this.determineFormat(operations.format, acceptHeader)
// 2. Generate transformation hash (content-addressed) const asset = await this.registry.getAsset(assetId) const opsHash = this.generateOpsHash(canonicalOps, asset.contentHash, outputFormat)
// 3. Check multi-layer cache const cacheKey = `transform:${assetId}:${opsHash}`
// Layer 1: Redis cache const cached = await this.cache.get(cacheKey) if (cached) { return { buffer: Buffer.from(cached.buffer, "base64"), contentType: cached.contentType, fromCache: "redis", } }
// Layer 2: Database + Storage const derived = await this.registry.getDerivedAsset(assetId, opsHash) if (derived) { const buffer = await this.storage.get(derived.storageKey)
// Populate Redis cache await this.cache.set( cacheKey, { buffer: buffer.toString("base64"), contentType: `image/${derived.outputFormat}`, }, 3600, ) // 1 hour TTL
// Update access metrics await this.registry.incrementAccessCount(derived.id)
return { buffer, contentType: `image/${derived.outputFormat}`, fromCache: "storage", } }
// Layer 3: Process new transformation (with distributed locking) const lockKey = `lock:transform:${assetId}:${opsHash}` const lock = await this.lockManager.acquire(lockKey, 60000) // 60s TTL
try { // Double-check after acquiring lock const doubleCheck = await this.registry.getDerivedAsset(assetId, opsHash) if (doubleCheck) { const buffer = await this.storage.get(doubleCheck.storageKey) return { buffer, contentType: `image/${doubleCheck.outputFormat}`, fromCache: "concurrent", } }
// Process transformation const startTime = Date.now()
// Fetch original const originalBuffer = await this.storage.get(asset.storageKey)
// Apply transformations const processedBuffer = await this.applyTransformations(originalBuffer, canonicalOps, outputFormat)
const processingTime = Date.now() - startTime
// Get metadata of processed image const metadata = await sharp(processedBuffer).metadata()
// Generate storage key const storageKey = `derived/${asset.organizationId}/${asset.tenantId}/${asset.spaceId}/${assetId}/v${asset.version}/${opsHash}.${outputFormat}`
// Store processed image await this.storage.put(storageKey, processedBuffer, `image/${outputFormat}`)
// Compute content hash const contentHash = crypto.createHash("sha256").update(processedBuffer).digest("hex")
// Save to database const derivedAsset = await this.registry.createDerivedAsset({ assetId, operationsCanonical: canonicalOps, operationsHash: opsHash, outputFormat, storageProvider: this.storage.provider, storageKey, sizeBytes: processedBuffer.length, contentHash, width: metadata.width, height: metadata.height, processingTimeMs: processingTime, })
// Update transform cache index await this.registry.cacheTransform(assetId, opsHash, derivedAsset.id)
// Populate Redis cache await this.cache.set( cacheKey, { buffer: processedBuffer.toString("base64"), contentType: `image/${outputFormat}`, }, 3600, )
return { buffer: processedBuffer, contentType: `image/${outputFormat}`, fromCache: "none", processingTime, } } finally { await lock.release() } }
/** * Apply transformations using Sharp */ async applyTransformations(inputBuffer, operations, outputFormat) { let pipeline = sharp(inputBuffer)
// Rotation if (operations.rotation) { pipeline = pipeline.rotate(operations.rotation) }
// Flip/Flop if (operations.flip) { pipeline = pipeline.flip() } if (operations.flop) { pipeline = pipeline.flop() }
// Resize if (operations.width || operations.height) { const resizeOptions = { width: operations.width, height: operations.height, fit: operations.fit || "cover", position: operations.gravity || "centre", withoutEnlargement: true, }
// Background for 'pad' fit if (operations.fit === "pad" && operations.background) { resizeOptions.background = this.parseColor(operations.background) }
pipeline = pipeline.resize(resizeOptions) }
// Effects if (operations.blur) { pipeline = pipeline.blur(operations.blur) }
if (operations.sharpen) { pipeline = pipeline.sharpen(operations.sharpen) }
if (operations.grayscale) { pipeline = pipeline.grayscale() }
// Format conversion and quality const quality = operations.quality === "auto" ? this.getAutoQuality(outputFormat) : operations.quality || 85
switch (outputFormat) { case "jpg": case "jpeg": pipeline = pipeline.jpeg({ quality, mozjpeg: true, // Better compression }) break
case "png": pipeline = pipeline.png({ quality, compressionLevel: 9, adaptiveFiltering: true, }) break
case "webp": pipeline = pipeline.webp({ quality, effort: 6, // Compression effort (0-6) }) break
case "avif": pipeline = pipeline.avif({ quality, effort: 6, }) break
case "gif": pipeline = pipeline.gif() break }
return await pipeline.toBuffer() }
/** * Determine output format based on operations and Accept header */ determineFormat(requestedFormat, acceptHeader) { if (requestedFormat && requestedFormat !== "auto") { return requestedFormat }
// Format negotiation based on Accept header const accept = (acceptHeader || "").toLowerCase()
if (accept.includes("image/avif")) { return "avif" // Best compression }
if (accept.includes("image/webp")) { return "webp" // Good compression, wide support }
return "jpg" // Fallback }
/** * Get automatic quality based on format */ getAutoQuality(format) { const qualityMap = { avif: 75, // AVIF compresses very well webp: 80, // WebP compresses well jpg: 85, // JPEG needs higher quality jpeg: 85, png: 90, // PNG is lossless }
return qualityMap[format] || 85 }
/** * Generate deterministic hash for transformation */ generateOpsHash(canonicalOps, assetContentHash, outputFormat) { const payload = `${canonicalOps};${assetContentHash};fmt=${outputFormat}` return crypto.createHash("sha256").update(payload).digest("hex") }
/** * Parse color hex string to RGB object */ parseColor(hex) { hex = hex.replace("#", "") return { r: parseInt(hex.substr(0, 2), 16), g: parseInt(hex.substr(2, 2), 16), b: parseInt(hex.substr(4, 2), 16), } }
/** * Canonicalize operations */ canonicalizeOps(ops) { // Implementation details... // Return canonical string like "w_800-h_600-f_cover-q_85-fmt_webp" }}
export default TransformEngineDistributed Locking
```js
import Redlock from "redlock"
import Redis from "ioredis"

/**
 * Distributed lock manager using Redlock algorithm
 */
class LockManager {
  constructor(redisClients) {
    // Initialize Redlock with multiple Redis instances for reliability
    this.redlock = new Redlock(redisClients, {
      driftFactor: 0.01,
      retryCount: 10,
      retryDelay: 200,
      retryJitter: 200,
      automaticExtensionThreshold: 500,
    })
  }

  /**
   * Acquire distributed lock
   */
  async acquire(key, ttl = 30000) {
    try {
      const lock = await this.redlock.acquire([`lock:${key}`], ttl)
      return lock
    } catch (error) {
      throw new Error(`Failed to acquire lock for ${key}: ${error.message}`)
    }
  }

  /**
   * Try to acquire lock without waiting
   */
  async tryAcquire(key, ttl = 30000) {
    try {
      return await this.redlock.acquire([`lock:${key}`], ttl)
    } catch (error) {
      return null // Lock not acquired
    }
  }
}

// Usage
const redis1 = new Redis({ host: "redis-1" })
const redis2 = new Redis({ host: "redis-2" })
const redis3 = new Redis({ host: "redis-3" })

const lockManager = new LockManager([redis1, redis2, redis3])

export default LockManager
```

## Security & Access Control
### Signed URL Implementation
import crypto from "crypto"
/** * Signature Service - Generate and verify signed URLs */class SignatureService { constructor(registry) { this.registry = registry }
/** * Generate signed URL for private images */ async generateSignedUrl(baseUrl, orgId, tenantId, ttl = null) { // Get signing key for tenant/org const apiKey = await this.registry.getSigningKey(orgId, tenantId)
// Get effective policy for TTL const policy = await this.registry.getEffectivePolicy(orgId, tenantId) const defaultTtl = policy.signed_url_ttl_default_seconds || 3600 const maxTtl = policy.signed_url_ttl_max_seconds || 86400
// Calculate expiry const requestedTtl = ttl || defaultTtl const effectiveTtl = Math.min(requestedTtl, maxTtl) const expiresAt = Math.floor(Date.now() / 1000) + effectiveTtl
// Create canonical string for signing const url = new URL(baseUrl) const canonicalString = this.createCanonicalString(url.pathname, expiresAt, url.hostname, tenantId)
// Generate HMAC-SHA256 signature const signature = crypto.createHmac("sha256", apiKey.secret).update(canonicalString).digest("base64url") // URL-safe base64
// Append signature, expiry, and key ID to URL url.searchParams.set("sig", signature) url.searchParams.set("exp", expiresAt.toString()) url.searchParams.set("kid", apiKey.keyId)
return { url: url.toString(), expiresAt: new Date(expiresAt * 1000), expiresIn: effectiveTtl, } }
/** * Verify signed URL */ async verifySignedUrl(signedUrl, orgId, tenantId) { const url = new URL(signedUrl)
// Extract signature components const signature = url.searchParams.get("sig") const expiresAt = parseInt(url.searchParams.get("exp")) const keyId = url.searchParams.get("kid")
if (!signature || !expiresAt || !keyId) { return { valid: false, error: "Missing signature components", } }
// Check expiration const now = Math.floor(Date.now() / 1000) if (now > expiresAt) { return { valid: false, expired: true, error: "Signature expired", } }
// Get signing key const apiKey = await this.registry.getApiKeyById(keyId) if (!apiKey || apiKey.status !== "active") { return { valid: false, error: "Invalid key ID", } }
// Verify tenant/org ownership if (apiKey.organizationId !== orgId || apiKey.tenantId !== tenantId) { return { valid: false, error: "Key does not match tenant", } }
// Reconstruct canonical string url.searchParams.delete("sig") url.searchParams.delete("exp") url.searchParams.delete("kid")
const canonicalString = this.createCanonicalString(url.pathname, expiresAt, url.hostname, tenantId)
// Compute expected signature const expectedSignature = crypto.createHmac("sha256", apiKey.secret).update(canonicalString).digest("base64url")
// Constant-time comparison to prevent timing attacks const valid = crypto.timingSafeEqual(Buffer.from(signature), Buffer.from(expectedSignature))
return { valid, error: valid ? null : "Invalid signature", } }
/** * Create canonical string for signing */ createCanonicalString(pathname, expiresAt, hostname, tenantId) { return ["GET", pathname, expiresAt, hostname, tenantId].join("\n") }
/** * Rotate signing keys */ async rotateSigningKey(orgId, tenantId) { // Generate new secret const newSecret = crypto.randomBytes(32).toString("hex") const newKeyId = `key_${Date.now()}_${crypto.randomBytes(8).toString("hex")}`
// Create new key const newKey = await this.registry.createApiKey({ organizationId: orgId, tenantId, keyId: newKeyId, name: `Signing Key (rotated ${new Date().toISOString()})`, secret: newSecret, scopes: ["signing"], })
// Mark old keys for deprecation (keep valid for grace period) await this.registry.deprecateOldSigningKeys(orgId, tenantId, newKey.id)
return newKey }}
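// Example (illustrative): signing a private base URL for one hour.
// const signer = new SignatureService(registry)
// const { url, expiresAt } = await signer.generateSignedUrl(
//   "https://img.example.com/v1/priv/acme/internal/confidential/img/01JBXYZ.../v1/w_800-h_600.jpg",
//   orgId,
//   tenantId,
//   3600,
// )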
export default SignatureService
```

### Authentication Middleware
import crypto from "crypto"
/** * Authentication middleware for Fastify */class AuthMiddleware { constructor(registry) { this.registry = registry }
/** * API Key authentication */ async authenticateApiKey(request, reply) { const apiKey = request.headers["x-api-key"]
if (!apiKey) { return reply.code(401).send({ error: "Unauthorized", message: "API key required", }) }
// Hash the API key const keyHash = crypto.createHash("sha256").update(apiKey).digest("hex")
// Look up in database const keyRecord = await this.registry.getApiKeyByHash(keyHash)
if (!keyRecord) { return reply.code(401).send({ error: "Unauthorized", message: "Invalid API key", }) }
// Check status and expiration if (keyRecord.status !== "active") { return reply.code(401).send({ error: "Unauthorized", message: "API key is inactive", }) }
if (keyRecord.expiresAt && new Date(keyRecord.expiresAt) < new Date()) { return reply.code(401).send({ error: "Unauthorized", message: "API key has expired", }) }
// Update last used timestamp (async, don't wait) this.registry.updateApiKeyLastUsed(keyRecord.id).catch(console.error)
// Attach to request context request.auth = { organizationId: keyRecord.organizationId, tenantId: keyRecord.tenantId, scopes: keyRecord.scopes, keyId: keyRecord.id, } }
/** * Scope-based authorization */ requireScope(scope) { return async (request, reply) => { if (!request.auth) { return reply.code(401).send({ error: "Unauthorized", message: "Authentication required", }) }
if (!request.auth.scopes.includes(scope)) { return reply.code(403).send({ error: "Forbidden", message: `Required scope: ${scope}`, }) } } }
/** * Tenant boundary check */ async checkTenantAccess(request, reply, orgId, tenantId, spaceId) { if (!request.auth) { return reply.code(401).send({ error: "Unauthorized", }) }
// Check organization match if (request.auth.organizationId !== orgId) { return reply.code(403).send({ error: "Forbidden", message: "Access denied to this organization", }) }
// Check tenant match (if key is tenant-scoped) if (request.auth.tenantId && request.auth.tenantId !== tenantId) { return reply.code(403).send({ error: "Forbidden", message: "Access denied to this tenant", }) }
return true }}
export default AuthMiddlewareRate Limiting
import Redis from "ioredis"
/** * Rate limiter using sliding window algorithm */class RateLimiter { constructor(redis) { this.redis = redis }
/** * Check and enforce rate limit */ async checkLimit(identifier, limit, windowSeconds) { const key = `ratelimit:${identifier}` const now = Date.now() const windowStart = now - windowSeconds * 1000
// Use Redis pipeline for atomicity const pipeline = this.redis.pipeline()
// Remove old entries outside the window pipeline.zremrangebyscore(key, "-inf", windowStart)
// Count requests in current window pipeline.zcard(key)
// Add current request const requestId = `${now}:${Math.random()}` pipeline.zadd(key, now, requestId)
// Set expiry on key pipeline.expire(key, windowSeconds)
const results = await pipeline.exec() const count = results[1][1] // Result of ZCARD
const allowed = count < limit const remaining = Math.max(0, limit - count - 1)
// Calculate reset time const oldestEntry = await this.redis.zrange(key, 0, 0, "WITHSCORES") const resetAt = oldestEntry.length > 0 ? new Date(parseInt(oldestEntry[1]) + windowSeconds * 1000) : new Date(now + windowSeconds * 1000)
return { allowed, limit, remaining, resetAt, } }
/** * Rate limiting middleware for Fastify */ middleware(getLimitConfig) { return async (request, reply) => { // Get limit configuration based on request context const { identifier, limit, window } = getLimitConfig(request)
const result = await this.checkLimit(identifier, limit, window)
// Set rate limit headers reply.header("X-RateLimit-Limit", result.limit) reply.header("X-RateLimit-Remaining", result.remaining) reply.header("X-RateLimit-Reset", result.resetAt.toISOString())
if (!result.allowed) { return reply.code(429).send({ error: "Too Many Requests", message: `Rate limit exceeded. Try again after ${result.resetAt.toISOString()}`, retryAfter: Math.ceil((result.resetAt.getTime() - Date.now()) / 1000), }) } } }}
// Usage exampleconst redis = new Redis()const rateLimiter = new RateLimiter(redis)
// Apply to routeapp.get( "/v1/pub/*", { preHandler: rateLimiter.middleware((request) => ({ identifier: `org:${request.params.org}`, limit: 1000, // requests window: 60, // seconds })), }, handler,)
export default RateLimiterDeployment Architecture
### Kubernetes Deployment
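The platform's manifests are environment-specific and not reproduced here. A minimal sketch of the shape a Transform Engine Deployment plus autoscaler might take (image name, probe path, replica counts, and resource numbers are all illustrative assumptions):

```yaml
# Illustrative only: image name, probe path, and resource numbers are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: transform-engine
spec:
  replicas: 3
  selector:
    matchLabels:
      app: transform-engine
  template:
    metadata:
      labels:
        app: transform-engine
    spec:
      containers:
        - name: transform-engine
          image: registry.example.com/transform-engine:latest
          ports:
            - containerPort: 3000
          resources:
            requests: { cpu: 500m, memory: 512Mi }
            limits: { cpu: "2", memory: 2Gi }
          readinessProbe:
            httpGet: { path: /health/ready, port: 3000 }
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: transform-engine
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: transform-engine
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
```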
### Storage Abstraction Layer
```js
/**
 * Abstract storage interface
 */
class StorageAdapter {
  async put(key, buffer, contentType, metadata = {}) {
    throw new Error("Not implemented")
  }

  async get(key) {
    throw new Error("Not implemented")
  }

  async delete(key) {
    throw new Error("Not implemented")
  }

  async exists(key) {
    throw new Error("Not implemented")
  }

  async getSignedUrl(key, ttl) {
    throw new Error("Not implemented")
  }

  get provider() {
    throw new Error("Not implemented")
  }
}

/**
 * AWS S3 Implementation
 */
import {
  S3Client,
  PutObjectCommand,
  GetObjectCommand,
  DeleteObjectCommand,
  HeadObjectCommand,
} from "@aws-sdk/client-s3"
import { getSignedUrl } from "@aws-sdk/s3-request-presigner"

class S3StorageAdapter extends StorageAdapter {
  constructor(config) {
    super()
    this.client = new S3Client({
      region: config.region,
      credentials: config.credentials,
    })
    this.bucket = config.bucket
  }

  async put(key, buffer, contentType, metadata = {}) {
    const command = new PutObjectCommand({
      Bucket: this.bucket,
      Key: key,
      Body: buffer,
      ContentType: contentType,
      Metadata: metadata,
      ServerSideEncryption: "AES256",
    })

    await this.client.send(command)
  }

  async get(key) {
    const command = new GetObjectCommand({
      Bucket: this.bucket,
      Key: key,
    })

    const response = await this.client.send(command)
    const chunks = []

    for await (const chunk of response.Body) {
      chunks.push(chunk)
    }

    return Buffer.concat(chunks)
  }

  async delete(key) {
    const command = new DeleteObjectCommand({
      Bucket: this.bucket,
      Key: key,
    })

    await this.client.send(command)
  }

  async exists(key) {
    try {
      const command = new HeadObjectCommand({
        Bucket: this.bucket,
        Key: key,
      })

      await this.client.send(command)
      return true
    } catch (error) {
      if (error.name === "NotFound") {
        return false
      }
      throw error
    }
  }

  async getSignedUrl(key, ttl = 3600) {
    const command = new GetObjectCommand({
      Bucket: this.bucket,
      Key: key,
    })

    return await getSignedUrl(this.client, command, { expiresIn: ttl })
  }

  get provider() {
    return "aws"
  }
}

/**
 * Google Cloud Storage Implementation
 */
import { Storage } from "@google-cloud/storage"

class GCSStorageAdapter extends StorageAdapter {
  constructor(config) {
    super()
    this.storage = new Storage({
      projectId: config.projectId,
      credentials: config.credentials,
    })
    this.bucket = this.storage.bucket(config.bucket)
  }

  async put(key, buffer, contentType, metadata = {}) {
    const file = this.bucket.file(key)
    await file.save(buffer, {
      contentType,
      metadata,
      resumable: false,
    })
  }

  async get(key) {
    const file = this.bucket.file(key)
    const [contents] = await file.download()
    return contents
  }

  async delete(key) {
    const file = this.bucket.file(key)
    await file.delete()
  }

  async exists(key) {
    const file = this.bucket.file(key)
    const [exists] = await file.exists()
    return exists
  }

  async getSignedUrl(key, ttl = 3600) {
    const file = this.bucket.file(key)
    const [url] = await file.getSignedUrl({
      action: "read",
      expires: Date.now() + ttl * 1000,
    })
    return url
  }

  get provider() {
    return "gcp"
  }
}

/**
 * Azure Blob Storage Implementation
 */
import { BlobServiceClient, BlobSASPermissions } from "@azure/storage-blob"

class AzureBlobStorageAdapter extends StorageAdapter {
  constructor(config) {
    super()
    this.blobServiceClient = BlobServiceClient.fromConnectionString(config.connectionString)
    this.containerClient = this.blobServiceClient.getContainerClient(config.containerName)
  }

  async put(key, buffer, contentType, metadata = {}) {
    const blockBlobClient = this.containerClient.getBlockBlobClient(key)
    await blockBlobClient.upload(buffer, buffer.length, {
      blobHTTPHeaders: { blobContentType: contentType },
      metadata,
    })
  }

  async get(key) {
    const blobClient = this.containerClient.getBlobClient(key)
    const downloadResponse = await blobClient.download()

    return await this.streamToBuffer(downloadResponse.readableStreamBody)
  }

  async delete(key) {
    const blobClient = this.containerClient.getBlobClient(key)
    await blobClient.delete()
  }

  async exists(key) {
    const blobClient = this.containerClient.getBlobClient(key)
    return await blobClient.exists()
  }

  async getSignedUrl(key, ttl = 3600) {
    const blobClient = this.containerClient.getBlobClient(key)
    const expiresOn = new Date(Date.now() + ttl * 1000)

    // The SDK expects a BlobSASPermissions instance, not a raw string
    return await blobClient.generateSasUrl({
      permissions: BlobSASPermissions.parse("r"),
      expiresOn,
    })
  }

  async streamToBuffer(readableStream) {
    return new Promise((resolve, reject) => {
      const chunks = []
      readableStream.on("data", (chunk) => chunks.push(chunk))
      readableStream.on("end", () => resolve(Buffer.concat(chunks)))
      readableStream.on("error", reject)
    })
  }

  get provider() {
    return "azure"
  }
}

/**
 * MinIO Implementation (S3-compatible for on-premise)
 */
import * as Minio from "minio"

class MinIOStorageAdapter extends StorageAdapter {
  constructor(config) {
    super()
    this.client = new Minio.Client({
      endPoint: config.endPoint,
      port: config.port || 9000,
      useSSL: config.useSSL !== false,
      accessKey: config.accessKey,
      secretKey: config.secretKey,
    })
    this.bucket = config.bucket
  }

  async put(key, buffer, contentType, metadata = {}) {
    await this.client.putObject(this.bucket, key, buffer, buffer.length, {
      "Content-Type": contentType,
      ...metadata,
    })
  }

  async get(key) {
    const stream = await this.client.getObject(this.bucket, key)

    return new Promise((resolve, reject) => {
      const chunks = []
      stream.on("data", (chunk) => chunks.push(chunk))
      stream.on("end", () => resolve(Buffer.concat(chunks)))
      stream.on("error", reject)
    })
  }

  async delete(key) {
    await this.client.removeObject(this.bucket, key)
  }

  async exists(key) {
    try {
      await this.client.statObject(this.bucket, key)
      return true
    } catch (error) {
      if (error.code === "NotFound") {
        return false
      }
      throw error
    }
  }

  async getSignedUrl(key, ttl = 3600) {
    return await this.client.presignedGetObject(this.bucket, key, ttl)
  }

  get provider() {
    return "minio"
  }
}

/**
 * Storage Factory
 */
class StorageFactory {
  static create(config) {
    switch (config.provider) {
      case "aws":
      case "s3":
        return new S3StorageAdapter(config)

      case "gcp":
      case "gcs":
        return new GCSStorageAdapter(config)

      case "azure":
        return new AzureBlobStorageAdapter(config)

      case "minio":
      case "onprem":
        return new MinIOStorageAdapter(config)

      default:
        throw new Error(`Unsupported storage provider: ${config.provider}`)
    }
  }
}
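// Example (illustrative): constructing an adapter from configuration; the
// field names mirror the constructors above and the compose file below.
// const storage = StorageFactory.create({
//   provider: "minio",
//   endPoint: "minio",
//   accessKey: "minioadmin",
//   secretKey: "minioadmin",
//   bucket: "images",
// })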
export { StorageAdapter, StorageFactory }
```

### Deployment Configuration
```yaml
# docker-compose.yml for local development
version: "3.8"

services:
  # API Gateway
  gateway:
    build: ./services/gateway
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: development
      DATABASE_URL: postgresql://postgres:password@postgres:5432/imageservice
      REDIS_URL: redis://redis:6379
      STORAGE_PROVIDER: minio
      MINIO_ENDPOINT: minio
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    depends_on:
      - postgres
      - redis
      - minio

  # Transform Engine
  transform:
    build: ./services/transform
    deploy:
      replicas: 3
    environment:
      DATABASE_URL: postgresql://postgres:password@postgres:5432/imageservice
      REDIS_URL: redis://redis:6379
      STORAGE_PROVIDER: minio
      MINIO_ENDPOINT: minio
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    depends_on:
      - postgres
      - redis
      - minio

  # Transform Workers
  worker:
    build: ./services/worker
    deploy:
      replicas: 3
    environment:
      DATABASE_URL: postgresql://postgres:password@postgres:5432/imageservice
      RABBITMQ_URL: amqp://rabbitmq:5672
      STORAGE_PROVIDER: minio
      MINIO_ENDPOINT: minio
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    depends_on:
      - postgres
      - rabbitmq
      - minio

  # PostgreSQL
  postgres:
    image: postgres:15-alpine
    environment:
      POSTGRES_DB: imageservice
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: password
    volumes:
      - postgres-data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  # Redis
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    ports:
      - "6379:6379"

  # RabbitMQ
  rabbitmq:
    image: rabbitmq:3-management-alpine
    environment:
      RABBITMQ_DEFAULT_USER: admin
      RABBITMQ_DEFAULT_PASS: password
    ports:
      - "5672:5672"
      - "15672:15672"
    volumes:
      - rabbitmq-data:/var/lib/rabbitmq

  # MinIO (S3-compatible storage)
  minio:
    image: minio/minio:latest
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio-data:/data

volumes:
  postgres-data:
  redis-data:
  rabbitmq-data:
  minio-data:
```

## Cost Optimization
### Multi-Layer Caching Strategy

Lookups cascade CDN → Redis → database/storage, with each layer populating the one above it on a miss; the Transform Engine's `transform()` method implements the application-side layers.
### Storage Lifecycle Management
```js
/**
 * Storage lifecycle manager
 */
class LifecycleManager {
  constructor(registry, storage) {
    this.registry = registry
    this.storage = storage
  }

  /**
   * Move derived assets to cheaper tiers based on access patterns.
   * Note: moveToTier is assumed to be implemented by the storage adapter,
   * mapping to the provider's tiering API (Glacier Instant Retrieval,
   * Coldline, Cool/Archive, etc.).
   */
  async moveToColdTier() {
    const coldThresholdDays = 30
    const warmThresholdDays = 7

    // Find candidates for tiering
    const candidates = await this.registry.query(`
      SELECT id, storage_key, cache_tier, last_accessed_at, size_bytes
      FROM derived_assets
      WHERE cache_tier = 'hot'
        AND last_accessed_at < NOW() - INTERVAL '${coldThresholdDays} days'
      ORDER BY last_accessed_at ASC
      LIMIT 1000
    `)

    for (const asset of candidates.rows) {
      try {
        // Move to cold storage tier
        await this.storage.moveToTier(asset.storage_key, "cold")

        // Update database
        await this.registry.updateCacheTier(asset.id, "cold")

        console.log(`Moved asset ${asset.id} to cold tier`)
      } catch (error) {
        console.error(`Failed to move asset ${asset.id}:`, error)
      }
    }

    // Similar logic for warm tier
    const warmCandidates = await this.registry.query(`
      SELECT id, storage_key, cache_tier
      FROM derived_assets
      WHERE cache_tier = 'hot'
        AND last_accessed_at < NOW() - INTERVAL '${warmThresholdDays} days'
        AND last_accessed_at >= NOW() - INTERVAL '${coldThresholdDays} days'
      LIMIT 1000
    `)

    for (const asset of warmCandidates.rows) {
      await this.storage.moveToTier(asset.storage_key, "warm")
      await this.registry.updateCacheTier(asset.id, "warm")
    }
  }

  /**
   * Delete unused derived assets
   */
  async pruneUnused() {
    const pruneThresholdDays = 90

    const unused = await this.registry.query(`
      SELECT id, storage_key
      FROM derived_assets
      WHERE access_count = 0
        AND created_at < NOW() - INTERVAL '${pruneThresholdDays} days'
      LIMIT 1000
    `)

    for (const asset of unused.rows) {
      try {
        await this.storage.delete(asset.storage_key)
        await this.registry.deleteDerivedAsset(asset.id)

        console.log(`Pruned unused asset ${asset.id}`)
      } catch (error) {
        console.error(`Failed to prune asset ${asset.id}:`, error)
      }
    }
  }
}
```

### Cost Projection
For a service serving 10 million requests/month (unit prices are illustrative; storage and bandwidth are priced per GB, with 100 TB taken as 100,000 GB):

| Component | Without Optimization | With Optimization | Savings |
|---|---|---|---|
| Processing | 1M transforms × $0.001 = $1,000 | 50K transforms × $0.001 = $50 | 95% |
| Storage | 100 TB × $0.023/GB = $2,300 | 100 TB × $0.013/GB (tiered) = $1,300 | 43% |
| Bandwidth | 100 TB × $0.09/GB (origin) = $9,000 | 100 TB × $0.02/GB (CDN) = $2,000 | 78% |
| CDN | — | 100 TB × $0.02/GB = $2,000 | — |
| Total | $12,300/month | $5,350/month | 56% |
Key optimizations:
- 95% CDN hit rate reduces origin bandwidth
- Transform deduplication prevents reprocessing
- Storage tiering moves cold data to cheaper tiers
- Smart caching minimizes processing costs
Monitoring & Operations
Metrics Collection
```js
import prometheus from "prom-client"

/**
 * Metrics registry
 */
class MetricsRegistry {
  constructor() {
    this.register = new prometheus.Registry()

    // Default metrics (CPU, memory, etc.)
    prometheus.collectDefaultMetrics({ register: this.register })

    // HTTP metrics
    this.httpRequestDuration = new prometheus.Histogram({
      name: "http_request_duration_seconds",
      help: "HTTP request duration in seconds",
      labelNames: ["method", "route", "status"],
      buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5, 10],
    })

    this.httpRequestTotal = new prometheus.Counter({
      name: "http_requests_total",
      help: "Total HTTP requests",
      labelNames: ["method", "route", "status"],
    })

    // Transform metrics
    this.transformDuration = new prometheus.Histogram({
      name: "transform_duration_seconds",
      help: "Image transformation duration in seconds",
      labelNames: ["org", "format", "cached"],
      buckets: [0.1, 0.2, 0.5, 1, 2, 5, 10],
    })

    this.transformTotal = new prometheus.Counter({
      name: "transforms_total",
      help: "Total image transformations",
      labelNames: ["org", "format", "cached"],
    })

    this.transformErrors = new prometheus.Counter({
      name: "transform_errors_total",
      help: "Total transformation errors",
      labelNames: ["org", "error_type"],
    })

    // Cache metrics
    this.cacheHits = new prometheus.Counter({
      name: "cache_hits_total",
      help: "Total cache hits",
      labelNames: ["layer"], // cdn, redis, database
    })

    this.cacheMisses = new prometheus.Counter({
      name: "cache_misses_total",
      help: "Total cache misses",
      labelNames: ["layer"],
    })

    // Storage metrics
    this.storageOperations = new prometheus.Counter({
      name: "storage_operations_total",
      help: "Total storage operations",
      labelNames: ["provider", "operation"], // put, get, delete
    })

    this.storageBytesTransferred = new prometheus.Counter({
      name: "storage_bytes_transferred_total",
      help: "Total bytes transferred to/from storage",
      labelNames: ["provider", "direction"], // upload, download
    })

    // Business metrics
    this.assetsUploaded = new prometheus.Counter({
      name: "assets_uploaded_total",
      help: "Total assets uploaded",
      labelNames: ["org", "format"],
    })

    this.bandwidthServed = new prometheus.Counter({
      name: "bandwidth_served_bytes_total",
      help: "Total bandwidth served",
      labelNames: ["org", "space"],
    })

    // Register all metrics
    this.register.registerMetric(this.httpRequestDuration)
    this.register.registerMetric(this.httpRequestTotal)
    this.register.registerMetric(this.transformDuration)
    this.register.registerMetric(this.transformTotal)
    this.register.registerMetric(this.transformErrors)
    this.register.registerMetric(this.cacheHits)
    this.register.registerMetric(this.cacheMisses)
    this.register.registerMetric(this.storageOperations)
    this.register.registerMetric(this.storageBytesTransferred)
    this.register.registerMetric(this.assetsUploaded)
    this.register.registerMetric(this.bandwidthServed)
  }

  /**
   * Get metrics in Prometheus format
   */
  async getMetrics() {
    return await this.register.metrics()
  }
}

// Singleton instance
const metricsRegistry = new MetricsRegistry()
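// Example (illustrative): exposing the registry through a Fastify route for
// Prometheus to scrape.
// app.get("/metrics", async (request, reply) => {
//   reply.header("Content-Type", metricsRegistry.register.contentType)
//   return metricsRegistry.getMetrics()
// })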
export default metricsRegistry
```

### Alerting Configuration
```yaml
groups:
  - name: image_service_alerts
    interval: 30s
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          (
            sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
            /
            sum(rate(http_requests_total[5m])) by (service)
          ) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate on {{ $labels.service }}"
          description: "Error rate is {{ $value | humanizePercentage }}"

      # Low cache hit rate
      - alert: LowCacheHitRate
        expr: |
          (
            sum(rate(cache_hits_total{layer="redis"}[10m]))
            /
            (sum(rate(cache_hits_total{layer="redis"}[10m])) + sum(rate(cache_misses_total{layer="redis"}[10m])))
          ) < 0.70
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Low cache hit rate"
          description: "Cache hit rate is {{ $value | humanizePercentage }}, expected > 70%"

      # Slow transformations
      - alert: SlowTransformations
        expr: |
          histogram_quantile(0.95,
            sum(rate(transform_duration_seconds_bucket[5m])) by (le)
          ) > 2
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Slow image transformations"
          description: "P95 transform time is {{ $value }}s, expected < 2s"

      # Queue backup
      - alert: QueueBacklog
        expr: rabbitmq_queue_messages{queue="transforms"} > 1000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Transform queue has backlog"
          description: "Queue depth is {{ $value }}, workers may be overwhelmed"

      # Storage quota warning
      - alert: StorageQuotaWarning
        expr: |
          (
            sum(storage_bytes_used) by (organization_id)
            /
            sum(storage_bytes_quota) by (organization_id)
          ) > 0.80
        for: 1h
        labels:
          severity: warning
        annotations:
          summary: "Organization {{ $labels.organization_id }} approaching storage quota"
          description: "Usage is {{ $value | humanizePercentage }} of quota"
```

### Health Checks
```js
/**
 * Health check service
 */
class HealthCheckService {
  constructor(dependencies) {
    this.db = dependencies.db
    this.redis = dependencies.redis
    this.storage = dependencies.storage
    this.queue = dependencies.queue
  }

  /**
   * Liveness probe - is the service running?
   */
  async liveness() {
    return {
      status: "ok",
      timestamp: new Date().toISOString(),
      uptime: process.uptime(),
    }
  }

  /**
   * Readiness probe - is the service ready to accept traffic?
   */
  async readiness() {
    const checks = {
      database: false,
      redis: false,
      storage: false,
      queue: false,
    }

    // Check database
    try {
      await this.db.query("SELECT 1")
      checks.database = true
    } catch (error) {
      console.error("Database health check failed:", error)
    }

    // Check Redis
    try {
      await this.redis.ping()
      checks.redis = true
    } catch (error) {
      console.error("Redis health check failed:", error)
    }

    // Check storage
    try {
      const testKey = ".health-check"
      const testData = Buffer.from("health")
      await this.storage.put(testKey, testData, "text/plain")
      await this.storage.get(testKey)
      await this.storage.delete(testKey)
      checks.storage = true
    } catch (error) {
      console.error("Storage health check failed:", error)
    }

    // Check queue
    try {
      // Implement queue-specific health check
      checks.queue = true
    } catch (error) {
      console.error("Queue health check failed:", error)
    }

    const allHealthy = Object.values(checks).every((v) => v === true)

    return {
      status: allHealthy ? "ready" : "not ready",
      checks,
      timestamp: new Date().toISOString(),
    }
  }
}
export default HealthCheckService
```

## Summary
This document presents a comprehensive architecture for a multi-tenant, cloud-agnostic image processing platform with the following key characteristics:
### Architecture Highlights
- Multi-Tenancy: Three-level hierarchy (Organization → Tenant → Space) with policy inheritance
- Cloud Portability: Storage and queue abstractions enable deployment to AWS, GCP, Azure, or on-premise
- Performance: Guaranteed HTTP 200 responses with < 800ms p95 latency for first transforms
- Security: Cryptographic signed URLs with HMAC-SHA256 and key rotation support
- Cost Optimization: 56% cost reduction through multi-layer caching and storage lifecycle management
- Scalability: Kubernetes-native deployment with horizontal autoscaling
### Technology Recommendations
- Image Processing: Sharp (libvips) for performance
- Caching: Redis with Redlock for distributed locking
- Database: PostgreSQL 15+ with JSONB for flexible policies
- Storage: Provider-specific SDKs with unified abstraction
- Framework: Fastify for low-latency HTTP serving
- Orchestration: Kubernetes for cloud-agnostic deployment
### Key Design Decisions
- Synchronous transforms for first requests ensure immediate delivery
- Content-addressed storage prevents duplicate processing
- Hierarchical policies enable flexible multi-tenancy
- Edge authentication reduces origin load for private content
- Transform canonicalization maximizes cache hit rates
This architecture provides a production-ready foundation for building a Cloudinary-alternative image service with enterprise-grade performance, security, and cost efficiency.