Last updated on Feb 6, 2026

Design a Notification System

A comprehensive system design for multi-channel notifications covering event ingestion, channel routing, delivery guarantees, user preferences, rate limiting, and failure handling. This design addresses sub-second delivery at Uber/LinkedIn scale (millions of notifications per second) with at-least-once delivery guarantees and user-centric throttling.

High-level architecture (diagram): event producers publish to Kafka, the routing layer applies preferences and throttling, and channel processors deliver via external providers.

Notification systems solve three interconnected problems: reliable delivery (no notification is lost), user respect (throttling, preferences, quiet hours), and channel optimization (right message, right channel, right time).

Core architectural decisions:

| Decision | Choice | Rationale |
|---|---|---|
| Delivery guarantee | At-least-once + idempotent consumers | Exactly-once is impossible in distributed systems; dedup at the consumer is simpler |
| Queue partitioning | By user_id | Co-locates a user's notifications for rate limiting and aggregation |
| Priority handling | Separate queues per priority | Critical notifications bypass backlog from bulk sends |
| Channel selection | User preference → fallback chain | Respect user choice, ensure delivery for critical alerts |
| Rate limiting | Token bucket per user per channel | Prevents notification fatigue, protects external provider limits |
| Template rendering | At send time | Supports dynamic content, A/B testing, personalization |

Key trade-offs accepted:

  • Increased latency from preference lookups in exchange for user control
  • Storage overhead for deduplication windows (24-48 hours)
  • Complexity of multiple channel processors vs. single delivery path
  • At-least-once means clients must handle duplicates

What this design optimizes:

  • Sub-500ms delivery for critical notifications
  • 99.99% delivery rate with retry and fallback mechanisms
  • User-controlled notification experience (frequency, channels, timing)
  • Horizontal scaling to millions of notifications per second
Functional requirements:

| Requirement | Priority | Notes |
|---|---|---|
| Multi-channel delivery | Core | Push (iOS/Android), Email, SMS, In-app |
| User preferences | Core | Opt-in/out per notification type and channel |
| Template management | Core | Dynamic templates with variable substitution |
| Scheduling | Core | Immediate, scheduled, timezone-aware delivery |
| Delivery tracking | Core | Sent, delivered, opened, clicked status |
| Rate limiting | Core | User-level and channel-level throttling |
| Retry and fallback | Core | Automatic retry with channel fallback |
| Notification history | Extended | Queryable log for users and support |
| Batching/aggregation | Extended | Collapse similar notifications ("5 new likes") |
| Quiet hours | Extended | Per-user do-not-disturb windows |
Non-functional requirements:

| Requirement | Target | Rationale |
|---|---|---|
| Availability | 99.99% (4 nines) | Notifications are critical for user engagement |
| Delivery latency (critical) | p99 < 500ms | Time-sensitive alerts (security, transactions) |
| Delivery latency (normal) | p99 < 5s | Acceptable for social/promotional |
| Throughput | 1M notifications/second peak | Enterprise scale (Uber: 250K/s, LinkedIn: millions/s) |
| Deduplication window | 48 hours | Balance storage vs. duplicate prevention |
| Delivery rate | > 99.9% | After retries and fallbacks |

Users:

  • Monthly Active Users (MAU): 100M
  • Daily Active Users (DAU): 40M (40% of MAU)
  • Devices per user: 2 (phone + web)
  • Push tokens to manage: 200M

Traffic:

  • Notifications per user per day: 25 (mix of transactional and engagement)
  • Daily notifications: 40M × 25 = 1B notifications/day
  • Average per second: 1B / 86400 ≈ 12K notifications/second
  • Peak multiplier (3x): 36K notifications/second
  • Burst events (flash sales, breaking news): 100K+ notifications/second

Storage:

  • Notification record: 500 bytes (metadata, status, timestamps)
  • Daily storage: 1B × 500B = 500GB/day
  • 90-day retention: 45TB
  • Deduplication cache: 48-hour window × 1B × 32-byte key = ~64GB
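The estimates above can be sanity-checked with quick arithmetic; this sketch just encodes the stated assumptions (40M DAU, 25 notifications/user/day, 500-byte records, 32-byte dedup keys):

```typescript
// Back-of-envelope capacity check using the assumptions listed above.
const dau = 40_000_000
const perUserPerDay = 25
const dailyNotifications = dau * perUserPerDay                   // 1e9 notifications/day
const avgPerSecond = dailyNotifications / 86_400                 // ≈ 11.6K/s average
const peakPerSecond = avgPerSecond * 3                           // ≈ 35K/s at a 3x peak
const recordBytes = 500
const dailyStorageGB = (dailyNotifications * recordBytes) / 1e9  // 500 GB/day
const retention90dTB = (dailyStorageGB * 90) / 1000              // 45 TB for 90 days
const dedupCacheGB = (dailyNotifications * 2 * 32) / 1e9         // 64 GB for a 48h window
```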

External provider capacity:

  • FCM: 600K quota tokens/minute ≈ 10K/second sustained
  • APNs: No published limit, but throttles excessive traffic
  • Email (SES): 50K/second with warm-up
  • SMS (Twilio): 100 MPS per short code

Path A: Push-based delivery

Best when:

  • Sub-second latency is critical
  • Users expect immediate notifications
  • Infrastructure can maintain persistent connections
  • Moderate notification volume per user


Key characteristics:

  • Persistent connections (WebSocket/SSE) to user devices
  • Gateway maintains connection-to-user mapping
  • Direct delivery bypasses external providers for in-app

Trade-offs:

  • Lowest latency (< 100ms for in-app)
  • No external provider costs for in-app
  • Bidirectional communication
  • Connection management complexity at scale
  • Still needs push providers for background delivery
  • Higher infrastructure cost (persistent connections)

Real-world example: Uber’s RAMEN system maintains 1.5M+ concurrent connections, delivering 250K+ messages/second with 99.99% server-side reliability using gRPC bidirectional streaming.

Path B: Queue-based delivery

Best when:

  • Delivery guarantee is paramount
  • Notification volume is high but latency tolerance is 1-5 seconds
  • Need strong audit trail
  • Burst handling is critical


Key characteristics:

  • All notifications flow through durable message queue
  • Workers process at their own pace
  • Built-in retry with exponential backoff
  • Dead letter queue for failed notifications

Trade-offs:

  • Guaranteed delivery (no message loss)
  • Excellent burst handling (queue absorbs spikes)
  • Strong audit trail (Kafka retention)
  • Higher latency (queue hop overhead)
  • Ordering complexity across partitions
  • Potential for notification storms after recovery

Real-world example: Slack uses Kafka-based infrastructure for notification delivery, achieving 100% trace coverage for debugging delivery issues.

Path C: Hybrid priority-based delivery

Best when:

  • Mix of time-sensitive and bulk notifications
  • Need to balance cost, latency, and reliability
  • Different notification types have different SLAs


Key characteristics:

  • Priority classification at ingestion
  • Separate processing paths per priority
  • Resource allocation matches SLA requirements
  • Bulk notifications processed during off-peak

Trade-offs:

  • Optimal latency for critical notifications
  • Cost-efficient bulk processing
  • Predictable SLAs per notification type
  • Multiple code paths to maintain
  • Priority classification complexity
  • Risk of priority inversion under load

Real-world example: Netflix’s RENO uses priority-based AWS SQS queues with corresponding compute clusters, delivering personalized notifications with different latency guarantees.

| Factor | Push-Based | Queue-Based | Hybrid |
|---|---|---|---|
| Latency (critical) | < 100ms | 500ms-2s | < 100ms |
| Latency (bulk) | Same | Same | Flexible |
| Reliability | Good | Excellent | Excellent |
| Burst handling | Limited | Excellent | Excellent |
| Infrastructure cost | High | Medium | Medium-High |
| Complexity | High | Medium | Highest |
| Production examples | Uber RAMEN | Slack | Netflix RENO |

This article focuses on Path C (Hybrid) because:

  1. Reflects production systems at scale (Netflix, LinkedIn)
  2. Demonstrates priority-based trade-off thinking
  3. Handles diverse notification types (security alerts to marketing)
  4. Balances cost, latency, and reliability appropriately

Ingestion service: receives notification requests, validates and enriches them, and routes each to the appropriate queue.

Responsibilities:

  • Request validation and authentication
  • Template resolution and rendering
  • Priority classification
  • User preference lookup
  • Queue routing based on priority

Design decisions:

| Decision | Choice | Rationale |
|---|---|---|
| API style | REST with async response | Fire-and-forget for producers; status via webhook/polling |
| Idempotency | Client-provided notification_id | Enables safe retries from producers |
| Batching | Support up to 1000 recipients/request | Reduces API overhead for bulk sends |
| Template rendering | At ingestion time | Content frozen at send; supports personalization |
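Variable substitution for templates like "Your order {{orderId}} has shipped" can be sketched as below. This is a minimal illustrative renderer, not the production implementation; a real one would also validate against the template's declared variables:

```typescript
// Minimal {{variable}} substitution. Unknown placeholders are left intact so
// missing variables are visible rather than silently dropped.
function renderTemplate(template: string, variables: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in variables ? variables[name] : match,
  )
}
```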

Template service: manages notification templates with variable substitution and multi-language support.

Template structure:

```typescript
interface NotificationTemplate {
  templateId: string
  name: string
  category: "transactional" | "marketing" | "system"
  channels: {
    push?: {
      title: string // "Your order {{orderId}} has shipped"
      body: string // "Track your package: {{trackingUrl}}"
      data?: Record<string, string>
    }
    email?: {
      subject: string
      htmlBody: string
      textBody: string
    }
    sms?: {
      body: string // Max 160 chars for single segment
    }
  }
  variables: VariableDefinition[]
  defaultLocale: string
  translations: Record<string, ChannelContent>
}
```

Design decisions:

  • Templates stored in PostgreSQL with Redis cache (5-minute TTL)
  • Variable validation at template creation prevents runtime errors
  • Version history for rollback support
  • A/B testing via template variants

Preference service: manages user notification preferences with channel-level and type-level granularity.

Preference model:

```typescript
interface UserPreferences {
  userId: string
  globalEnabled: boolean
  quietHours?: {
    enabled: boolean
    start: string // "22:00"
    end: string // "07:00"
    timezone: string // "America/New_York"
  }
  channels: {
    push: ChannelPreference
    email: ChannelPreference
    sms: ChannelPreference
    inApp: ChannelPreference
  }
  categories: {
    [category: string]: {
      enabled: boolean
      channels: string[] // Override global channel prefs
      frequency?: "immediate" | "daily_digest" | "weekly_digest"
    }
  }
}

interface ChannelPreference {
  enabled: boolean
  frequency?: FrequencyLimit // e.g. max 5/hour, 20/day
}
```

Storage strategy:

  • Hot path: Redis hash with 1-hour TTL
  • Canonical: PostgreSQL with audit history
  • Write-through cache invalidation
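The write-through invalidation above can be sketched as follows. The `Store` interface and class names are illustrative; in the actual design the canonical store is PostgreSQL and the cache is Redis:

```typescript
// Write-through pattern: write to the canonical store first, then invalidate
// the cache entry so the next read repopulates it with fresh data.
interface Store {
  write(key: string, value: string): void
  read(key: string): string | undefined
}

class WriteThroughCache {
  constructor(private db: Store, private cache: Map<string, string>) {}

  update(userId: string, prefs: string): void {
    this.db.write(userId, prefs) // canonical write (PostgreSQL in the design)
    this.cache.delete(userId)    // invalidate (Redis DEL in the design)
  }

  read(userId: string): string | undefined {
    const cached = this.cache.get(userId)
    if (cached !== undefined) return cached
    const fresh = this.db.read(userId)
    if (fresh !== undefined) this.cache.set(userId, fresh) // repopulate on miss
    return fresh
  }
}
```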

Device token registry: maintains device tokens for push notification delivery.

Token management:

```typescript
interface DeviceToken {
  userId: string
  deviceId: string
  platform: "ios" | "android" | "web"
  token: string
  tokenType: "apns" | "fcm" | "web_push"
  appVersion: string
  lastSeen: Date
  createdAt: Date
  updatedAt: Date
  status: "active" | "stale" | "invalid"
}
```

Token lifecycle:

| Event | Action |
|---|---|
| App install | Register new token |
| App launch | Refresh token if > 7 days old |
| Token refresh callback | Update token, mark previous invalid |
| Delivery failure (unregistered) | Mark token invalid immediately |
| 30 days inactive | Mark token stale (lower priority) |
| 270 days inactive (Android) | Token expires automatically |
Per Firebase documentation: Monitor droppedDeviceInactive percentage; tokens inactive > 270 days on Android are automatically expired.
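The inactivity rows of the lifecycle table reduce to a small classification function. This is an illustrative sketch; the thresholds come from the table above, and `classifyToken` is not a name from the design:

```typescript
type TokenStatus = "active" | "stale" | "invalid"

const STALE_AFTER_DAYS = 30   // per the lifecycle table
const EXPIRE_AFTER_DAYS = 270 // Android tokens auto-expire per Firebase docs

// Classify a token purely by how long the device has been inactive.
function classifyToken(lastSeen: Date, platform: string, now: Date = new Date()): TokenStatus {
  const inactiveDays = (now.getTime() - lastSeen.getTime()) / 86_400_000
  if (platform === "android" && inactiveDays > EXPIRE_AFTER_DAYS) return "invalid"
  if (inactiveDays > STALE_AFTER_DAYS) return "stale"
  return "active"
}
```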

Notification router: the core orchestration layer that applies business logic before delivery.

Routing flow:

  1. Deduplication check: Has this (user_id, notification_id) been processed?
  2. Preference check: Is user opted in for this notification type and channel?
  3. Quiet hours check: Is user in do-not-disturb window?
  4. Rate limit check: Has user exceeded frequency limits?
  5. Aggregation check: Should this be batched with similar notifications?
  6. Channel selection: Which channel(s) based on preference and fallback rules?
  7. Dispatch: Send to appropriate channel processor(s)
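Steps 6-7 (channel selection with a fallback chain) might look like the sketch below. The `FALLBACK_CHAIN` ordering and function names are illustrative assumptions, not the production API:

```typescript
type Channel = "push" | "email" | "sms" | "inApp"

// For critical alerts, walk this chain until an available channel is found.
const FALLBACK_CHAIN: Channel[] = ["push", "sms", "email"]

function selectChannels(
  preferred: Channel[],
  available: Set<Channel>, // channels the user can actually receive right now
  critical: boolean,
): Channel[] {
  const usable = preferred.filter((c) => available.has(c))
  if (usable.length > 0 || !critical) return usable
  // Critical notification with no preferred channel available: fall back.
  const fallback = FALLBACK_CHAIN.find((c) => available.has(c))
  return fallback ? [fallback] : []
}
```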

Channel processors: independent processors for each delivery channel, encapsulating provider-specific logic.

Push Processor:

  • Manages connection pools to APNs/FCM
  • Handles token-based authentication (APNs) and service account auth (FCM)
  • Respects provider rate limits (FCM: 600K tokens/minute)
  • Processes invalid token responses

Email Processor:

  • Manages sender reputation and warm-up
  • Handles bounces (hard/soft) and complaints
  • Implements one-click unsubscribe (Gmail/Yahoo 2024 requirement)
  • Tracks open/click events via tracking pixels and redirect URLs

SMS Processor:

  • Routes to appropriate number type (short code vs. long code)
  • Handles multi-segment messages (> 160 chars)
  • Manages opt-out via STOP keyword
  • Respects carrier rate limits
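Multi-segment handling depends on encoding: GSM-7 messages fit 160 characters in a single segment and 153 per segment once split, while UCS-2 fits 70 and 67. A rough segment estimator (the ASCII regex is a crude stand-in for full GSM-7 alphabet detection):

```typescript
// Estimate SMS segment count from message length and a simple encoding check.
function smsSegmentCount(body: string): number {
  const isGsm7 = /^[\x00-\x7F]*$/.test(body) // crude ASCII-only proxy for GSM-7
  const singleLimit = isGsm7 ? 160 : 70      // chars in a one-segment message
  const multiLimit = isGsm7 ? 153 : 67       // chars per segment when split (UDH overhead)
  if (body.length <= singleLimit) return 1
  return Math.ceil(body.length / multiLimit)
}
```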

In-App Processor:

  • Delivers via WebSocket for connected users
  • Falls back to polling endpoint for disconnected
  • Supports notification aggregation (badge counts)
  • Manages read/unread state

Endpoint: POST /api/v1/notifications

Request:

```json
{
  "notificationId": "uuid-client-generated",
  "templateId": "order_shipped",
  "recipients": [
    {
      "userId": "user_123",
      "variables": {
        "orderId": "ORD-456",
        "trackingUrl": "https://track.example.com/ORD-456"
      }
    }
  ],
  "priority": "high",
  "channels": ["push", "email"],
  "options": {
    "ttl": 86400,
    "collapseKey": "order_update_ORD-456",
    "scheduledAt": null
  }
}
```

Response (202 Accepted):

```json
{
  "requestId": "req_abc123",
  "notificationId": "uuid-client-generated",
  "status": "accepted",
  "recipientCount": 1,
  "estimatedDelivery": "2024-02-03T10:00:05Z"
}
```

Error Responses:

| Code | Error | When |
|---|---|---|
| 400 | INVALID_TEMPLATE | Template not found or invalid variables |
| 400 | INVALID_RECIPIENT | User ID not found |
| 409 | DUPLICATE_NOTIFICATION | notificationId already processed |
| 429 | RATE_LIMITED | Producer rate limit exceeded |
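On the producer side, the client-generated notificationId makes retries safe: a 409 means a previous attempt already succeeded. A sketch of this retry logic (the injected `post` transport and the retry policy are illustrative assumptions; a real producer would wrap an HTTP client):

```typescript
// post: injected transport returning an HTTP status code, so the sketch is
// testable without a network.
async function sendWithRetry(
  payload: object,
  post: (body: string) => Promise<number>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await post(JSON.stringify(payload))
    if (status === 202) return true // accepted
    if (status === 409) return true // duplicate notificationId: a prior attempt succeeded
    if (status === 429) {
      // Producer rate limited: back off exponentially and retry
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** attempt))
      continue
    }
    return false // other errors (e.g. 400 validation failures) are not retryable
  }
  return false
}
```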

Endpoint: POST /api/v1/notifications/bulk

Request:

```json
{
  "notificationId": "bulk_uuid",
  "templateId": "weekly_digest",
  "recipientQuery": {
    "segment": "active_users_7d",
    "excludeOptedOut": true
  },
  "priority": "low",
  "channels": ["email"],
  "options": {
    "spreadOverMinutes": 60,
    "respectQuietHours": true
  }
}
```

Response (202 Accepted):

```json
{
  "requestId": "bulk_req_xyz",
  "notificationId": "bulk_uuid",
  "status": "queued",
  "estimatedRecipients": 150000,
  "estimatedCompletion": "2024-02-03T11:00:00Z"
}
```

Endpoint: GET /api/v1/notifications/{notificationId}/status

Response:

```json
{
  "notificationId": "uuid",
  "status": "delivered",
  "recipients": [
    {
      "userId": "user_123",
      "channels": {
        "push": {
          "status": "delivered",
          "deliveredAt": "2024-02-03T10:00:02Z",
          "openedAt": "2024-02-03T10:05:00Z"
        },
        "email": {
          "status": "sent",
          "sentAt": "2024-02-03T10:00:03Z",
          "openedAt": null
        }
      }
    }
  ]
}
```

Endpoint: GET /api/v1/users/{userId}/preferences

Response:

```json
{
  "userId": "user_123",
  "globalEnabled": true,
  "quietHours": {
    "enabled": true,
    "start": "22:00",
    "end": "07:00",
    "timezone": "America/New_York"
  },
  "channels": {
    "push": { "enabled": true },
    "email": { "enabled": true, "frequency": { "maxPerDay": 10 } },
    "sms": { "enabled": false }
  },
  "categories": {
    "marketing": { "enabled": false },
    "order_updates": { "enabled": true, "channels": ["push", "email"] },
    "security": { "enabled": true, "channels": ["push", "sms", "email"] }
  }
}
```

Update Preferences:

Endpoint: PATCH /api/v1/users/{userId}/preferences

```json
{
  "categories": {
    "marketing": { "enabled": true, "frequency": "weekly_digest" }
  }
}
```

Endpoint: POST /api/v1/devices

Request:

```json
{
  "userId": "user_123",
  "deviceId": "device_abc",
  "platform": "ios",
  "token": "apns_token_xyz",
  "appVersion": "3.2.1"
}
```

Response (201 Created):

```json
{
  "deviceId": "device_abc",
  "status": "active",
  "registeredAt": "2024-02-03T10:00:00Z"
}
```

Endpoint: GET /api/v1/users/{userId}/notifications?limit=50&cursor=xxx

Response:

```json
{
  "notifications": [
    {
      "notificationId": "uuid_1",
      "templateId": "order_shipped",
      "title": "Your order has shipped",
      "body": "Track your package...",
      "channel": "push",
      "status": "read",
      "createdAt": "2024-02-03T10:00:00Z",
      "readAt": "2024-02-03T10:05:00Z"
    }
  ],
  "nextCursor": "cursor_abc",
  "hasMore": true
}
```

Table design for time-series notification access:

```sql
-- User-defined type for notification payloads (declared first so the
-- tables below can reference it)
CREATE TYPE notification_content (
  title TEXT,
  body TEXT,
  data MAP<TEXT, TEXT>,
  image_url TEXT
);

CREATE TABLE notifications (
  user_id UUID,
  created_at TIMESTAMP,
  notification_id UUID,
  template_id TEXT,
  priority TEXT,
  content FROZEN<notification_content>,
  channels SET<TEXT>,
  status TEXT,
  delivery_attempts INT,
  PRIMARY KEY ((user_id), created_at, notification_id)
) WITH CLUSTERING ORDER BY (created_at DESC, notification_id ASC)
  AND default_time_to_live = 7776000; -- 90 days

-- For notification lookup by ID
CREATE TABLE notifications_by_id (
  notification_id UUID PRIMARY KEY,
  user_id UUID,
  created_at TIMESTAMP,
  template_id TEXT,
  priority TEXT,
  content FROZEN<notification_content>,
  channels SET<TEXT>,
  status TEXT
);
```

Why Cassandra:

  • Time-series optimized with partition per user
  • Automatic TTL-based expiration
  • High write throughput for delivery status updates
  • Linear horizontal scaling
Delivery status and retry tracking (Cassandra):

```sql
CREATE TABLE delivery_status (
  notification_id UUID,
  channel TEXT,
  user_id UUID,
  device_id TEXT,
  status TEXT,       -- queued, sent, delivered, failed, opened, clicked
  provider_id TEXT,  -- APNs message ID, SES message ID, etc.
  error_code TEXT,
  error_message TEXT,
  timestamp TIMESTAMP,
  PRIMARY KEY ((notification_id), channel, device_id)
);

-- Table for retry processing
CREATE TABLE failed_deliveries (
  retry_bucket INT,  -- Hour bucket for time-based retry
  notification_id UUID,
  channel TEXT,
  user_id UUID,
  attempt_count INT,
  last_error TEXT,
  next_retry_at TIMESTAMP,
  PRIMARY KEY ((retry_bucket), next_retry_at, notification_id)
) WITH CLUSTERING ORDER BY (next_retry_at ASC);
```
User preferences and device tokens (PostgreSQL):

```sql
CREATE TABLE user_preferences (
  user_id UUID PRIMARY KEY,
  global_enabled BOOLEAN DEFAULT true,
  quiet_hours JSONB, -- {"enabled": true, "start": "22:00", "end": "07:00", "tz": "America/New_York"}
  channel_prefs JSONB,
  category_prefs JSONB,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Audit history for compliance
CREATE TABLE preference_history (
  id BIGSERIAL PRIMARY KEY,
  user_id UUID NOT NULL,
  changed_at TIMESTAMPTZ DEFAULT NOW(),
  change_type TEXT, -- 'opt_in', 'opt_out', 'update'
  old_value JSONB,
  new_value JSONB,
  source TEXT       -- 'user', 'system', 'compliance'
);

CREATE INDEX idx_pref_history_user ON preference_history(user_id, changed_at DESC);

CREATE TABLE device_tokens (
  device_id TEXT PRIMARY KEY,
  user_id UUID NOT NULL,
  platform TEXT NOT NULL,   -- ios, android, web
  token TEXT NOT NULL,
  token_type TEXT NOT NULL, -- apns, fcm, web_push
  app_version TEXT,
  last_seen TIMESTAMPTZ,
  status TEXT DEFAULT 'active', -- active, stale, invalid
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE INDEX idx_tokens_user ON device_tokens(user_id);
CREATE INDEX idx_tokens_status ON device_tokens(status) WHERE status = 'active';
```

Redis cache structure:

```text
# User's active tokens (set)
SADD user:tokens:{user_id} {device_id_1} {device_id_2}

# Token details (hash)
HSET token:{device_id} user_id "user_123" platform "ios" token "apns_xyz" token_type "apns" status "active"

# Token lookup (string with TTL for stale detection)
SETEX token:active:{device_id} 2592000 "1"  # 30 days
```
Template storage (PostgreSQL):

```sql
CREATE TABLE notification_templates (
  template_id TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  category TEXT NOT NULL,
  channels JSONB NOT NULL,
  variables JSONB,
  default_locale TEXT DEFAULT 'en',
  is_active BOOLEAN DEFAULT true,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW(),
  version INT DEFAULT 1
);

CREATE TABLE template_translations (
  template_id TEXT REFERENCES notification_templates(template_id),
  locale TEXT,
  channels JSONB NOT NULL,
  PRIMARY KEY (template_id, locale)
);

CREATE TABLE template_versions (
  template_id TEXT,
  version INT,
  channels JSONB NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  created_by TEXT,
  PRIMARY KEY (template_id, version)
);
```
Storage technology summary:

| Data Type | Store | Rationale |
|---|---|---|
| Notifications | Cassandra | Time-series, high write volume, TTL |
| Delivery status | Cassandra | High write volume, time-based queries |
| User preferences | PostgreSQL + Redis | ACID for changes, cached for reads |
| Device tokens | PostgreSQL + Redis | Relational queries, cached for delivery |
| Templates | PostgreSQL | Low volume, version history needed |
| Deduplication | Redis | TTL-based, fast lookups |
| Rate limits | Redis | Atomic counters, sliding windows |
| Analytics | ClickHouse | Columnar, aggregations at scale |

Deduplication service. Purpose: prevent duplicate notification delivery within a 48-hour window.

```typescript
class DeduplicationService {
  private readonly redis: RedisCluster
  private readonly DEDUP_TTL = 172800 // 48 hours in seconds

  async isDuplicate(userId: string, notificationId: string): Promise<boolean> {
    const key = `dedup:${userId}:${notificationId}`
    // SET with NX returns "OK" if the key was set (not a duplicate),
    // null if the key already exists (duplicate)
    const result = await this.redis.set(key, "1", {
      NX: true,
      EX: this.DEDUP_TTL,
    })
    return result === null // null means the key existed (duplicate)
  }

  // Bloom filter for fast "definitely not duplicate" check
  async checkBloomFilter(userId: string, notificationId: string): Promise<boolean> {
    const key = `bloom:dedup:${userId}`
    return await this.redis.bf.exists(key, notificationId)
  }
}
```

Design rationale: Twilio Segment’s deduplication handles 60 billion keys across 1.5TB storage. Using Bloom filters for fast rejection and Redis SETNX for authoritative check balances memory and accuracy.

Per-user throttling with atomic hourly and daily counters in Redis:

```typescript
interface RateLimitConfig {
  channel: string
  maxPerHour: number
  maxPerDay: number
}

class RateLimiter {
  private readonly redis: RedisCluster

  async checkAndConsume(
    userId: string,
    channel: string,
    config: RateLimitConfig,
  ): Promise<{ allowed: boolean; retryAfter?: number }> {
    const hourKey = `ratelimit:${userId}:${channel}:hour:${this.getCurrentHour()}`
    const dayKey = `ratelimit:${userId}:${channel}:day:${this.getCurrentDay()}`
    // Lua script for atomic check-and-increment; rejected requests roll back
    // both counters so they do not consume quota, and retryAfter is the
    // remaining TTL of the exceeded window
    const result = await this.redis.eval(
      `
      local hourCount = redis.call('INCR', KEYS[1])
      if hourCount == 1 then
        redis.call('EXPIRE', KEYS[1], 3600)
      end
      local dayCount = redis.call('INCR', KEYS[2])
      if dayCount == 1 then
        redis.call('EXPIRE', KEYS[2], 86400)
      end
      if hourCount > tonumber(ARGV[1]) then
        redis.call('DECR', KEYS[1])
        redis.call('DECR', KEYS[2])
        return {0, redis.call('TTL', KEYS[1])}
      end
      if dayCount > tonumber(ARGV[2]) then
        redis.call('DECR', KEYS[1])
        redis.call('DECR', KEYS[2])
        return {0, redis.call('TTL', KEYS[2])}
      end
      return {1, 0}
      `,
      [hourKey, dayKey],
      [config.maxPerHour, config.maxPerDay],
    )
    return {
      allowed: result[0] === 1,
      retryAfter: result[1] > 0 ? result[1] : undefined,
    }
  }
}
```

Channel-specific limits (per FCM documentation):

| Channel | Limit | Enforcement |
|---|---|---|
| FCM | 600K tokens/minute | Token bucket with backoff on 429 |
| APNs | No published limit | Monitor for throttling responses |
| Email (SES) | 50K/second (warm domain) | Gradual ramp-up required |
| SMS (Twilio) | 100 MPS/short code | Queue with rate-limited consumer |
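Provider-level throttling is a natural fit for a token bucket, e.g. smoothing FCM sends under a per-minute quota. A minimal in-memory sketch (capacity and refill rate are illustrative; a distributed deployment would keep the bucket in Redis):

```typescript
// Token bucket: refills continuously at refillPerSecond, up to capacity.
class TokenBucket {
  private tokens: number
  private lastRefill: number

  constructor(private capacity: number, private refillPerSecond: number, now = Date.now()) {
    this.tokens = capacity
    this.lastRefill = now
  }

  tryConsume(count = 1, now = Date.now()): boolean {
    // Refill based on elapsed time, capped at capacity
    const elapsedSeconds = (now - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond)
    this.lastRefill = now
    if (this.tokens < count) return false
    this.tokens -= count
    return true
  }
}
```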

Aggregation service: collapses similar notifications into a digest.

```typescript
interface AggregationRule {
  category: string
  collapseKey: string   // Template for grouping, e.g., "likes_{postId}"
  windowSeconds: number // Aggregation window
  minCount: number      // Minimum to trigger aggregation
  maxCount: number      // Maximum before force-flush
  digestTemplate: string // "{{count}} people liked your post"
}

class NotificationAggregator {
  private readonly redis: RedisCluster

  async shouldAggregate(
    userId: string,
    notification: Notification,
    rule: AggregationRule,
  ): Promise<{ aggregate: boolean; pending: Notification[] }> {
    const collapseKey = this.renderCollapseKey(rule.collapseKey, notification)
    const bufferKey = `agg:${userId}:${collapseKey}`
    // Add to buffer
    await this.redis.rpush(bufferKey, JSON.stringify(notification))
    await this.redis.expire(bufferKey, rule.windowSeconds)
    const count = await this.redis.llen(bufferKey)
    if (count >= rule.maxCount) {
      // Force flush
      const pending = await this.flushBuffer(bufferKey)
      return { aggregate: true, pending }
    }
    if (count >= rule.minCount) {
      // Schedule aggregated delivery at window end
      await this.scheduleFlush(userId, collapseKey, rule.windowSeconds)
    }
    return { aggregate: false, pending: [] }
  }

  async createDigest(notifications: Notification[], rule: AggregationRule): Promise<Notification> {
    const count = notifications.length
    const actors = [...new Set(notifications.map((n) => n.actorId))].slice(0, 3)
    return {
      ...notifications[0],
      content: {
        title: this.renderTemplate(rule.digestTemplate, { count, actors }),
        body: `${actors[0]} and ${count - 1} others`,
      },
      metadata: {
        aggregatedCount: count,
        originalIds: notifications.map((n) => n.notificationId),
      },
    }
  }
}
```

Aggregation patterns:

| Notification Type | Collapse Key | Window | Digest Format |
|---|---|---|---|
| Post likes | likes_{postId} | 5 min | "John and 5 others liked your post" |
| New followers | followers_{userId} | 1 hour | "6 new followers today" |
| Comment replies | replies_{commentId} | 10 min | "3 new replies to your comment" |
Priority classification and queue routing:

```typescript
enum NotificationPriority {
  CRITICAL = "critical", // Security alerts, transaction confirmations
  HIGH = "high",         // Direct messages, mentions
  NORMAL = "normal",     // Social notifications, updates
  LOW = "low",           // Marketing, digests
}

class PriorityRouter {
  private readonly queues: Map<NotificationPriority, KafkaProducer>

  async route(notification: EnrichedNotification): Promise<void> {
    const priority = this.determinePriority(notification)
    const queue = this.queues.get(priority)
    // Partition by user_id for rate limiting co-location
    await queue.send({
      topic: `notifications.${priority}`,
      messages: [
        {
          key: notification.userId,
          value: JSON.stringify(notification),
          headers: {
            "notification-id": notification.notificationId,
            "created-at": Date.now().toString(),
          },
        },
      ],
    })
  }

  private determinePriority(notification: EnrichedNotification): NotificationPriority {
    // Critical: security, transactions, time-sensitive
    if (notification.category === "security") return NotificationPriority.CRITICAL
    if (notification.category === "transaction") return NotificationPriority.CRITICAL
    // High: direct user interaction
    if (notification.category === "message") return NotificationPriority.HIGH
    if (notification.category === "mention") return NotificationPriority.HIGH
    // Low: bulk, marketing
    if (notification.category === "marketing") return NotificationPriority.LOW
    if (notification.category === "digest") return NotificationPriority.LOW
    return NotificationPriority.NORMAL
  }
}
```

Queue configuration:

| Priority | Partitions | Consumer Parallelism | Max Latency |
|---|---|---|---|
| Critical | 50 | 50 workers | 500ms |
| High | 100 | 100 workers | 2s |
| Normal | 200 | 200 workers | 10s |
| Low | 50 | 50 workers (off-peak) | Best effort |
Push delivery with provider-specific error handling:

```typescript
interface PushDeliveryResult {
  success: boolean
  messageId?: string
  errorCode?: string
  shouldRetry: boolean
  invalidToken: boolean
}

class PushProcessor {
  private readonly fcm: FirebaseMessaging
  private readonly apns: ApnsClient
  private readonly deviceRegistry: DeviceRegistry

  async deliver(notification: Notification, device: DeviceToken): Promise<PushDeliveryResult> {
    try {
      if (device.tokenType === "fcm") {
        return await this.deliverFcm(notification, device)
      } else if (device.tokenType === "apns") {
        return await this.deliverApns(notification, device)
      }
      throw new Error(`Unsupported token type: ${device.tokenType}`)
    } catch (error) {
      return this.handleError(error, device)
    }
  }

  private async deliverFcm(notification: Notification, device: DeviceToken): Promise<PushDeliveryResult> {
    const message = {
      token: device.token,
      notification: {
        title: notification.content.title,
        body: notification.content.body,
      },
      data: notification.content.data,
      android: {
        priority: notification.priority === "critical" ? "high" : "normal",
        ttl: notification.ttl * 1000,
        collapseKey: notification.collapseKey,
      },
    }
    const response = await this.fcm.send(message)
    return { success: true, messageId: response, shouldRetry: false, invalidToken: false }
  }

  private handleError(error: any, device: DeviceToken): PushDeliveryResult {
    // FCM error codes per documentation
    const errorCode = error.code
    // Invalid token: remove immediately
    if (["messaging/invalid-registration-token", "messaging/registration-token-not-registered"].includes(errorCode)) {
      this.deviceRegistry.markInvalid(device.deviceId)
      return { success: false, errorCode, shouldRetry: false, invalidToken: true }
    }
    // Rate limited: retry with backoff
    if (errorCode === "messaging/too-many-requests") {
      return { success: false, errorCode, shouldRetry: true, invalidToken: false }
    }
    // Server error: retry with backoff
    if (errorCode === "messaging/internal-error") {
      return { success: false, errorCode, shouldRetry: true, invalidToken: false }
    }
    return { success: false, errorCode, shouldRetry: false, invalidToken: false }
  }
}
```
Retry scheduling with exponential backoff and a dead-letter queue:

```typescript
interface RetryConfig {
  maxAttempts: number
  baseDelayMs: number
  maxDelayMs: number
  jitterFactor: number
}

class RetryService {
  private readonly defaultConfig: RetryConfig = {
    maxAttempts: 5,
    baseDelayMs: 1000,
    maxDelayMs: 300000, // 5 minutes
    jitterFactor: 0.2,
  }

  async scheduleRetry(
    notification: Notification,
    channel: string,
    attemptCount: number,
    config: RetryConfig = this.defaultConfig,
  ): Promise<void> {
    if (attemptCount >= config.maxAttempts) {
      await this.moveToDlq(notification, channel)
      return
    }
    const delay = this.calculateDelay(attemptCount, config)
    const retryBucket = Math.floor((Date.now() + delay) / 3600000) // Hour bucket
    await this.cassandra.execute(
      `
      INSERT INTO failed_deliveries (
        retry_bucket, notification_id, channel, user_id,
        attempt_count, last_error, next_retry_at
      ) VALUES (?, ?, ?, ?, ?, ?, ?)
      `,
      [
        retryBucket,
        notification.notificationId,
        channel,
        notification.userId,
        attemptCount + 1,
        notification.lastError,
        new Date(Date.now() + delay),
      ],
    )
  }

  private calculateDelay(attempt: number, config: RetryConfig): number {
    // Exponential backoff with jitter
    const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt)
    const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs)
    const jitter = cappedDelay * config.jitterFactor * Math.random()
    return Math.floor(cappedDelay + jitter)
  }

  private async moveToDlq(notification: Notification, channel: string): Promise<void> {
    await this.kafka.send({
      topic: "notifications.dlq",
      messages: [
        {
          key: notification.userId,
          value: JSON.stringify({
            notification,
            channel,
            reason: "max_retries_exceeded",
            timestamp: Date.now(),
          }),
        },
      ],
    })
    // Alert for monitoring
    this.metrics.increment("notifications.dlq.count", {
      channel,
      category: notification.category,
    })
  }
}
```
Quiet hours enforcement with timezone-aware deferral:

```typescript
class QuietHoursHandler {
  async shouldDefer(
    userId: string,
    notification: Notification,
    preferences: UserPreferences,
  ): Promise<{ defer: boolean; deliverAt?: Date }> {
    // Critical notifications bypass quiet hours
    if (notification.priority === "critical") {
      return { defer: false }
    }
    if (!preferences.quietHours?.enabled) {
      return { defer: false }
    }
    const userNow = this.getUserLocalTime(preferences.quietHours.timezone)
    const isInQuietHours = this.isTimeInRange(userNow, preferences.quietHours.start, preferences.quietHours.end)
    if (!isInQuietHours) {
      return { defer: false }
    }
    // Calculate when quiet hours end
    const deliverAt = this.getQuietHoursEnd(preferences.quietHours.end, preferences.quietHours.timezone)
    return { defer: true, deliverAt }
  }

  private isTimeInRange(current: Date, start: string, end: string): boolean {
    const currentMinutes = current.getHours() * 60 + current.getMinutes()
    const [startHour, startMin] = start.split(":").map(Number)
    const [endHour, endMin] = end.split(":").map(Number)
    const startMinutes = startHour * 60 + startMin
    const endMinutes = endHour * 60 + endMin
    // Handle overnight ranges (e.g., 22:00 - 07:00)
    if (startMinutes > endMinutes) {
      return currentMinutes >= startMinutes || currentMinutes < endMinutes
    }
    return currentMinutes >= startMinutes && currentMinutes < endMinutes
  }
}
```

WebSocket connection for live updates:

```typescript
class NotificationClient {
  private ws: WebSocket | null = null
  private reconnectAttempt = 0
  private readonly MAX_RECONNECT_DELAY = 30000

  connect(authToken: string): void {
    this.ws = new WebSocket(`wss://notifications.example.com/ws?token=${authToken}`)
    this.ws.onopen = () => {
      this.reconnectAttempt = 0
      this.syncMissedNotifications()
    }
    this.ws.onmessage = (event) => {
      const notification = JSON.parse(event.data)
      this.handleNotification(notification)
    }
    this.ws.onclose = () => {
      this.scheduleReconnect()
    }
  }

  private handleNotification(notification: Notification): void {
    // Update badge count
    this.incrementBadge()
    // Add to notification list
    this.store.dispatch(addNotification(notification))
    // Show toast if appropriate
    if (notification.priority === "high" && !document.hasFocus()) {
      this.showToast(notification)
    }
    // Request browser notification permission if needed
    if (notification.showBrowserNotification) {
      this.showBrowserNotification(notification)
    }
  }

  private async syncMissedNotifications(): Promise<void> {
    const lastSeen = localStorage.getItem("lastNotificationTimestamp")
    const response = await fetch(`/api/v1/notifications?since=${lastSeen}&limit=50`)
    const { notifications } = await response.json()
    notifications.forEach((n) => this.handleNotification(n))
  }
}
```
12 collapsed lines
```tsx
interface NotificationListProps {
  userId: string
  pageSize: number
}

const NotificationList: React.FC<NotificationListProps> = ({ userId, pageSize }) => {
  const {
    data,
    fetchNextPage,
    hasNextPage,
    isFetchingNextPage
  } = useInfiniteQuery({
    queryKey: ['notifications', userId],
    queryFn: ({ pageParam }) =>
      fetchNotifications(userId, { cursor: pageParam, limit: pageSize }),
    getNextPageParam: (lastPage) => lastPage.nextCursor
  })

  const notifications = data?.pages.flatMap(p => p.notifications) ?? []

  return (
    <VirtualList
      items={notifications}
      estimatedItemSize={80}
      onEndReached={() => hasNextPage && fetchNextPage()}
      renderItem={(notification) => (
        <NotificationItem
          key={notification.id}
          notification={notification}
          onRead={markAsRead}
        />
      )}
    />
  )
}
```
```tsx
interface PreferenceState {
  loading: boolean
  preferences: UserPreferences | null
  pendingChanges: Partial<UserPreferences>
}

const PreferencesPanel: React.FC = () => {
  const [state, dispatch] = useReducer(preferenceReducer, initialState)

  const handleCategoryUpdate = async (category: string, update: { enabled: boolean }) => {
    // Optimistic update
    dispatch({ type: 'UPDATE_CATEGORY', category, update })
    try {
      await updatePreferences({
        categories: { [category]: update }
      })
    } catch (error) {
      // Rollback on failure
      dispatch({ type: 'ROLLBACK' })
      showError('Failed to update preferences')
    }
  }
  // handleChannelToggle and handleQuietHoursUpdate follow the same
  // optimistic-update-then-rollback pattern

  return (
    <div className="preferences-panel">
      <section>
        <h3>Notification Channels</h3>
        {Object.entries(state.preferences?.channels ?? {}).map(([channel, config]) => (
          <ToggleRow
            key={channel}
            label={channelLabels[channel]}
            enabled={config.enabled}
            onChange={(enabled) => handleChannelToggle(channel, enabled)}
          />
        ))}
      </section>
      <section>
        <h3>Notification Types</h3>
        {Object.entries(state.preferences?.categories ?? {}).map(([category, config]) => (
          <CategoryRow
            key={category}
            category={category}
            config={config}
            onChange={(update) => handleCategoryUpdate(category, update)}
          />
        ))}
      </section>
      <section>
        <h3>Quiet Hours</h3>
        <QuietHoursEditor
          config={state.preferences?.quietHours}
          onChange={handleQuietHoursUpdate}
        />
      </section>
    </div>
  )
}
```
```typescript
class PushPermissionManager {
  async requestPermission(): Promise<"granted" | "denied" | "default"> {
    // Already granted: just make sure the service worker is registered
    if (Notification.permission === "granted") {
      await this.registerServiceWorker()
      return "granted"
    }
    // Don't re-prompt if the user has explicitly denied
    if (Notification.permission === "denied") {
      return "denied"
    }
    // Request permission
    const permission = await Notification.requestPermission()
    if (permission === "granted") {
      await this.registerServiceWorker()
      const token = await this.getFcmToken()
      await this.registerDevice(token)
    }
    return permission
  }

  private async registerServiceWorker(): Promise<void> {
    await navigator.serviceWorker.register("/sw.js")
    // Token refresh: the "pushsubscriptionchange" event fires in the service
    // worker's global scope (inside sw.js), not here in the page. The worker
    // should re-subscribe, fetch a fresh token, and report it to the device
    // registry so stale tokens are replaced.
  }
}
```
| Component | Purpose | Options |
|---|---|---|
| Message Queue | Event ingestion, priority routing | Kafka, Pulsar, NATS JetStream |
| KV Store | Preferences, tokens, dedup, rate limits | Redis, KeyDB, Dragonfly |
| Primary DB | Templates, preferences, audit | PostgreSQL, CockroachDB |
| Time-series DB | Notification history, delivery status | Cassandra, ScyllaDB, DynamoDB |
| Push Gateway | APNs/FCM delivery | Self-hosted, Firebase Admin |
| Email Gateway | SMTP delivery | Postfix, SendGrid API |
| SMS Gateway | Carrier delivery | Twilio, Vonage |
Mermaid diagram

Service configurations:

| Service | Configuration | Rationale |
|---|---|---|
| Notification API (Fargate) | 2 vCPU, 4GB, 20 tasks | Stateless, scales with traffic |
| Router Workers (Fargate) | 2 vCPU, 4GB, 50 tasks | CPU-bound preference lookups |
| Push Workers (Fargate) | 2 vCPU, 4GB, 30 tasks | I/O-bound provider calls |
| WebSocket Gateways (Fargate) | 4 vCPU, 8GB, 20 tasks | Memory for connections |
| ElastiCache Redis | r6g.xlarge cluster | Sub-ms reads for hot path |
| RDS PostgreSQL | db.r6g.large Multi-AZ | Templates, preferences |
| Amazon Keyspaces | On-demand | Serverless Cassandra |
| MSK | kafka.m5.large × 3 | Priority queue separation |

| Managed Service | Self-Hosted Option | When to Self-Host |
|---|---|---|
| Amazon MSK | Apache Kafka on EC2 | Cost at scale, specific configs |
| ElastiCache | Redis Cluster on EC2 | Specific modules (RediSearch) |
| Amazon Keyspaces | Apache Cassandra/ScyllaDB | Cost, tuning flexibility |
| SNS Mobile Push | Direct APNs/FCM integration | Full control, cost savings |
| Amazon SES | Postfix + DKIM/SPF | Volume discounts, deliverability control |

Key metrics:

| Metric | Alert Threshold | Action |
|---|---|---|
| Delivery rate | < 99% | Investigate provider issues |
| p99 latency (critical) | > 500ms | Scale workers, check queues |
| DLQ depth | > 1000 | Manual intervention needed |
| Rate limit hits | > 10% | Review user throttle config |
| Invalid tokens | > 5% daily | Token cleanup job issue |

Distributed tracing (per Slack’s approach):

  • Each notification gets its own trace (notification_id = trace_id)
  • Spans: trigger → enqueue → route → deliver → acknowledge
  • 100% sampling for notifications (vs. 1% for general traffic)
  • OpenTelemetry integration for cross-service visibility
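A minimal sketch of this per-notification tracing, with the notification_id doubling as the trace id and one span per pipeline stage. This is a hand-rolled stand-in for what an OpenTelemetry SDK would provide in production; the class and method names are illustrative:

```typescript
// Minimal per-notification trace recorder (illustrative; production would
// use OpenTelemetry). The notification_id doubles as the trace id.
type Span = { name: string; startMs: number; endMs?: number }

class NotificationTrace {
  readonly traceId: string // = notification_id
  private spans: Span[] = []

  constructor(notificationId: string) {
    this.traceId = notificationId
  }

  // Open a span for one pipeline stage (trigger, enqueue, route, ...)
  start(stage: string, nowMs: number = Date.now()): void {
    this.spans.push({ name: stage, startMs: nowMs })
  }

  // Close the most recent open span with the given name
  end(stage: string, nowMs: number = Date.now()): void {
    const span = this.spans
      .slice()
      .reverse()
      .find((s) => s.name === stage && s.endMs === undefined)
    if (span) span.endMs = nowMs
  }

  // End-to-end latency: earliest span start to latest span end
  totalMs(): number {
    const starts = this.spans.map((s) => s.startMs)
    const ends = this.spans.filter((s) => s.endMs !== undefined).map((s) => s.endMs!)
    if (starts.length === 0 || ends.length === 0) return 0
    return Math.max(...ends) - Math.min(...starts)
  }
}
```

Because every notification is traced (100% sampling), `totalMs()` gives exactly the end-to-end latency that the p99-critical alert in the metrics table would be computed from.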

This design provides a scalable notification system with:

  1. At-least-once delivery via Kafka durability and retry mechanisms
  2. Sub-500ms delivery for critical notifications through priority queues and dedicated workers
  3. User-centric throttling with preference-based channel selection and quiet hours
  4. Multi-channel support with independent processors for push, email, SMS, and in-app
  5. Horizontal scalability to millions of notifications per second

Key architectural decisions:

  • Priority-based queue separation ensures critical notifications bypass bulk backlogs
  • User-partitioned Kafka enables co-located rate limiting and aggregation
  • Separate channel processors allow independent scaling and failure isolation
  • Template rendering at send time supports personalization and A/B testing
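The first two decisions above can be sketched as a single enqueue step: priority selects the topic (so critical traffic never queues behind bulk), and user_id is the partition key (so one user's notifications land on one partition). Topic names and the FNV-1a hash are illustrative; a real producer would pass the key to the Kafka client and let its partitioner do this:

```typescript
// Priority -> topic, user_id -> partition. Names are illustrative.
type Priority = "critical" | "high" | "default" | "bulk"

const TOPIC_BY_PRIORITY: Record<Priority, string> = {
  critical: "notifications.critical",
  high: "notifications.high",
  default: "notifications.default",
  bulk: "notifications.bulk",
}

// FNV-1a hash, a stand-in for the Kafka client's default partitioner
function fnv1a(key: string): number {
  let h = 0x811c9dc5
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i)
    h = Math.imul(h, 0x01000193) >>> 0
  }
  return h
}

function routeToQueue(userId: string, priority: Priority, partitions: number) {
  return {
    topic: TOPIC_BY_PRIORITY[priority],
    partition: fnv1a(userId) % partitions, // same user -> same partition
  }
}
```

The stable user-to-partition mapping is what lets a single consumer hold that user's rate-limit and aggregation state in memory without coordination.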

Known limitations:

  • At-least-once delivery requires idempotent clients
  • Cross-channel ordering not guaranteed (push may arrive before email)
  • Aggregation windows add latency for batch-eligible notifications
  • External provider rate limits constrain burst capacity
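The first limitation above — clients must tolerate duplicates — is usually handled with a bounded dedup window on the consumer side. A sketch, using an in-memory map where production would use Redis `SET NX` with a TTL (the 24-48 hour window from the trade-offs section):

```typescript
// Consumer-side deduplication for at-least-once delivery: remember
// notification_ids for a bounded window and drop redelivered copies.
// In-memory map is illustrative; production would use Redis SET NX + TTL.
class DedupWindow {
  private seen = new Map<string, number>() // id -> expiry timestamp (ms)

  constructor(private windowMs: number) {}

  // Returns true exactly once per id within the window
  shouldProcess(notificationId: string, nowMs: number = Date.now()): boolean {
    this.evict(nowMs)
    if (this.seen.has(notificationId)) return false // duplicate delivery
    this.seen.set(notificationId, nowMs + this.windowMs)
    return true
  }

  private evict(nowMs: number): void {
    for (const [id, expiry] of this.seen) {
      if (expiry <= nowMs) this.seen.delete(id)
    }
  }
}
```

A worker calls `shouldProcess(notification.id)` before delivery; Kafka redeliveries after a crash or rebalance are then silently dropped instead of producing a second push.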

Future enhancements:

  • ML-based send time optimization (per Uber/Airship research)
  • Rich media notifications (images, action buttons)
  • Cross-device notification sync (read on phone, clear on web)
  • Webhook delivery for B2B integrations

Related concepts:

  • Distributed systems fundamentals (message queues, partitioning)
  • Push notification protocols (APNs, FCM)
  • Rate limiting algorithms (token bucket, sliding window)
  • Database selection trade-offs (SQL vs. NoSQL)
| Term | Definition |
|---|---|
| APNs | Apple Push Notification service - Apple’s push delivery infrastructure |
| FCM | Firebase Cloud Messaging - Google’s cross-platform push service |
| DLQ | Dead Letter Queue - storage for messages that failed processing |
| TTL | Time-to-Live - expiration duration for notifications |
| Collapse key | Identifier for grouping related notifications (newer replaces older) |
| Token bucket | Rate limiting algorithm allowing bursts up to bucket capacity |
| Idempotent | Operation that produces same result regardless of execution count |

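The token-bucket entry from the glossary, applied per user per channel as the design prescribes, can be sketched as follows. Capacity, refill rate, and the `user:channel` key format are illustrative defaults:

```typescript
// Token bucket: allows bursts up to capacity, sustained rate = refillPerSec.
class TokenBucket {
  private tokens: number
  private lastRefillMs: number

  constructor(
    private capacity: number,     // max burst size
    private refillPerSec: number, // sustained rate
    nowMs: number = Date.now(),
  ) {
    this.tokens = capacity
    this.lastRefillMs = nowMs
  }

  tryConsume(nowMs: number = Date.now()): boolean {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec)
    this.lastRefillMs = nowMs
    if (this.tokens >= 1) {
      this.tokens -= 1
      return true // send allowed
    }
    return false // throttled
  }
}

// One bucket per user per channel, e.g. "user-42:push" (illustrative limits)
const buckets = new Map<string, TokenBucket>()
function allowSend(userId: string, channel: string, nowMs: number = Date.now()): boolean {
  const key = `${userId}:${channel}`
  let bucket = buckets.get(key)
  if (!bucket) {
    bucket = new TokenBucket(5, 0.5, nowMs) // burst of 5, ~1 per 2s sustained
    buckets.set(key, bucket)
  }
  return bucket.tryConsume(nowMs)
}
```

Because consumers are partitioned by user_id, each bucket lives on exactly one worker, so no distributed counter is needed for the per-user limits (provider-level limits still need a shared store).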
  • Multi-channel delivery (push, email, SMS, in-app) with at-least-once guarantees using Kafka and retry mechanisms
  • Priority-based routing separates critical notifications (< 500ms) from bulk (best effort)
  • User preference service with Redis caching enables per-user channel and frequency control
  • Rate limiting at user and channel level prevents notification fatigue and respects provider limits
  • Aggregation collapses similar notifications (“5 new likes”) to reduce user interruption
  • Scale to 1M+ notifications/second with horizontal worker scaling and partitioned queues
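The aggregation bullet above — collapsing similar notifications into one summary — hinges on the collapse key from the glossary. A sketch, with field names and the "likes" summary format purely illustrative:

```typescript
// Collapse-key aggregation: notifications sharing a collapse key within the
// window merge into one summary ("5 new likes"). Fields are illustrative.
interface PendingNotification {
  collapseKey: string // e.g. "like:post-123"
  actor: string
}

function aggregate(pending: PendingNotification[]): Map<string, string> {
  // Group by collapse key
  const groups = new Map<string, PendingNotification[]>()
  for (const n of pending) {
    const group = groups.get(n.collapseKey) ?? []
    group.push(n)
    groups.set(n.collapseKey, group)
  }
  // One summary per group: newest actor named, the rest counted
  const summaries = new Map<string, string>()
  for (const [key, group] of groups) {
    if (group.length === 1) {
      summaries.set(key, `${group[0].actor} liked your post`)
    } else {
      const others = group.length - 1
      const newest = group[group.length - 1]
      summaries.set(key, `${newest.actor} and ${others} other${others > 1 ? "s" : ""} liked your post`)
    }
  }
  return summaries
}
```

The consumer buffers batch-eligible notifications for the aggregation window, then emits one summary per collapse key; this is the latency-for-interruption trade-off listed under known limitations.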
