
Design Yelp: Location-Based Business Discovery Platform

A comprehensive system design for a location-based business discovery service. This design addresses proximity search at scale, multi-signal ranking, and user-generated content moderation with a focus on sub-100ms search latency and global availability.

[Architecture diagram] Client Layer (Mobile Apps, Web App) → Edge Layer (CDN, Load Balancer, API Gateway) → Application Services (Search Service, Business Service, Review Service, User Service, Ranking Service) → Data Layer: Primary Stores (PostgreSQL, Elasticsearch), Cache Layer (Redis Cluster, Geospatial Cache), Async Processing (Kafka, Background Workers), ML Pipeline (Ranking Model, Spam Detection).

High-level architecture: Client requests flow through edge infrastructure to application services, with geospatial indexing in Elasticsearch and real-time updates via Kafka.

Proximity services solve a fundamentally different problem than traditional text search: the answer depends not just on what you’re looking for but where you are. This creates unique indexing challenges—you can’t pre-compute rankings when the query center point changes with every request.

The core trade-off space:

  • Indexing strategy: Geohash (simple, bounded precision) vs. Quadtree (adaptive density) vs. R-tree (optimal for rectangles). Each trades query flexibility against write complexity and memory overhead.
  • Search radius handling: Fixed grid cells leak results at boundaries; hierarchical approaches add latency but guarantee coverage.
  • Ranking signals: Distance alone produces poor results. Real systems blend distance, ratings, recency, personalization, and business attributes—but more signals mean slower ranking and harder explainability.
  • Write path complexity: Business data changes infrequently, but reviews flow continuously. Decoupling these paths enables independent scaling but introduces consistency windows.

Real-world context: Yelp handles 178 million unique visitors monthly with 244 million reviews. Google Maps indexes 200+ million places. Foursquare’s Pilgrim SDK processes 14 billion location signals daily for venue detection. These systems demonstrate that sub-100ms latency is achievable at scale, but requires aggressive caching, denormalization, and carefully bounded search spaces.

Mental model: Think of this as three systems in one: (1) a geospatial index that answers “what’s nearby?”, (2) a ranking engine that orders results by relevance, and (3) a content platform that manages reviews, photos, and business data. The challenge is making them work together with consistent latency under varying load patterns.

| Feature | Priority | Description |
| --- | --- | --- |
| Nearby search | Core | Find businesses within radius of a location |
| Business profiles | Core | View business details, hours, attributes |
| Reviews & ratings | Core | Read/write reviews, aggregate ratings |
| Photos | Core | View/upload business photos |
| Search by category | Core | Filter by cuisine, service type |
| Search by attributes | High | Filter by price, hours, amenities |
| Check-ins | Medium | Record visits, see activity |
| Bookmarks | Medium | Save businesses for later |
| Personalization | Medium | Recommendations based on history |
| Business owner tools | Medium | Claim business, respond to reviews |

Scope for this article: We’ll design the core proximity search, business data, and review systems. Check-ins and advanced personalization are covered briefly; advertising and business analytics are out of scope.

| Requirement | Target | Rationale |
| --- | --- | --- |
| Search latency | p99 < 100ms | User expectation for "instant" results |
| Write latency | p99 < 500ms | Acceptable for review submission |
| Availability | 99.99% | Revenue-critical, user trust |
| Read/Write ratio | 100:1 | Read-heavy workload |
| Data freshness | < 30s for reviews, < 5min for business data | Reviews need quick visibility; business data changes rarely |
| Global coverage | Multi-region | Users travel; businesses are everywhere |

Users:

  • Monthly Active Users (MAU): 100M
  • Daily Active Users (DAU): 30M
  • Peak concurrent users: 3M (10% of DAU)

Businesses:

  • Total businesses: 10M globally
  • Active businesses (updated in last year): 5M
  • New businesses per day: 10K

Traffic:

  • Search queries: 30M DAU × 5 searches/day = 150M searches/day ≈ 1,700 QPS
  • Peak search: 3× average = 5,100 QPS
  • Business profile views: 30M DAU × 10 views/day = 300M/day ≈ 3,500 QPS
  • Review writes: 30M DAU × 0.01 reviews/day = 300K reviews/day ≈ 3.5 writes/sec
  • Photo uploads: ~100K/day

Storage:

  • Business data: 10M × 10KB = 100GB
  • Reviews: 200M reviews × 2KB = 400GB
  • Photos: 500M photos × 500KB average = 250TB (object storage)
  • Search index: ~50GB (denormalized business + location data)
  • 5-year projection: ~2PB total (dominated by photos)
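These back-of-envelope numbers are easy to double-check in a few lines:

```python
# Sanity-check the traffic and storage estimates above.
DAU = 30_000_000
SECONDS_PER_DAY = 86_400

search_qps = DAU * 5 / SECONDS_PER_DAY                 # 150M searches/day -> ~1,736 QPS
peak_search_qps = search_qps * 3                       # ~5,100-5,200 QPS (the text rounds first)
review_writes_per_sec = DAU * 0.01 / SECONDS_PER_DAY   # 300K reviews/day -> ~3.5/s
photo_storage_tb = 500_000_000 * 500_000 / 1e12        # 500M photos x 500KB = 250 TB
```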

Best when:

  • Uniform business density across regions
  • Fixed search radius (e.g., always 5km)
  • Simpler operational requirements

Architecture:

[Diagram] Query flow: (lat, lon) → Geohash Encode → Prefix Match + 8 Neighbors → Candidate Set → Distance Filter → Rank & Return. Index structure: geohash:dr5ru7 → [biz_ids]; geohash:dr5ru → [biz_ids]; geohash:dr5r → [biz_ids].

Key characteristics:

  • Geohash encodes lat/lon into a string where prefix matches indicate proximity
  • 6-character geohash ≈ 1.2km × 600m cell; 5-character ≈ 5km × 5km
  • Query requires checking target cell + 8 neighboring cells to handle boundary cases
  • Index structure is a simple key-value map: geohash → list of business IDs
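To make the prefix property concrete, here is a minimal geohash encoder (illustrative, unoptimized Python; `geohash_encode` is an invented name, not a library call):

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Interleave longitude/latitude bisection bits, 5 bits per base32 char."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits = []
    is_lon_bit = True  # geohash starts with a longitude bit
    while len(bits) < precision * 5:
        if is_lon_bit:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1)
                lon_lo = mid
            else:
                bits.append(0)
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1)
                lat_lo = mid
            else:
                bits.append(0)
                lat_hi = mid
        is_lon_bit = not is_lon_bit
    chars = []
    for i in range(0, len(bits), 5):
        value = 0
        for bit in bits[i:i + 5]:
            value = (value << 1) | bit
        chars.append(_BASE32[value])
    return "".join(chars)
```

Each halving of a coordinate range appends one bit, so two points that share a long bit sequence are necessarily close; truncating the string (lower precision) yields a prefix of the higher-precision hash.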

Trade-offs:

  • Simple to implement and debug
  • Excellent cache locality (nearby queries hit same cells)
  • Easy sharding by geohash prefix
  • Fixed precision creates density mismatches (Manhattan vs. rural Wyoming)
  • Variable radius searches require multiple precision levels
  • Edge cases at cell boundaries require neighbor expansion

Real-world example: Uber’s geospatial indexing uses geohash for initial filtering, then refines with exact distance calculations. Their H3 system (hexagonal hierarchical index) evolved from geohash limitations but shares the prefix-based lookup principle.

Best when:

  • Highly variable business density
  • Need for adaptive precision
  • Complex geometric queries (polygons, routes)

Architecture:

[Diagram] Query flow: Search Circle → Find Intersecting Cells → Collect All Businesses. Quadtree structure: Root (World) subdivides into NW/NE/SW/SE quadrants, recursively, until each leaf holds a bounded number of businesses (e.g., 50).

Key characteristics:

  • Recursively subdivides space until each cell contains ≤ N businesses
  • Dense areas (cities) have deeper trees; sparse areas (rural) stay shallow
  • Google’s S2 Geometry library uses this with spherical projection
  • Cell IDs encode the path from root, enabling range queries

Trade-offs:

  • Adapts to density automatically
  • Efficient for variable-radius queries
  • Handles complex shapes (polygons, paths)
  • More complex implementation
  • Tree rebalancing on updates
  • Harder to cache (cells have variable sizes)

Real-world example: Google Maps uses S2 cells extensively. A search in Manhattan might use level-16 cells (~150m), while rural areas use level-10 (~10km). Foursquare’s Pilgrim SDK uses S2 for geofencing with 14 billion daily signals.
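A toy quadtree makes the subdivision behavior concrete. This is a sketch, not S2: leaf capacity is shrunk to 4 for illustration, coordinates are treated as a flat plane, and all names (`QuadNode`, `query_rect`) are invented:

```python
from dataclasses import dataclass, field

MAX_PER_LEAF = 4  # tiny for illustration; real systems use ~50-100

@dataclass
class QuadNode:
    x0: float  # bounding box: lower-left (x0, y0), upper-right (x1, y1)
    y0: float
    x1: float
    y1: float
    points: list = field(default_factory=list)    # (x, y, biz_id) tuples
    children: list = field(default_factory=list)  # NW, NE, SW, SE once split

    def insert(self, x, y, biz_id):
        if self.children:
            self._child_for(x, y).insert(x, y, biz_id)
            return
        self.points.append((x, y, biz_id))
        if len(self.points) > MAX_PER_LEAF:
            self._split()

    def _split(self):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = [
            QuadNode(self.x0, my, mx, self.y1),  # NW
            QuadNode(mx, my, self.x1, self.y1),  # NE
            QuadNode(self.x0, self.y0, mx, my),  # SW
            QuadNode(mx, self.y0, self.x1, my),  # SE
        ]
        points, self.points = self.points, []
        for px, py, pid in points:
            self._child_for(px, py).insert(px, py, pid)

    def _child_for(self, x, y):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        if y >= my:
            return self.children[0] if x < mx else self.children[1]
        return self.children[2] if x < mx else self.children[3]

    def query_rect(self, qx0, qy0, qx1, qy1, out=None):
        """Collect business IDs inside the query rectangle."""
        out = [] if out is None else out
        if qx1 < self.x0 or qx0 > self.x1 or qy1 < self.y0 or qy0 > self.y1:
            return out  # no overlap with this cell: prune the whole subtree
        for px, py, pid in self.points:
            if qx0 <= px <= qx1 and qy0 <= py <= qy1:
                out.append(pid)
        for child in self.children:
            child.query_rect(qx0, qy0, qx1, qy1, out)
        return out
```

Dense clusters trigger repeated splits (deep subtrees) while empty regions stay as shallow leaves, which is exactly the adaptive-density property the text describes.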

Best when:

  • Need full-text search combined with geo
  • Complex attribute filtering
  • Leveraging existing search infrastructure

Architecture:

[Diagram] Index structure: each business document (name: text, location: geo_point, attributes: nested) is indexed in Elasticsearch. Query flow: geo_distance query → BKD Tree Index → Attribute Filters → Relevance Scoring.

Key characteristics:

  • Elasticsearch uses BKD trees (a disk-optimized variant of k-d trees) for geo_point fields
  • Combines spatial queries with full-text search and filtering
  • geo_distance query returns all points within radius
  • geo_bounding_box for rectangular regions

Trade-offs:

  • Combines geo + text + attributes in single query
  • Mature tooling, operational knowledge
  • Built-in relevance scoring
  • Higher latency than specialized geo indexes (~20-50ms vs ~5ms)
  • Memory overhead for inverted indexes
  • Write amplification for index updates

Real-world example: Yelp uses Elasticsearch for their search infrastructure. Their architecture handles 150M+ searches per day with sub-100ms latency by combining aggressive caching with optimized ES configurations.

| Factor | Geohash | Quadtree/S2 | R-tree (ES) |
| --- | --- | --- | --- |
| Implementation complexity | Low | High | Medium (if using ES) |
| Query latency | ~5ms | ~10ms | ~30ms |
| Variable radius | Multiple queries | Native | Native |
| Density handling | Poor | Excellent | Good |
| Text search | Separate system | Separate system | Integrated |
| Operational overhead | Low | Medium | Medium-High |
| Best for | Fixed-radius, uniform | Complex geo, variable | Full-featured search |

This article focuses on Path C (Elasticsearch-based) because:

  1. Most proximity services need combined text + geo + attribute search
  2. Elasticsearch is widely deployed with known operational patterns
  3. The latency difference (~30ms vs ~5ms) is acceptable when total budget is 100ms
  4. Avoiding a separate geo-index reduces system complexity

For systems requiring sub-10ms geo queries at massive scale (e.g., ride-sharing dispatch), Path B with custom S2 implementation would be more appropriate.

Endpoint: GET /api/v1/businesses/search

Query Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| lat | float | Yes | Latitude (-90 to 90) |
| lon | float | Yes | Longitude (-180 to 180) |
| radius | int | No | Search radius in meters (default: 5000, max: 50000) |
| category | string | No | Category filter (e.g., "restaurants", "coffee") |
| price | string | No | Price filter ("1", "2", "3", "4" or ranges "1,2") |
| open_now | bool | No | Filter to currently open businesses |
| sort | string | No | Sort order: "distance", "rating", "review_count", "relevance" (default) |
| cursor | string | No | Pagination cursor |
| limit | int | No | Results per page (default: 20, max: 50) |

Response (200 OK):

{
"businesses": [
{
"id": "biz_abc123",
"name": "Joe's Coffee",
"slug": "joes-coffee-san-francisco",
"location": {
"lat": 37.7749,
"lon": -122.4194,
"address": "123 Market St",
"city": "San Francisco",
"state": "CA",
"postal_code": "94102",
"country": "US"
},
"distance_meters": 450,
"categories": [{ "id": "coffee", "name": "Coffee & Tea" }],
"rating": 4.5,
"review_count": 1247,
"price_level": 2,
"hours": {
"is_open_now": true,
"today": "7:00 AM - 8:00 PM"
},
"photos": {
"thumbnail": "https://cdn.example.com/photos/abc123/thumb.jpg",
"count": 523
},
"attributes": {
"wifi": true,
"outdoor_seating": true,
"takes_reservations": false
}
}
],
"total": 847,
"cursor": "eyJvZmZzZXQiOjIwfQ==",
"search_metadata": {
"center": { "lat": 37.7749, "lon": -122.4194 },
"radius_meters": 5000,
"query_time_ms": 42
}
}

Error Responses:

  • 400 Bad Request: Invalid coordinates, radius out of range
  • 429 Too Many Requests: Rate limit exceeded

Rate Limits: 100 requests/minute per user (authenticated), 20/minute (anonymous)

Endpoint: GET /api/v1/businesses/{business_id}

Response (200 OK):

{
"id": "biz_abc123",
"name": "Joe's Coffee",
"slug": "joes-coffee-san-francisco",
"claimed": true,
"location": {
"lat": 37.7749,
"lon": -122.4194,
"address": "123 Market St",
"city": "San Francisco",
"state": "CA",
"postal_code": "94102",
"country": "US",
"cross_streets": "Market & 4th"
},
"contact": {
"phone": "+1-415-555-0123",
"website": "https://joescoffee.com"
},
"categories": [
{ "id": "coffee", "name": "Coffee & Tea" },
{ "id": "breakfast", "name": "Breakfast & Brunch" }
],
"rating": 4.5,
"review_count": 1247,
"price_level": 2,
"hours": {
"monday": [{ "open": "07:00", "close": "20:00" }],
"tuesday": [{ "open": "07:00", "close": "20:00" }],
"is_open_now": true,
"special_hours": [{ "date": "2024-12-25", "is_closed": true }]
},
"photos": [
{
"id": "photo_xyz",
"url": "https://cdn.example.com/photos/xyz.jpg",
"caption": "Interior",
"user_id": "user_123"
}
],
"attributes": {
"wifi": true,
"outdoor_seating": true,
"parking": "street",
"noise_level": "moderate",
"good_for": ["working", "casual_dining"],
"accepts": ["credit_cards", "apple_pay"]
},
"highlights": ["Great for working remotely", "Excellent espresso"]
}

Endpoint: POST /api/v1/businesses/{business_id}/reviews

Request:

{
"rating": 4,
"text": "Great coffee and atmosphere. The baristas are friendly and the WiFi is fast. Perfect spot for remote work.",
"photos": ["upload_token_1", "upload_token_2"]
}

Response (201 Created):

{
"id": "review_def456",
"business_id": "biz_abc123",
"user": {
"id": "user_789",
"name": "John D.",
"review_count": 42,
"photo_url": "https://cdn.example.com/users/789.jpg"
},
"rating": 4,
"text": "Great coffee and atmosphere...",
"photos": [{ "id": "photo_1", "url": "https://cdn.example.com/..." }],
"created_at": "2024-01-15T10:30:00Z",
"status": "pending_moderation"
}

Error Responses:

  • 400 Bad Request: Rating out of range (1-5), text too short (<50 chars) or too long (>5000 chars)
  • 401 Unauthorized: Not authenticated
  • 403 Forbidden: User has already reviewed this business
  • 429 Too Many Requests: Review rate limit (5/day per user)

Cursor-based pagination (chosen over offset):

  • Why: Offset pagination breaks when data changes between pages. With 300K new reviews/day, offset=1000 returns different results seconds apart.
  • Implementation: Cursor encodes the last seen sort key (e.g., {score: 4.5, id: "biz_xyz"}).
  • Trade-off: Can’t jump to arbitrary pages, but provides consistent results.
{
"cursor": "eyJzY29yZSI6NC41LCJpZCI6ImJpel94eXoifQ==",
"has_more": true
}
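The encode/decode round trip is a few lines (field names mirror the example above; `encode_cursor`/`decode_cursor` are illustrative names):

```python
import base64
import json

def encode_cursor(score: float, last_id: str) -> str:
    """Pack the last item's sort keys into an opaque, URL-safe token."""
    raw = json.dumps({"score": score, "id": last_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode()).decode()

def decode_cursor(cursor: str) -> dict:
    """Recover the sort keys; the next page resumes strictly after them."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))
```

Because the cursor carries the sort key rather than an offset, new reviews inserted between requests shift the result set without causing duplicates or gaps across pages.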

Field inclusion strategy:

  • Default response includes fields needed for list views
  • ?expand=hours,attributes,photos for detail views
  • Reduces payload size by 60% for search results

Denormalization decisions:

  • Include distance_meters in search results (requires computation anyway)
  • Include is_open_now (computed from hours, but clients need it)
  • Exclude full hours object from search (add via ?expand=hours)

Primary Store: PostgreSQL (ACID for business data integrity)

CREATE TABLE businesses (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
slug VARCHAR(255) UNIQUE NOT NULL,
-- Location
latitude DECIMAL(10, 8) NOT NULL,
longitude DECIMAL(11, 8) NOT NULL,
geohash VARCHAR(12) GENERATED ALWAYS AS (
ST_GeoHash(ST_SetSRID(ST_MakePoint(longitude, latitude), 4326), 12)
) STORED,
address_line1 VARCHAR(255),
address_line2 VARCHAR(255),
city VARCHAR(100),
state VARCHAR(100),
postal_code VARCHAR(20),
country CHAR(2) NOT NULL,
-- Business info
phone VARCHAR(20),
website VARCHAR(500),
price_level SMALLINT CHECK (price_level BETWEEN 1 AND 4),
-- Aggregates (denormalized for read performance)
rating_avg DECIMAL(2, 1) DEFAULT 0,
review_count INTEGER DEFAULT 0,
photo_count INTEGER DEFAULT 0,
-- Status
claimed BOOLEAN DEFAULT FALSE,
owner_id UUID REFERENCES users(id),
status VARCHAR(20) DEFAULT 'active',
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Geospatial index for PostGIS queries
CREATE INDEX idx_businesses_location ON businesses
USING GIST (ST_SetSRID(ST_MakePoint(longitude, latitude), 4326));
-- Geohash prefix index for simple lookups
CREATE INDEX idx_businesses_geohash ON businesses (geohash varchar_pattern_ops);
-- City lookups (active businesses only)
CREATE INDEX idx_businesses_city_status ON businesses (city, status)
WHERE status = 'active';
CREATE TABLE business_hours (
business_id UUID REFERENCES businesses(id) ON DELETE CASCADE,
day_of_week SMALLINT NOT NULL CHECK (day_of_week BETWEEN 0 AND 6),
open_time TIME NOT NULL,
close_time TIME NOT NULL,
PRIMARY KEY (business_id, day_of_week, open_time)
);
CREATE TABLE business_special_hours (
business_id UUID REFERENCES businesses(id) ON DELETE CASCADE,
date DATE NOT NULL,
is_closed BOOLEAN DEFAULT FALSE,
open_time TIME,
close_time TIME,
PRIMARY KEY (business_id, date)
);
CREATE TABLE categories (
id VARCHAR(50) PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id VARCHAR(50) REFERENCES categories(id),
level SMALLINT NOT NULL DEFAULT 0
);
CREATE TABLE business_categories (
business_id UUID REFERENCES businesses(id) ON DELETE CASCADE,
category_id VARCHAR(50) REFERENCES categories(id),
is_primary BOOLEAN DEFAULT FALSE,
PRIMARY KEY (business_id, category_id)
);
CREATE INDEX idx_business_categories_category ON business_categories (category_id);
CREATE TABLE reviews (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
business_id UUID NOT NULL REFERENCES businesses(id),
user_id UUID NOT NULL REFERENCES users(id),
rating SMALLINT NOT NULL CHECK (rating BETWEEN 1 AND 5),
text TEXT NOT NULL CHECK (char_length(text) BETWEEN 50 AND 5000),
-- Moderation
status VARCHAR(20) DEFAULT 'pending',
moderation_score DECIMAL(3, 2),
moderated_at TIMESTAMPTZ,
-- Engagement
useful_count INTEGER DEFAULT 0,
funny_count INTEGER DEFAULT 0,
cool_count INTEGER DEFAULT 0,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW(),
UNIQUE (business_id, user_id)
);
CREATE INDEX idx_reviews_business ON reviews (business_id, created_at DESC)
WHERE status = 'approved';
CREATE INDEX idx_reviews_user ON reviews (user_id, created_at DESC);
CREATE INDEX idx_reviews_pending ON reviews (status, created_at)
WHERE status = 'pending';
Elasticsearch index mapping:

{
"mappings": {
"properties": {
"id": { "type": "keyword" },
"name": {
"type": "text",
"analyzer": "standard",
"fields": {
"keyword": { "type": "keyword" },
"autocomplete": {
"type": "text",
"analyzer": "autocomplete"
}
}
},
"location": { "type": "geo_point" },
"geohash": { "type": "keyword" },
"city": { "type": "keyword" },
"country": { "type": "keyword" },
"categories": { "type": "keyword" },
"price_level": { "type": "integer" },
"rating_avg": { "type": "float" },
"review_count": { "type": "integer" },
"attributes": {
"type": "object",
"properties": {
"wifi": { "type": "boolean" },
"outdoor_seating": { "type": "boolean" },
"parking": { "type": "keyword" }
}
},
"hours": {
"type": "nested",
"properties": {
"day": { "type": "integer" },
"open": { "type": "integer" },
"close": { "type": "integer" }
}
},
"updated_at": { "type": "date" }
}
},
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["lowercase", "autocomplete_filter"]
}
},
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 2,
"max_gram": 20
}
}
}
}
}

| Data Type | Store | Rationale |
| --- | --- | --- |
| Business profiles | PostgreSQL | ACID, complex queries, foreign keys, moderate scale |
| Reviews | PostgreSQL | ACID for integrity, complex moderation queries |
| Search index | Elasticsearch | Geo queries, full-text, filtering, aggregations |
| Session/rate limiting | Redis | Sub-ms latency, TTL support, atomic operations |
| Photos | S3 + CloudFront | Object storage, CDN delivery, cost-effective |
| Analytics events | Kafka → ClickHouse | High write throughput, analytical queries |

PostgreSQL (if needed at scale):

  • Shard by country or region for geographic locality
  • Most queries are region-scoped (users search near their location)
  • Cross-region queries (rare) handled by query router

Elasticsearch:

  • Shard by geohash prefix (e.g., first 2 characters)
  • 256 shards globally (00-ff in hex, or geographic prefixes)
  • Enables routing queries to relevant shards only

[Sequence diagram] Search read path (Client → API Gateway → Search Service → Redis Cache / Elasticsearch → Ranking Service):

  1. Client: GET /search?lat=37.77&lon=-122.41&radius=5000
  2. API Gateway forwards the request with user context to the Search Service
  3. Search Service checks the geohash cache (e.g., key dr5ru)
  4. Cache hit: cached business IDs are used as the candidate set
  5. Cache miss: Search Service runs a geo_distance query against Elasticsearch, receives matching documents, and stores them in the cache (TTL: 5 min)
  6. Ranking Service scores the candidates with the ranking signals and returns scored results
  7. Search Service returns a paginated response; the gateway sends the JSON response to the client

Basic geo query:

{
"query": {
"bool": {
"must": [{ "match": { "status": "active" } }],
"filter": [
{
"geo_distance": {
"distance": "5km",
"location": {
"lat": 37.7749,
"lon": -122.4194
}
}
}
]
}
},
"sort": [
{
"_geo_distance": {
"location": { "lat": 37.7749, "lon": -122.4194 },
"order": "asc",
"unit": "m"
}
}
]
}

With category and attribute filters:

{
"query": {
"bool": {
"must": [{ "match": { "status": "active" } }],
"filter": [
{
"geo_distance": {
"distance": "5km",
"location": { "lat": 37.7749, "lon": -122.4194 }
}
},
{ "terms": { "categories": ["coffee", "cafe"] } },
{ "term": { "attributes.wifi": true } },
{ "range": { "price_level": { "lte": 2 } } }
]
}
}
}

“Open now” filter (complex):

The “open now” query requires knowing the current day/time in the business’s timezone:

{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "hours",
"query": {
"bool": {
"must": [
{ "term": { "hours.day": 1 } },
{ "range": { "hours.open": { "lte": 1030 } } },
{ "range": { "hours.close": { "gt": 1030 } } }
]
}
}
}
}
]
}
}
}

Design decision: Store hours as integer (HHMM format: 1030 = 10:30 AM) for efficient range queries. Handle timezone conversion at query time, not storage time.

Cache structure:

Key: geo:search:{geohash_prefix}:{category}:{filters_hash}
Value: [business_id_1, business_id_2, ...]
TTL: 5 minutes

Why cache by geohash prefix:

  • Queries within same ~1km area hit same cache entry
  • 6-character geohash prefix provides good locality
  • Different filter combinations get separate cache keys

Cache invalidation:

  • Business update → Invalidate all cache keys containing that business
  • New business → No invalidation needed (TTL handles staleness)
  • Trade-off: 5-minute staleness acceptable for discovery use case

Multi-signal ranking formula:

score = w_1 \cdot f_{distance} + w_2 \cdot f_{rating} + w_3 \cdot f_{reviews} + w_4 \cdot f_{recency} + w_5 \cdot f_{personal}

Where:

  • f_{distance} = 1 - \frac{distance}{max\_radius} (closer = higher score)
  • f_{rating} = \frac{rating - 1}{4} (normalized 0-1)
  • f_{reviews} = \frac{\log(review\_count + 1)}{\log(max\_reviews + 1)} (log-scaled)
  • f_{recency} = e^{-\lambda \cdot days\_since\_review} (exponential decay)
  • f_{personal} = personalization signal (0-1)
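Putting the formula together as code (a sketch: the decay constant λ is an illustrative value not given in the text; the weights are the tuned defaults):

```python
import math

WEIGHTS = {"distance": 0.25, "rating": 0.30, "reviews": 0.20,
           "recency": 0.15, "personal": 0.10}
DECAY_LAMBDA = 0.01  # illustrative recency decay rate (assumption)

def rank_score(distance_m, max_radius_m, rating, review_count,
               max_reviews, days_since_review, personal=0.0):
    """Blend the five normalized signals into a single 0-1 score."""
    f = {
        "distance": 1 - distance_m / max_radius_m,
        "rating": (rating - 1) / 4,
        "reviews": math.log(review_count + 1) / math.log(max_reviews + 1),
        "recency": math.exp(-DECAY_LAMBDA * days_since_review),
        "personal": personal,
    }
    return sum(WEIGHTS[k] * f[k] for k in WEIGHTS)
```

Because every signal is normalized to 0-1 before weighting, the weights are directly comparable and can be re-tuned (or A/B tested) without rescaling the signals.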

Default weights (tuned via A/B testing):

| Signal | Weight | Rationale |
| --- | --- | --- |
| Distance | 0.25 | Important but not dominant |
| Rating | 0.30 | Primary quality signal |
| Review count | 0.20 | Social proof, data confidence |
| Recency | 0.15 | Fresh data preferred |
| Personal | 0.10 | Light personalization |

Elasticsearch function_score implementation:

{
"query": {
"function_score": {
"query": {"bool": {"filter": [...]}},
"functions": [
{
"gauss": {
"location": {
"origin": {"lat": 37.77, "lon": -122.41},
"scale": "2km",
"decay": 0.5
}
},
"weight": 25
},
{
"field_value_factor": {
"field": "rating_avg",
"factor": 1,
"modifier": "none",
"missing": 3
},
"weight": 30
},
{
"field_value_factor": {
"field": "review_count",
"factor": 1,
"modifier": "log1p",
"missing": 1
},
"weight": 20
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}

Sparse areas (few results):

  • If < 10 results within radius, automatically expand to 2× radius
  • Cap at 50km to prevent global searches
  • Return expanded_radius: true in response metadata
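The expansion loop is simple to sketch (`search_fn` is a stand-in for whatever executes the geo query):

```python
def search_with_expansion(search_fn, lat, lon, radius_m,
                          min_results=10, cap_m=50_000):
    """Double the radius until we have enough candidates or hit the cap.
    Returns the results plus a flag for the response metadata."""
    expanded = False
    while True:
        results = search_fn(lat, lon, radius_m)
        if len(results) >= min_results or radius_m >= cap_m:
            return results, expanded
        radius_m = min(radius_m * 2, cap_m)
        expanded = True
```

The cap bounds worst-case latency: even in an empty region the loop runs at most log2(50000/initial_radius) extra queries.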

Dense areas (too many results):

  • Pre-filter by quality threshold (rating > 3.0, review_count > 5)
  • Use stricter ranking to surface best options
  • Never return more than 1000 candidates to ranker

Boundary problems:

  • Elasticsearch’s geo_distance handles great-circle distance correctly
  • No geohash boundary issues (unlike raw geohash queries)

Timezone handling for “open now”:

  • Store business timezone in database
  • Query-time conversion: UTC now → business local time
  • Handle DST transitions correctly

[Sequence diagram] Review write path (User → Review API → PostgreSQL → Kafka → Moderation Worker → Spam Detection → Elasticsearch → Notification Service):

  1. User: POST /reviews (rating, text, photos)
  2. Review API validates the input, inserts the review into PostgreSQL (status: pending), publishes a ReviewCreated event to Kafka, and returns 201 Created (status: pending)
  3. Moderation Worker consumes the event and sends the text for analysis; Spam Detection returns spam and toxicity scores
  4. Auto-approve (scores below threshold): update status to approved, index the review in Elasticsearch, update the business rating_avg and review_count, notify the business owner
  5. Manual review needed: update status to manual_review and publish to the human review queue

| Signal | Weight | Description |
| --- | --- | --- |
| Account age | 0.15 | New accounts more suspicious |
| Review velocity | 0.20 | Multiple reviews in short time |
| Text quality | 0.25 | Gibberish, excessive caps, links |
| Sentiment mismatch | 0.15 | 5-star rating with negative text |
| IP/device clustering | 0.15 | Multiple accounts from same source |
| Business relationship | 0.10 | Employee/owner detection |

Thresholds:

  • Score < 0.3: Auto-approve
  • Score 0.3-0.7: Auto-approve with flag for sampling
  • Score > 0.7: Manual review required
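The weighted blend and triage thresholds can be sketched as (signal names are snake_case versions of the table rows; per-signal scores are assumed pre-normalized to 0-1):

```python
SPAM_WEIGHTS = {
    "account_age": 0.15, "review_velocity": 0.20, "text_quality": 0.25,
    "sentiment_mismatch": 0.15, "ip_device_clustering": 0.15,
    "business_relationship": 0.10,
}

def spam_score(signals: dict) -> float:
    """Weighted sum of per-signal risk scores (each 0-1); missing signals count as 0."""
    return sum(SPAM_WEIGHTS[k] * signals.get(k, 0.0) for k in SPAM_WEIGHTS)

def triage(score: float) -> str:
    """Map the blended score onto the three moderation outcomes."""
    if score < 0.3:
        return "auto_approve"
    if score <= 0.7:
        return "approve_with_flag"
    return "manual_review"
```

The weights sum to 1.0, so the blended score is itself bounded to 0-1 and the thresholds have a stable interpretation as the weights are re-tuned.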

Naive average problems:

  • Business with 1 review (5 stars) ranks above business with 1000 reviews (4.8 avg)
  • New businesses have unstable ratings

Bayesian average solution:

rating_{adjusted} = \frac{C \cdot m + \sum ratings}{C + n}

Where:

  • C = confidence parameter (typically 10-50)
  • m = prior mean (global average, ~3.7)
  • n = number of reviews

Example:

  • New business: 2 reviews, both 5 stars
  • Naive: 5.0
  • Bayesian (C=10, m=3.7): (10 \times 3.7 + 10) / (10 + 2) \approx 3.9

This prevents new businesses with few perfect reviews from dominating rankings.
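As a one-liner in code (note the exact value for the example is 47/12 ≈ 3.92):

```python
def bayesian_rating(ratings, prior_mean=3.7, confidence=10):
    """Shrink small-sample averages toward the global prior; with many
    reviews the prior's influence vanishes."""
    return (confidence * prior_mean + sum(ratings)) / (confidence + len(ratings))
```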

Challenge: When a review is approved, multiple systems need updating:

  1. PostgreSQL: review status
  2. PostgreSQL: business rating_avg, review_count
  3. Elasticsearch: business document
  4. Cache: invalidate relevant entries

Solution: Transactional outbox pattern

BEGIN;
-- Update review
UPDATE reviews SET status = 'approved' WHERE id = $1;
-- Update business aggregates
UPDATE businesses
SET rating_avg = (SELECT AVG(rating) FROM reviews WHERE business_id = $2 AND status = 'approved'),
review_count = (SELECT COUNT(*) FROM reviews WHERE business_id = $2 AND status = 'approved')
WHERE id = $2;
-- Write to outbox for async propagation
INSERT INTO outbox (event_type, payload)
VALUES ('ReviewApproved', '{"review_id": "...", "business_id": "..."}');
COMMIT;

A separate process reads the outbox and updates Elasticsearch + invalidates caches.

Consistency window: ~1-5 seconds for search index, but PostgreSQL is immediately consistent for direct business profile reads.
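One pass of that relay process can be sketched as follows (the three callables are stand-ins for the real DB fetch, downstream publish, and row update — invented for illustration):

```python
def relay_outbox(fetch_batch, publish, mark_done, batch_size=100):
    """Read unprocessed outbox rows, push each event downstream
    (ES update, cache invalidation), then mark it processed.
    This gives at-least-once delivery, so consumers must be idempotent."""
    events = fetch_batch(batch_size)
    for event in events:
        publish(event["event_type"], event["payload"])  # retried on failure
        mark_done(event["id"])
    return len(events)
```

Marking the row done only after a successful publish is what makes a crash safe: the worst case is a duplicate event, never a lost one.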

Optimized for list rendering:

interface SearchState {
// Normalized entities
businesses: Record<string, Business>
// Search result ordering
resultIds: string[]
// Pagination
cursor: string | null
hasMore: boolean
// Search params (for cache key)
searchParams: {
lat: number
lon: number
radius: number
filters: Record<string, unknown>
}
// UI state
isLoading: boolean
error: string | null
}

Why normalized:

  • Moving between list and detail views doesn’t duplicate data
  • Updates to a business (e.g., from detail view) reflect in list
  • Efficient React renders with reference equality

Marker clustering for dense areas:

  • 50+ markers on mobile degrades performance
  • Cluster markers when zoom level shows > 30 businesses in viewport
  • Expand clusters on zoom or tap

Viewport-based loading:

interface MapSearchParams {
bounds: {
ne: { lat: number; lon: number }
sw: { lat: number; lon: number }
}
zoom: number
}
// Debounce map move events (300ms)
// Only fetch when bounds change significantly (> 20% new area)

Cache strategy for mobile:

  • Cache last 5 search results (businesses + basic details)
  • Pre-fetch business details for top 10 search results
  • Store ~50MB of data for offline browsing

Service worker strategy:

// Cache search results with network-first, cache-fallback
// Cache business details with stale-while-revalidate
// Never cache review submission (must be online)

When needed:

  • Review count changes (low priority, poll every 5 min)
  • Business hours during edge times (near open/close)
  • “Popular times” if implemented

When NOT needed:

  • Real-time review streaming (batch updates fine)
  • Live rating changes (too volatile, confusing UX)

[Deployment diagram] Edge/CDN (CDN for static assets and images, WAF/DDoS protection, Global Load Balancer) → Load Balancing (Regional LB - US, Regional LB - EU) → Compute per region (API Servers with auto-scaling, Background Workers, ML Inference) → Data per region (PostgreSQL Primary + Replicas, Elasticsearch Cluster, Redis Cluster, Message Queue, Object Storage).

| Component | Service | Configuration |
| --- | --- | --- |
| Global LB | Route 53 + CloudFront | Latency-based routing |
| Regional LB | ALB | Auto-scaling target groups |
| API servers | ECS Fargate | 2-100 tasks, auto-scaling |
| Background workers | ECS Fargate | Spot instances (70% savings) |
| PostgreSQL | RDS Multi-AZ | db.r6g.xlarge, 3 read replicas |
| Elasticsearch | OpenSearch Service | 3 master + 6 data nodes, r6g.large |
| Redis | ElastiCache Cluster | 3 shards, r6g.large |
| Message queue | MSK (Managed Kafka) | 3 brokers, kafka.m5.large |
| Object storage | S3 + CloudFront | Intelligent tiering |
| ML inference | SageMaker Serverless | For spam detection |

Active-active for reads:

  • Each region has full read replicas
  • Users routed to nearest region
  • Search queries served locally

Primary region for writes:

  • Reviews written to primary region
  • Async replication to other regions (< 5s lag)
  • Business updates are infrequent, higher latency acceptable

Failover:

  • Route 53 health checks detect region failure
  • Automatic failover to secondary region
  • RTO: < 1 minute, RPO: < 30 seconds

| Component | Monthly Cost (estimate) | Optimization |
| --- | --- | --- |
| Elasticsearch | $8,000 | Reserved instances, optimize shards |
| RDS PostgreSQL | $3,000 | Reserved instances, right-size replicas |
| ElastiCache | $1,500 | Reserved instances |
| ECS Fargate | $5,000 | Spot for workers, right-size tasks |
| Data transfer | $2,000 | CloudFront caching, compression |
| S3 | $1,000 | Intelligent tiering, lifecycle policies |
| Total | ~$20,500/month | At 100M MAU scale |

Problem: Query at cell boundary misses nearby businesses in adjacent cells.

Cause: Querying only the target geohash cell without neighbors.

Fix: Always query target cell + 8 surrounding cells, or use proper geo_distance queries that handle boundaries correctly.

Problem: Business owners create fake accounts to boost ratings.

Cause: No detection of review authenticity.

Fix: Multi-signal spam detection, IP/device fingerprinting, account age requirements, manual review sampling.

Problem: Newly added businesses don’t appear in search for hours.

Cause: Elasticsearch index not refreshed frequently enough.

Fix: Near-real-time indexing (refresh_interval: 1s for hot data), or explicit refresh for critical updates.

Problem: Businesses show as open/closed at wrong times.

Cause: Storing hours in wrong timezone or not handling DST.

Fix: Store timezone with business, convert at query time, use proper timezone libraries (not offset math).

Problem: Storage costs grow 10x faster than expected.

Cause: Storing original photos without size limits or compression.

Fix: Resize on upload (max 2048px), generate thumbnails, use progressive JPEG, implement storage lifecycle (archive old photos).

Problem: User submits review, refreshes page, review not visible.

Cause: Async processing creates visibility delay.

Fix: Read-your-writes consistency: return pending review in user’s own view immediately, mark as “pending” until approved.

This design prioritizes search latency and result quality as the primary optimization targets, accepting complexity in the write path and eventual consistency for non-critical data.

Key architectural decisions:

  1. Elasticsearch for geo-search rather than custom geospatial index—the 20-30ms latency overhead is acceptable when combined text/geo/attribute queries are needed
  2. Denormalized aggregates (rating_avg, review_count) in business records—trades write complexity for read performance
  3. Async review processing with transactional outbox—ensures eventual consistency while maintaining PostgreSQL ACID guarantees
  4. Multi-signal ranking with tunable weights—enables A/B testing different ranking formulas without code changes

What this design sacrifices:

  • Real-time consistency for search results (5-30 second delay acceptable)
  • Sub-10ms geo query latency (custom geospatial index would achieve this)
  • Perfect spam detection (manual review still required for edge cases)

Future improvements:

  • Personalized search using user history and collaborative filtering
  • “Popular times” using check-in and visit data
  • Voice search integration for mobile
  • Business insights dashboard for owners

Prerequisites:

  • Distributed systems fundamentals (CAP theorem, consistency models)
  • Database indexing concepts (B-trees, inverted indexes)
  • Basic understanding of geospatial concepts (latitude/longitude, great-circle distance)
  • Familiarity with Elasticsearch or similar search engines

Glossary:

| Term | Definition |
| --- | --- |
| Geohash | A hierarchical spatial encoding that converts coordinates to a string where prefix matches indicate proximity |
| Quadtree | A tree data structure that recursively subdivides 2D space into four quadrants |
| R-tree | A tree data structure for indexing multi-dimensional data, optimized for range queries |
| BKD tree | A variant of KD-tree used by Lucene/Elasticsearch for efficient multi-dimensional indexing |
| geo_point | Elasticsearch field type that stores latitude/longitude and enables geospatial queries |
| Bayesian average | A weighted average that incorporates prior knowledge to handle small sample sizes |
Key takeaways:

  • Proximity search requires specialized geospatial indexing; Elasticsearch’s geo_distance query provides good balance of features and performance
  • Multi-signal ranking (distance + rating + reviews + recency) produces better results than distance alone
  • Review systems need async processing with spam detection to maintain quality at scale
  • Denormalization and caching are essential for sub-100ms search latency
  • Eventual consistency (< 30 seconds) is acceptable for discovery use cases but requires read-your-writes for user’s own content