System Design Problems
25 min read

Design Google Search

Google Search is a web-scale search engine that processes 8.5 billion queries daily across 400+ billion indexed pages with sub-second latency. Search engines solve the fundamental information retrieval problem: given a query, return the most relevant documents from a massive corpus, instantly. This design covers crawling (web discovery), indexing (content organization), ranking (relevance scoring), and serving (query processing): the four pillars that make search work at planetary scale.

[Architecture diagram: User Layer (Web Browser, Mobile App, Search API) → Query Processing (Spell Correction: 680M params, <2ms; Intent Understanding: NLP + entity recognition; Query Expansion: synonyms + related terms; Query Router; Query Cache of frequent results) → Serving Layer (Result Aggregator) → Distributed Index (Shards 1..N, each an inverted index) → Ranking Systems (PageRank: link analysis; BERT: semantic understanding; RankBrain: query-result matching) → Crawl Infrastructure (URL Frontier: prioritized queue; Distributed Crawler: Googlebot; Content Parser + deduplication) → Storage Layer (Bigtable: page content; Colossus: distributed file system)]

Google Search architecture: Queries flow through spell correction and intent understanding, then fan out to distributed index shards. Results aggregate through ranking systems (PageRank, BERT, RankBrain) before returning. Crawlers continuously feed fresh content into the index via Bigtable storage.

Web search design revolves around four interconnected systems, each with distinct scale challenges:

  1. Crawling — Discover and fetch the web’s content. The challenge: billions of pages change constantly, but crawl resources are finite. Prioritization (popular pages crawled hourly; obscure pages monthly) and politeness (respecting server limits) determine coverage quality.

  2. Indexing — Transform raw HTML into queryable data structures. Inverted indexes map every term to its posting list (documents containing that term). Sharding distributes the index across thousands of machines; tiered storage keeps hot data in memory.

  3. Ranking — Score document relevance for a given query. PageRank (link analysis) provides baseline authority; modern systems layer BERT (semantic understanding), RankBrain (query-result matching), and 200+ other signals. Ranking quality directly determines user satisfaction.

  4. Serving — Process queries with sub-second latency. Fan out to all index shards in parallel, aggregate results, apply final ranking, and return—all within 200-500ms. Caching frequent queries reduces load; early termination stops when good results are found.

Component | Scale | Key Trade-off
Crawling | 25B URLs discovered/day | Freshness vs. coverage (can't crawl everything)
Indexing | 400B+ documents | Storage cost vs. query speed (compression trade-offs)
Ranking | 200+ signals per query | Latency vs. ranking quality (more signals = slower)
Serving | 100K+ QPS peak | Completeness vs. speed (early termination)

The mental model: crawl → parse → index → rank → serve. Each stage operates independently but feeds the next. Freshness propagates from crawl to index to results over hours to days depending on page importance.

Functional requirements:

Feature | Scope | Notes
Web search | Core | Return ranked results for text queries
Autocomplete | Core | Suggest queries as user types
Spell correction | Core | Fix typos, suggest alternatives
Image search | Extended | Search by image content/metadata
News search | Extended | Time-sensitive, freshness-critical
Local search | Extended | Location-aware results
Knowledge panels | Extended | Direct answers from knowledge graph
Personalization | Core | Location, language, search history
Safe search | Core | Filter explicit content
Pagination | Core | Navigate through result pages

Non-functional requirements:

Requirement | Target | Rationale
Query latency | p50 < 200ms, p99 < 500ms | User abandonment increases 20% per 100ms of delay
Autocomplete latency | p99 < 100ms | Must feel instantaneous while typing
Availability | 99.99% | Revenue-critical; billions of queries daily
Index freshness | Minutes for news, hours for regular pages | Query Deserves Freshness (QDF) for time-sensitive topics
Index coverage | 400B+ pages | Comprehensive web coverage
Crawl politeness | Respect robots.txt, adaptive rate limiting | Avoid overloading origin servers
Result relevance | High precision in top 10 results | Users rarely scroll past the first page

Query Traffic:

Daily queries: 8.5 billion
QPS (average): 8.5B / 86,400 = ~100,000 QPS
QPS (peak): 3x average = ~300,000 QPS
Autocomplete: 10x queries (every keystroke) = 1M+ RPS

Index Size:

Indexed pages: 400+ billion documents
Average page size (compressed): 100KB
Raw page storage: 400B × 100KB = 40 petabytes
With historical versions, the link graph, metadata, and replication: hundreds of petabytes
Index size (inverted index): ~10-20% of raw content per replica

Crawl Volume:

URLs discovered daily: 25+ billion
Pages crawled daily: ~billions (prioritized subset)
Bandwidth: Petabytes per day
Crawl rate per domain: 1-10 requests/second (politeness-limited)

Storage Infrastructure:

Bigtable clusters: Thousands of machines
Colossus clusters: Multiple exabytes each (some exceed 10EB)
Index shards: Thousands across global datacenters
Replication factor: 3x minimum for durability
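
These figures follow from simple arithmetic. A back-of-envelope sketch in TypeScript (the constants are the estimates above, not measured values):

// Back-of-envelope sizing from the estimates above (illustrative only).
const DAILY_QUERIES = 8.5e9
const SECONDS_PER_DAY = 86_400
const PEAK_FACTOR = 3

const avgQps = DAILY_QUERIES / SECONDS_PER_DAY // ≈ 98,000 QPS
const peakQps = avgQps * PEAK_FACTOR // ≈ 295,000 QPS

const INDEXED_PAGES = 400e9
const AVG_PAGE_BYTES = 100 * 1024 // 100KB compressed
const rawStoragePB = (INDEXED_PAGES * AVG_PAGE_BYTES) / 1e15 // ≈ 41 PB of raw page content

console.log({ avgQps: Math.round(avgQps), peakQps: Math.round(peakQps), rawStoragePB: Math.round(rawStoragePB) })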

Path A: Monolithic Index

Best when:

  • Index fits on a single machine cluster
  • Query volume is moderate (<10K QPS)
  • Freshness requirements are relaxed (daily updates acceptable)

Architecture:

[Diagram: Users → Load Balancer → Query Servers → Monolithic Index, with a Crawler feeding the index; everything runs in a single datacenter]

Key characteristics:

  • Single index copy, simpler consistency
  • Vertical scaling (bigger machines)
  • Batch index rebuilds

Trade-offs:

  • Simpler architecture, easier debugging
  • No distributed coordination overhead
  • Strong consistency guaranteed
  • Limited to single-datacenter scale
  • Index rebuild causes downtime or staleness
  • No geographic redundancy

Real-world example: Elasticsearch single-cluster deployments for enterprise search. Works well up to billions of documents and thousands of QPS. Beyond that, coordination overhead becomes prohibitive.

Path B: Distributed Sharded Index

Best when:

  • Web-scale index (hundreds of billions of documents)
  • Global user base requiring low latency
  • Continuous index updates required (no rebuild windows)

Architecture:

[Diagram: Global infrastructure with GeoDNS routing users to Datacenter 1 (US) or Datacenter 2 (EU); each datacenter contains a Load Balancer, Query Processors, and Index Shards 1..N]

Key characteristics:

  • Index partitioned across thousands of machines
  • Each query fans out to all shards in parallel
  • Results aggregated and ranked centrally
  • Index replicated across datacenters for redundancy and latency

Trade-offs:

  • Unlimited horizontal scaling
  • Geographic distribution for low latency
  • Continuous updates (no rebuild windows)
  • Fault tolerance (shard failures don’t affect availability)
  • Distributed coordination complexity
  • Tail latency challenges (slowest shard determines response time)
  • Cross-shard ranking requires careful design

Real-world example: Google Search uses document-based sharding with thousands of shards per datacenter. Index updates propagate continuously; each shard handles a subset of documents independently.

Path C: Tiered Index

Best when:

  • Query distribution is highly skewed (popular queries dominate)
  • Storage costs are a concern
  • Latency requirements vary by query type

Architecture:

[Diagram: Index tiers. A query hits the Hot Tier first (memory/SSD, top 10% of pages); on a miss it falls through to the Warm Tier (SSD, next 30%), then to the Cold Tier (HDD, remaining 60%)]

Key characteristics:

  • Most queries served from hot tier (memory-resident)
  • Warm tier for moderately popular content
  • Cold tier for long-tail queries

Trade-offs:

  • Optimal cost/performance ratio
  • Sub-millisecond latency for popular queries
  • Gradual degradation for rare queries
  • Tiering logic complexity
  • Cache invalidation challenges
  • Cold-start latency spikes

Real-world example: Google combines tiered indexing with sharding. Frequently accessed posting lists stay memory-resident; cold terms live on disk. The system dynamically promotes/demotes based on access patterns.
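
A minimal sketch of what access-based promotion and demotion could look like (tier names and thresholds are illustrative assumptions, not Google's actual policy):

type Tier = "hot" | "warm" | "cold"

interface TermStats {
  term: string
  tier: Tier
  accessesLastHour: number
}

// Illustrative thresholds; a real system would tune these against memory budgets.
const PROMOTE_THRESHOLD = 1_000
const DEMOTE_THRESHOLD = 10

function nextTier(stats: TermStats): Tier {
  if (stats.accessesLastHour >= PROMOTE_THRESHOLD) return "hot"
  if (stats.accessesLastHour <= DEMOTE_THRESHOLD) return "cold"
  return "warm"
}

// Periodically reassign posting lists to tiers based on recent access counts.
function rebalance(allStats: TermStats[]): TermStats[] {
  return allStats.map((s) => ({ ...s, tier: nextTier(s) }))
}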

Factor | Path A (Monolithic) | Path B (Sharded) | Path C (Tiered)
Scale limit | ~Billions of docs | Unlimited | Unlimited
Query latency | Low (no fan-out) | Higher (aggregation) | Varies by tier
Index freshness | Batch updates | Continuous | Continuous
Complexity | Low | High | Medium
Cost efficiency | Low | Medium | High
Best for | Enterprise search | Web-scale search | Cost-sensitive web scale

This article focuses on Path B (Distributed Sharded Index) with Path C (Tiered) optimizations because:

  1. Web-scale search requires horizontal scaling beyond single-datacenter limits
  2. Users expect sub-second latency regardless of location
  3. Modern search combines sharding with tiering for cost efficiency

The design sections show how to build each component (crawler, indexer, ranker, serving layer) for distributed operation while maintaining latency SLOs.

Component | Responsibility | Scale
URL Frontier | Prioritized queue of URLs to crawl | Billions of URLs
Distributed Crawler | Fetch pages, respect politeness | Millions of fetches/hour
Content Parser | Extract text, links, metadata | Process crawled pages
Deduplication | Detect duplicate/near-duplicate pages | Content fingerprinting
Indexer | Build inverted index from documents | Continuous updates
Index Shards | Store and query posting lists | Thousands of shards
Query Processor | Parse, expand, route queries | 100K+ QPS
Ranking Engine | Score and order results | 200+ signals
Result Aggregator | Merge results from shards | Sub-100ms aggregation
Cache Layer | Store frequent query results | 30-40% hit rate

[Query sequence diagram. Participants: User, GeoDNS, Load Balancer, Query Processor, Query Cache, Index Shards (1000s), Ranking Engine, Aggregator. Flow: search query → route to nearest DC → forward query → spell correction (<2ms) → intent understanding → query expansion → check cache. On a cache hit, cached results are returned immediately. On a cache miss, the query fans out to all shards in parallel, each shard returns its top-K, the aggregator merges candidates, the ranking engine applies PageRank, BERT, and RankBrain plus personalization, the result is stored in the cache, and ranked results are returned]

[Crawl pipeline diagram. URL discovery (seed URLs, XML sitemaps, extracted links) feeds the URL Frontier (priority queue, URL deduplication, politeness scheduler). Fetching: DNS resolution → robots.txt check → HTTP fetch → JavaScript rendering. Processing: HTML parsing → link extraction → content extraction → fingerprinting. Storage: Bigtable and the indexer queue]

Search API:

GET /search?q=distributed+systems&num=10&start=0
Authorization: Bearer {api_key}
Accept-Language: en-US
X-Forwarded-For: {client_ip}

Query Parameters:

Parameter | Type | Description
q | string | Search query (URL-encoded)
num | int | Results per page (default: 10, max: 100)
start | int | Offset for pagination
lr | string | Language restriction (e.g., lang_en)
gl | string | Geolocation (country code)
safe | string | Safe search (off, medium, strict)
dateRestrict | string | Time filter (d7, m1, y1)

Response (200 OK):

{
"query": {
"original": "distribted systems",
"corrected": "distributed systems",
"expanded_terms": ["distributed computing", "distributed architecture"]
},
"search_info": {
"total_results": 2340000000,
"search_time_ms": 187,
"spelling_correction_applied": true
},
"results": [
{
"position": 1,
"url": "https://example.com/distributed-systems-guide",
"title": "Distributed Systems: A Comprehensive Guide",
"snippet": "Learn about distributed systems architecture, including consensus algorithms, replication strategies, and fault tolerance...",
"displayed_url": "example.com › guides › distributed-systems",
"cached_url": "https://webcache.example.com/...",
"page_info": {
"last_crawled": "2024-03-15T10:00:00Z",
"language": "en",
"mobile_friendly": true
}
}
],
"related_searches": ["distributed systems design patterns", "distributed systems vs microservices"],
"knowledge_panel": {
"title": "Distributed system",
"description": "A distributed system is a system whose components are located on different networked computers...",
"source": "Wikipedia"
},
"pagination": {
"current_page": 1,
"next_start": 10,
"has_more": true
}
}

Error Responses:

Code | Condition | Response
400 Bad Request | Empty query, invalid parameters | {"error": {"code": "invalid_query"}}
429 Too Many Requests | Rate limit exceeded | {"error": {"code": "rate_limited", "retry_after": 60}}
503 Service Unavailable | System overload | {"error": {"code": "overloaded"}}

Autocomplete API:

GET /complete?q=distrib&client=web

Response (200 OK):

{
"query": "distrib",
"suggestions": [
{ "text": "distributed systems", "score": 0.95 },
{ "text": "distributed computing", "score": 0.87 },
{ "text": "distribution center near me", "score": 0.72 },
{ "text": "distributed database", "score": 0.68 }
],
"latency_ms": 8
}

Design note: Autocomplete must complete in <100ms. Suggestions come from a separate, highly optimized trie-based index of popular queries, not the main document index.
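
A minimal sketch of the kind of trie lookup this implies, with precomputed top-k suggestions stored at each prefix node (structure and scoring are assumptions for illustration):

interface TrieNode {
  children: Map<string, TrieNode>
  topSuggestions: { text: string; score: number }[] // precomputed top-k for this prefix
}

// Built offline from query logs; each node along the path keeps the best suggestions.
function insertSuggestion(root: TrieNode, text: string, score: number, k = 10): void {
  let node = root
  for (const ch of text.toLowerCase()) {
    if (!node.children.has(ch)) {
      node.children.set(ch, { children: new Map(), topSuggestions: [] })
    }
    node = node.children.get(ch)!
    node.topSuggestions.push({ text, score })
    node.topSuggestions.sort((a, b) => b.score - a.score)
    node.topSuggestions.length = Math.min(node.topSuggestions.length, k)
  }
}

// Query-time lookup is O(|prefix|): walk the trie, return the precomputed list.
function complete(root: TrieNode, prefix: string): { text: string; score: number }[] {
  let node = root
  for (const ch of prefix.toLowerCase()) {
    const child = node.children.get(ch)
    if (!child) return []
    node = child
  }
  return node.topSuggestions
}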

Crawl status API (internal):

GET /internal/crawl/status?url=https://example.com/page
Authorization: Internal-Service-Key {key}

Response:

{
"url": "https://example.com/page",
"canonical_url": "https://example.com/page",
"last_crawl": "2024-03-15T08:30:00Z",
"next_scheduled_crawl": "2024-03-16T08:30:00Z",
"crawl_frequency": "daily",
"index_status": "indexed",
"robots_txt_status": "allowed",
"page_quality_score": 0.78
}

Google stores crawled pages in Bigtable with domain-reversed URLs as row keys for efficient range scans of entire domains.

Row Key Design:

com.example.www/page/path → Reversed domain + path

Why reversed domain? Range scans for com.example.* retrieve all pages from example.com efficiently. Forward URLs would scatter domain pages across the keyspace.
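
A minimal sketch of the row-key transformation (simplified; real keys also normalize scheme, port, and query parameters):

// Build a Bigtable-style row key with the hostname reversed,
// e.g. "https://www.example.com/page/path" → "com.example.www/page/path".
function toRowKey(rawUrl: string): string {
  const url = new URL(rawUrl)
  const reversedHost = url.hostname.split(".").reverse().join(".")
  return reversedHost + url.pathname
}

// All pages of example.com now share the prefix "com.example.", so a single
// range scan over [prefix, prefix + "\xff") retrieves the whole domain.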

Column Families:

Column Family | Columns | Description
content | html, text, title, meta | Page content
links | outlinks, inlinks | Link graph
crawl | last_crawl, next_crawl, status | Crawl metadata
index | indexed_at, shard_id | Index status
quality | pagerank, spam_score, mobile_score | Quality signals

Schema (Conceptual):

Row: com.example.www/distributed-systems
├── content:html → "<html>..."
├── content:text → "Distributed systems are..."
├── content:title → "Distributed Systems Guide"
├── links:outlinks → ["com.other.www/page1", "org.wiki.en/dist"]
├── links:inlinks → ["com.blog.www/article", ...]
├── crawl:last_crawl → 1710489600 (timestamp)
├── crawl:status → "success"
├── quality:pagerank → 0.00042
└── quality:spam_score → 0.02

The inverted index maps terms to posting lists—ordered lists of documents containing that term.

Posting List Structure:

Term: "distributed"
├── Document IDs: [doc_123, doc_456, doc_789, ...]
├── Positions: [[5, 23, 107], [12], [3, 45, 89, 201], ...]
├── Frequencies: [3, 1, 4, ...]
└── Quality hints: [0.9, 0.7, 0.85, ...] # PageRank-based ordering

Compression:

  • Document IDs: Delta encoding (store differences, not absolute values)
    • Original: [100, 105, 112, 150] → Deltas: [100, 5, 7, 38]
    • Smaller integers compress better with variable-byte encoding
  • Positions: Delta encoding within each document
  • Frequencies: Variable-byte encoding
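
A minimal sketch of delta plus variable-byte encoding for a sorted doc-ID list (illustrative; production formats add block structure and skip pointers):

// Encode one unsigned integer as variable-byte: 7 payload bits per byte, high bit = continuation.
function writeVarint(value: number, out: number[]): void {
  while (value >= 0x80) {
    out.push((value & 0x7f) | 0x80)
    value >>>= 7
  }
  out.push(value)
}

// Delta-encode a sorted doc-ID list, then varint-encode the gaps (assumes 32-bit IDs).
function encodeDocIds(sortedDocIds: number[]): Uint8Array {
  const bytes: number[] = []
  let prev = 0
  for (const id of sortedDocIds) {
    writeVarint(id - prev, bytes) // gaps are small, so most fit in 1-2 bytes
    prev = id
  }
  return Uint8Array.from(bytes)
}

// encodeDocIds([100, 105, 112, 150]) stores the gaps [100, 5, 7, 38] in just 4 bytes.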

Index Entry (Conceptual Schema):

-- Logical structure (actual implementation uses custom binary format)
term_id: uint64 -- Hashed term
doc_count: uint32 -- Number of documents containing term
posting_list: bytes -- Compressed posting data
├── doc_ids: varint[] -- Delta-encoded document IDs
├── freqs: varint[] -- Term frequencies per doc
└── positions: bytes -- Position data for phrase queries

URL Frontier Schema (SQL):

CREATE TABLE url_frontier (
url_hash BIGINT PRIMARY KEY, -- Hash of normalized URL
url TEXT NOT NULL,
domain_hash BIGINT NOT NULL, -- For politeness grouping
priority FLOAT NOT NULL, -- Crawl priority (0-1)
last_crawl_time TIMESTAMP,
next_crawl_time TIMESTAMP NOT NULL,
crawl_frequency INTERVAL,
retry_count INT DEFAULT 0,
status VARCHAR(20) DEFAULT 'pending',
-- Partitioned by priority for efficient dequeue
INDEX idx_priority (priority DESC, next_crawl_time ASC),
INDEX idx_domain (domain_hash, next_crawl_time ASC)
);

Politeness constraint: Only one outstanding request per domain. The domain_hash index enables efficient per-domain rate limiting.
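
A sketch of how a worker might dequeue the next eligible URL under that constraint (the SQL, the 'in_flight' status value, and the db client are illustrative assumptions, not a specific system's API):

// Pick the highest-priority URL that is due and whose domain has no in-flight request.
const DEQUEUE_SQL = `
  SELECT url_hash, url, domain_hash
  FROM url_frontier
  WHERE status = 'pending'
    AND next_crawl_time <= NOW()
    AND domain_hash NOT IN (SELECT domain_hash FROM url_frontier WHERE status = 'in_flight')
  ORDER BY priority DESC, next_crawl_time ASC
  LIMIT 1
  FOR UPDATE SKIP LOCKED`

async function dequeueNextUrl(db: { query: (sql: string) => Promise<any[]> }) {
  const rows = await db.query(DEQUEUE_SQL)
  return rows[0] ?? null // caller marks the row 'in_flight' in the same transaction
}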

Data | Store | Rationale
Crawled pages | Bigtable | Petabyte scale, row-key range scans
Inverted index | Custom sharded stores | Optimized for posting list access
URL frontier | Distributed queue (Bigtable + Redis) | Priority queue semantics
Query cache | Distributed cache (Memcached-like) | Sub-ms latency, high hit rate
PageRank scores | Bigtable | Updated periodically, read during indexing
Query logs | Columnar store (BigQuery) | Analytics, ML training
robots.txt cache | In-memory cache | Per-domain, TTL-based

Building an inverted index from crawled documents at web scale requires careful batching and distributed coordination.

Index Build Pipeline:

[Index build pipeline diagram. Input: crawled documents from Bigtable. Map phase: tokenize → normalize (lowercase, stem) → emit (term → doc_id, position, frequency). Shuffle: partition by term, sort by doc ID. Reduce phase: merge posting lists → delta encode + compress → write to shard]

Implementation (Conceptual MapReduce):

index-builder.ts
interface Document {
doc_id: string
url: string
content: string
quality_score: number
}
interface Posting {
doc_id: number
frequency: number
positions: number[]
}
// Map phase: emit (term, posting) pairs
function mapDocument(doc: Document): Map<string, Posting> {
const terms = new Map<string, Posting>()
const tokens = tokenize(doc.content)
for (let pos = 0; pos < tokens.length; pos++) {
const term = normalize(tokens[pos]) // lowercase, stem
if (!terms.has(term)) {
terms.set(term, {
doc_id: hashDocId(doc.doc_id),
frequency: 0,
positions: [],
})
}
const posting = terms.get(term)!
posting.frequency++
posting.positions.push(pos)
}
return terms
}
interface PostingList {
term: string
doc_count: number
postings: Buffer
}
// Reduce phase: merge postings for the same term
function reducePostings(term: string, postings: Posting[]): PostingList {
// Sort by doc_id ascending so delta encoding produces small positive gaps
// (impact-ordered lists for early termination would need a different encoding)
postings.sort((a, b) => a.doc_id - b.doc_id)
return {
term,
doc_count: postings.length,
postings: deltaEncode(postings),
}
}
function deltaEncode(postings: Posting[]): Buffer {
const buffer = new CompressedBuffer()
let prevDocId = 0
for (const posting of postings) {
// Store delta instead of absolute doc_id
buffer.writeVarint(posting.doc_id - prevDocId)
buffer.writeVarint(posting.frequency)
buffer.writePositions(posting.positions)
prevDocId = posting.doc_id
}
return buffer.toBuffer()
}

Index update strategy:

Approach | Latency | Complexity | Use Case
Full rebuild | Hours | Low | Initial build, major changes
Incremental merge | Minutes | Medium | Regular updates
Real-time append | Seconds | High | Breaking news, fresh content

Google uses a hybrid: the main index updates incrementally, while a separate “fresh” index handles real-time content with periodic merges.
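
A minimal sketch of what the hybrid read path implies: query both indexes in parallel and keep the fresher copy of each document (the function names are placeholders, not Google internals):

interface IndexHit {
  docId: string
  score: number
  indexedAt: number // epoch millis
}

// Query the large main index and the small real-time index, then deduplicate
// by docId, preferring the more recently indexed version of a document.
async function hybridSearch(
  queryMain: (q: string) => Promise<IndexHit[]>,
  queryFresh: (q: string) => Promise<IndexHit[]>,
  q: string,
): Promise<IndexHit[]> {
  const [main, fresh] = await Promise.all([queryMain(q), queryFresh(q)])
  const byDoc = new Map<string, IndexHit>()
  for (const hit of [...main, ...fresh]) {
    const existing = byDoc.get(hit.docId)
    if (!existing || hit.indexedAt > existing.indexedAt) byDoc.set(hit.docId, hit)
  }
  return [...byDoc.values()].sort((a, b) => b.score - a.score)
}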

[Query pipeline diagram. Query input (raw query) → preprocessing (normalize, spell correct) → query understanding (entity recognition, intent classification, query expansion, query rewriting) → index retrieval (fan out to shards, posting list intersection, top-K per shard) → ranking (merge candidates, feature extraction, ML ranking, personalization)]

Spell Correction Implementation:

Google’s spell corrector uses a deep neural network with 680+ million parameters, executing in under 2ms.

spell-correction.ts
interface SpellResult {
original: string
corrected: string
confidence: number
alternatives: string[]
}
async function correctSpelling(query: string): Promise<SpellResult> {
// 1. Check if query is a known valid phrase
if (await isKnownPhrase(query)) {
return { original: query, corrected: query, confidence: 1.0, alternatives: [] }
}
// 2. Run neural spell correction model
const modelOutput = await spellModel.predict(query)
// 3. Consider context: surrounding words affect correction
// "python" after "monty" → don't correct to "python programming"
const contextualCorrection = applyContextRules(query, modelOutput)
// 4. Check correction against query logs (popular queries)
const popularMatch = await findPopularMatch(contextualCorrection)
return {
original: query,
corrected: popularMatch || contextualCorrection,
confidence: modelOutput.confidence,
alternatives: modelOutput.alternatives.slice(0, 3),
}
}

Design insight: Spell correction uses query logs as ground truth. If millions of users search for “javascript” after initially typing “javasript”, the model learns that correction. This is why spell correction works better for common queries than rare technical terms.

Google combines multiple ranking systems, each contributing different signals:

[Ranking diagram. Ranking signals (200+): query signals (terms, intent, freshness need), document signals (PageRank, content quality, freshness), user signals (location, language, history), context signals (device, time of day). Ranking systems: PageRank (link authority), TF-IDF (term relevance), BERT (semantic matching), RankBrain (query-doc vectors), freshness (time decay). System scores are combined with learned weights into a final score]

PageRank Computation:

PageRank measures page authority based on link structure. The algorithm models a random web surfer following links.

pagerank.ts
interface PageGraph {
pages: Map<string, string[]> // page → outlinks
inlinks: Map<string, string[]> // page → pages linking to it
}
const DAMPING_FACTOR = 0.85
const CONVERGENCE_THRESHOLD = 0.0001
const MAX_ITERATIONS = 100
function computePageRank(graph: PageGraph): Map<string, number> {
const numPages = graph.pages.size
const initialRank = 1.0 / numPages
// Initialize all pages with equal rank
let ranks = new Map<string, number>()
for (const page of graph.pages.keys()) {
ranks.set(page, initialRank)
}
// Iterate until convergence
for (let iter = 0; iter < MAX_ITERATIONS; iter++) {
const newRanks = new Map<string, number>()
let maxDelta = 0
for (const page of graph.pages.keys()) {
// Sum of (rank / outlink_count) for all pages linking to this page
let inlinkSum = 0
const inlinks = graph.inlinks.get(page) || []
for (const inlink of inlinks) {
const inlinkRank = ranks.get(inlink) || 0
const outlinks = graph.pages.get(inlink) || []
if (outlinks.length > 0) {
inlinkSum += inlinkRank / outlinks.length
}
}
// PageRank formula: PR(A) = (1-d)/N + d * sum(PR(Ti)/C(Ti))
const newRank = (1 - DAMPING_FACTOR) / numPages + DAMPING_FACTOR * inlinkSum
newRanks.set(page, newRank)
maxDelta = Math.max(maxDelta, Math.abs(newRank - (ranks.get(page) || 0)))
}
ranks = newRanks
if (maxDelta < CONVERGENCE_THRESHOLD) {
break // Converged
}
}
return ranks
}

PageRank at scale:

  • Full web graph: 400B+ nodes, trillions of edges
  • Computation: Distributed MapReduce across thousands of machines
  • Frequency: Recomputed periodically (historically monthly, now more frequent)
  • Storage: PageRank scores stored with documents in Bigtable

BERT for Ranking:

BERT (Bidirectional Encoder Representations from Transformers) understands semantic meaning, not just keyword matching.

Query: "can you get medicine for someone pharmacy"
Without BERT: Matches pages about "medicine" and "pharmacy" separately
With BERT: Understands intent = "picking up prescription for another person"

RankBrain:

RankBrain converts queries and documents to vectors in a shared embedding space. Semantic similarity is measured by vector distance.

Query vector: [0.23, -0.45, 0.12, ...] (300+ dimensions)
Doc vector: [0.21, -0.42, 0.15, ...]
Similarity: cosine_similarity(query_vec, doc_vec) = 0.94
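
The similarity computation itself is standard cosine similarity; a minimal sketch (the embedding model that produces the vectors is the hard part and is not shown):

// Cosine similarity between a query embedding and a document embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1)
}

// cosineSimilarity([0.23, -0.45, 0.12], [0.21, -0.42, 0.15]) ≈ 0.99 on the truncated vectors above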

Querying a sharded index requires fan-out to all shards, parallel execution, and result aggregation.

query-executor.ts
interface ShardResult {
shard_id: number
results: ScoredDocument[]
latency_ms: number
}
interface QueryPlan {
query: ParsedQuery
shards: ShardConnection[]
timeout_ms: number
top_k_per_shard: number
}
async function executeQuery(plan: QueryPlan): Promise<SearchResult[]> {
const { query, shards, timeout_ms, top_k_per_shard } = plan
// Fan out to all shards in parallel
const shardPromises = shards.map((shard) =>
queryShard(shard, query, top_k_per_shard).catch((err) => ({
shard_id: shard.id,
results: [],
latency_ms: timeout_ms,
error: err,
})),
)
// Wait for all shards with timeout
const shardResults = await Promise.race([Promise.all(shardPromises), sleep(timeout_ms).then(() => "timeout")])
if (shardResults === "timeout") {
// Return partial results from completed shards
return aggregatePartialResults(shardPromises)
}
// Merge results from all shards
return mergeAndRank(shardResults as ShardResult[], query)
}
function mergeAndRank(shardResults: ShardResult[], query: ParsedQuery): SearchResult[] {
// Collect all candidates
const candidates: ScoredDocument[] = []
for (const result of shardResults) {
candidates.push(...result.results)
}
// Global ranking across all shards
// Shard-local scores are comparable because same scoring function
candidates.sort((a, b) => b.score - a.score)
// Apply final ranking (BERT, personalization)
const reranked = applyFinalRanking(candidates.slice(0, 1000), query)
return reranked.slice(0, query.num_results)
}
async function queryShard(shard: ShardConnection, query: ParsedQuery, topK: number): Promise<ShardResult> {
const start = Date.now()
// 1. Retrieve posting lists for query terms
const postingLists = await shard.getPostingLists(query.terms)
// 2. Intersect posting lists (for AND queries)
const candidates = intersectPostingLists(postingLists)
// 3. Score candidates using local signals
const scored = candidates.map((doc) => ({
doc,
score: computeLocalScore(doc, query),
}))
// 4. Return top-K
scored.sort((a, b) => b.score - a.score)
return {
shard_id: shard.id,
results: scored.slice(0, topK),
latency_ms: Date.now() - start,
}
}

Tail latency challenge: With 1000 shards, even 99th percentile shard latency affects median query latency. Mitigations:

Technique | Description
Hedged requests | Send duplicate requests to replica shards, use first response
Partial results | Return results even if some shards time out
Early termination | Stop when enough high-quality results are found
Shard rebalancing | Move hot shards to faster machines
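
A minimal sketch of a hedged request against a primary and a replica shard (the hedge delay would typically sit near the p95 shard latency; this simplified version does not cancel the duplicate request):

// Send to the primary shard; if it has not answered within hedgeAfterMs,
// also send to a replica and take whichever response arrives first.
function hedgedQuery<T>(
  primary: () => Promise<T>,
  replica: () => Promise<T>,
  hedgeAfterMs: number,
): Promise<T> {
  const hedge = new Promise<T>((resolve, reject) => {
    setTimeout(() => replica().then(resolve, reject), hedgeAfterMs)
  })
  // First settled response wins; the slower one is simply ignored.
  return Promise.race([primary(), hedge])
}
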
The crawl scheduler below enforces per-domain politeness (crawl delay and concurrency limits) while draining the prioritized frontier, and adapts the crawl delay based on how the origin server responds.

crawl-scheduler.ts
interface CrawlJob {
url: string
domain: string
priority: number
lastCrawl: Date | null
estimatedChangeRate: number
}
interface DomainState {
lastRequestTime: Date
crawlDelay: number // From robots.txt or adaptive
concurrentRequests: number
maxConcurrent: number
}
class CrawlScheduler {
private domainStates: Map<string, DomainState> = new Map()
private frontier: PriorityQueue<CrawlJob>
async scheduleNext(): Promise<CrawlJob | null> {
const deferred: CrawlJob[] = []
let next: CrawlJob | null = null
while (!this.frontier.isEmpty()) {
const job = this.frontier.pop()
// Check politeness constraints: per-domain crawl delay and concurrency cap
const domainState = this.getDomainState(job.domain)
if (!this.canCrawlNow(domainState) || domainState.concurrentRequests >= domainState.maxConcurrent) {
deferred.push(job) // Domain not ready yet; retry on a later call
continue
}
// Acquire crawl slot for this domain
domainState.concurrentRequests++
domainState.lastRequestTime = new Date()
next = job
break
}
// Re-queue deferred jobs so they are retried once their domains free up
for (const job of deferred) {
this.frontier.push(job)
}
return next
}
private canCrawlNow(state: DomainState): boolean {
const elapsed = Date.now() - state.lastRequestTime.getTime()
return elapsed >= state.crawlDelay * 1000
}
// Adaptive crawl delay based on server response
updateCrawlDelay(domain: string, responseTimeMs: number, statusCode: number): void {
const state = this.getDomainState(domain)
if (statusCode === 429 || statusCode === 503) {
// Server is overloaded, back off exponentially
state.crawlDelay = Math.min(state.crawlDelay * 2, 60)
} else if (responseTimeMs > 2000) {
// Slow response, increase delay
state.crawlDelay = Math.min(state.crawlDelay * 1.5, 30)
} else if (responseTimeMs < 200 && state.crawlDelay > 1) {
// Fast response, can crawl more aggressively
state.crawlDelay = Math.max(state.crawlDelay * 0.9, 1)
}
}
}

Crawl prioritization factors:

Factor | Weight | Rationale
PageRank | High | Important pages should be fresh
Update frequency | High | Pages that change often need frequent crawls
User demand | High | Popular query results need freshness
Sitemap priority | Medium | Webmaster hints
Time since last crawl | Medium | Spread crawl load
robots.txt crawl-delay | Mandatory | Respect server limits
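
A sketch of how these factors might combine into a single frontier priority (the weights are illustrative assumptions, not Google's actual values):

interface CrawlCandidate {
  pageRank: number // 0-1 link authority
  changesPerDay: number // estimated update frequency
  queryDemand: number // 0-1, how often the page surfaces in popular results
  sitemapPriority: number // 0-1 webmaster hint
  hoursSinceLastCrawl: number
}

// Combine prioritization factors into a 0-1 score used to order the URL frontier.
function crawlPriority(c: CrawlCandidate): number {
  // Expected number of missed changes since the last crawl, capped at 1.
  const staleness = Math.min((c.hoursSinceLastCrawl * c.changesPerDay) / 24, 1)
  return 0.35 * c.pageRank + 0.25 * staleness + 0.25 * c.queryDemand + 0.15 * c.sitemapPriority
}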

The Search Engine Results Page (SERP) must render quickly despite complex content (rich snippets, knowledge panels, images).

Critical rendering path:

serp-rendering.ts
interface SearchResultsPage {
query: string
results: SearchResult[]
knowledgePanel?: KnowledgePanel
relatedSearches: string[]
}
// Server-side render critical content
function renderSERP(data: SearchResultsPage): string {
// 1. Inline critical CSS for above-the-fold content
const criticalCSS = extractCriticalCSS()
// 2. Render first 3 results server-side (no JS needed)
const initialResults = data.results.slice(0, 3).map(renderResult).join("")
// 3. Defer non-critical content
const deferredContent = `
<script>
// Hydrate remaining results after initial paint
window.__SERP_DATA__ = ${JSON.stringify(data)};
</script>
`
return `
<html>
<head>
<style>${criticalCSS}</style>
</head>
<body>
<div id="results">${initialResults}</div>
<div id="deferred"></div>
${deferredContent}
<script src="/serp.js" defer></script>
</body>
</html>
`
}

Performance optimizations:

Technique | Impact | Implementation
Server-side rendering | FCP < 500ms | Render first 3 results on server
Critical CSS inlining | No render blocking | Extract above-fold styles
Lazy loading | Reduced initial payload | Load images/rich snippets on scroll
Prefetching | Faster result clicks | Prefetch top result on hover
Service worker | Offline + instant repeat visits | Cache static assets, query history

On the client, autocomplete debounces rapid keystrokes, caches suggestions, and prefetches likely next queries:

autocomplete.ts
class AutocompleteController {
private debounceMs = 100
private minChars = 2
private cache: Map<string, string[]> = new Map()
async handleInput(query: string): Promise<string[]> {
if (query.length < this.minChars) {
return []
}
// Check cache first
const cached = this.cache.get(query)
if (cached) {
return cached
}
// Debounce rapid keystrokes
await this.debounce()
// Fetch suggestions
const suggestions = await this.fetchSuggestions(query)
// Cache for repeat queries
this.cache.set(query, suggestions)
// Prefetch likely next queries
this.prefetchNextCharacter(query)
return suggestions
}
private prefetchNextCharacter(query: string): void {
// Prefetch common next characters
const commonNextChars = ["a", "e", "i", "o", "s", "t", " "]
for (const char of commonNextChars) {
const nextQuery = query + char
if (!this.cache.has(nextQuery)) {
// Low-priority background fetch
requestIdleCallback(() => this.fetchSuggestions(nextQuery))
}
}
}
}

Autocomplete latency budget:

Total: 100ms target
├── Network RTT: 30ms (edge servers)
├── Server processing: 20ms
├── Trie lookup: 5ms
├── Ranking: 10ms
├── Response serialization: 5ms
└── Client rendering: 30ms

Google uses traditional pagination rather than infinite scroll. Design rationale:

Factor | Pagination | Infinite Scroll
User mental model | Clear position in results | Lost context
Sharing results | "Page 2" is meaningful | No way to share position
Back button | Works as expected | Loses scroll position
Performance | Bounded DOM size | Unbounded growth
SEO results | Users evaluate before clicking | Scroll past quickly

Infrastructure components:

Component | Purpose | Requirements
Distributed storage | Page content, index | Petabyte scale, strong consistency
Distributed compute | Index building, ranking | Horizontal scaling, fault tolerance
Message queue | Crawl job distribution | At-least-once delivery, priority queues
Cache layer | Query results, posting lists | Sub-ms latency, high throughput
CDN | Static assets, edge serving | Global distribution
DNS | Geographic routing | Low latency, health checking

Google's internal equivalents:

Component | Google Service | Purpose
Storage | Bigtable + Colossus | Structured data + distributed file system
Compute | Borg | Container orchestration
MapReduce | MapReduce / Flume | Batch processing
RPC | Stubby (gRPC predecessor) | Service communication
Monitoring | Borgmon (Prometheus inspiration) | Metrics and alerting
Consensus | Chubby (ZooKeeper inspiration) | Distributed locking

[AWS reference architecture diagram. Edge layer: Route 53 (GeoDNS), CloudFront, Application Load Balancer. Compute layer: ECS Fargate (query servers), AWS Batch (index building). Storage layer: DynamoDB (URL frontier), S3 (raw pages), OpenSearch (inverted index). Cache layer: ElastiCache (query cache), DAX (DynamoDB cache). Queue layer: SQS (crawl jobs), Kinesis (index updates)]

Service sizing (for ~10K QPS, 1B documents):

Service | Configuration | Cost Estimate
OpenSearch | 20 × i3.2xlarge data nodes | ~$50K/month
ECS Fargate | 50 × 4 vCPU / 8GB tasks | ~$15K/month
ElastiCache | 10 × r6g.xlarge nodes | ~$5K/month
DynamoDB | On-demand, ~100K WCU | ~$10K/month
S3 | 100TB storage | ~$2K/month

Note: This is a simplified reference. Google’s actual infrastructure is 1000x larger and uses custom hardware/software unavailable commercially.

Open-source alternatives:

Component | Technology | Notes
Search engine | Elasticsearch / Solr | Proven at billion-doc scale
Storage | Cassandra / ScyllaDB | Wide-column store like Bigtable
Crawler | Apache Nutch / StormCrawler | Distributed web crawling
Queue | Kafka | Crawl job distribution
Compute | Kubernetes | Container orchestration
Cache | Redis Cluster | Query and posting list cache

News search prioritizes freshness over traditional ranking signals.

news-ranking.ts
function computeNewsScore(doc: NewsDocument, query: Query): number {
const baseRelevance = computeTextRelevance(doc, query)
const authorityScore = doc.sourceAuthority // CNN > random blog
const freshnessScore = computeFreshnessDecay(doc.publishedAt)
// Freshness dominates for news queries
return baseRelevance * 0.3 + authorityScore * 0.2 + freshnessScore * 0.5
}
function computeFreshnessDecay(publishedAt: Date): number {
const ageHours = (Date.now() - publishedAt.getTime()) / (1000 * 60 * 60)
// Exponential decay with an 8-hour time constant (half-life ≈ 5.5 hours) for breaking news
return Math.exp(-ageHours / 8)
}

News-specific infrastructure:

  • Dedicated “fresh” index updated in real-time
  • RSS/Atom feed crawling every few minutes
  • Publisher push APIs for instant indexing
  • Separate ranking model trained on news engagement

Image search combines visual features with text signals.

image-search.ts
interface ImageDocument {
imageUrl: string
pageUrl: string
altText: string
surroundingText: string
visualFeatures: number[] // CNN embeddings
safeSearchScore: number
}
function rankImageResult(image: ImageDocument, query: Query): number {
// Text signals from alt text and page context
const textScore = computeTextRelevance(image.altText + " " + image.surroundingText, query)
// Visual similarity to query (if query has image)
const visualScore = query.hasImage ? cosineSimilarity(image.visualFeatures, query.imageFeatures) : 0
// Page authority
const pageScore = getPageRank(image.pageUrl)
return textScore * 0.4 + visualScore * 0.3 + pageScore * 0.3
}

Location-aware search requires geographic indexing.

local-search.ts
interface LocalBusiness {
id: string
name: string
category: string
location: { lat: number; lng: number }
rating: number
reviewCount: number
}
function rankLocalResult(business: LocalBusiness, query: Query, userLocation: Location): number {
const relevanceScore = computeTextRelevance(business.name + " " + business.category, query)
// Distance decay: closer is better
const distance = haversineDistance(userLocation, business.location)
const distanceScore = 1 / (1 + distance / 5) // 5km reference distance
// Quality signals
const qualityScore = business.rating * Math.log(business.reviewCount + 1)
return relevanceScore * 0.3 + distanceScore * 0.4 + qualityScore * 0.3
}
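
The haversineDistance helper referenced above is not defined in the snippet; a standard implementation (distance in kilometers) looks like this:

interface Location {
  lat: number
  lng: number
}

// Great-circle distance between two points on Earth, in kilometers.
function haversineDistance(a: Location, b: Location): number {
  const R = 6371 // mean Earth radius in km
  const toRad = (deg: number) => (deg * Math.PI) / 180
  const dLat = toRad(b.lat - a.lat)
  const dLng = toRad(b.lng - a.lng)
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2
  return 2 * R * Math.asin(Math.sqrt(h))
}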

Local search infrastructure:

  • Geospatial index (R-tree or geohash-based)
  • Business database integration (Google My Business)
  • Real-time hours/availability from APIs
  • User location from GPS, IP, or explicit setting

Web search design requires solving four interconnected problems at planetary scale:

  1. Crawling — Discovering and fetching content from billions of URLs while respecting server limits. Prioritization determines which pages stay fresh; adaptive politeness prevents overloading origin servers. The crawler is never “done”—the web changes continuously.

  2. Indexing — Building data structures that enable sub-second query response. Inverted indexes map terms to documents; sharding distributes the index across thousands of machines. Compression (delta encoding) reduces storage 5-10x while maintaining query speed.

  3. Ranking — Combining hundreds of signals to surface relevant results. PageRank provides baseline authority from link structure; BERT understands semantic meaning; RankBrain matches queries to documents in embedding space. No single signal dominates—the combination matters.

  4. Serving — Processing 100K+ QPS with sub-second latency. Fan-out to all shards, aggregate results, apply final ranking—all within 200ms. Caching handles the long tail; early termination stops when good results are found.

What this design optimizes for:

  • Query latency: p50 < 200ms through caching, early termination, and parallel shard queries
  • Index freshness: Minutes for news, hours for regular content through tiered crawling
  • Result relevance: Multiple ranking systems (PageRank + BERT + RankBrain) cover different relevance aspects
  • Horizontal scale: Sharded architecture scales to 400B+ documents

What it sacrifices:

  • Simplicity: Thousands of components, multiple ranking systems, complex coordination
  • Cost: Massive infrastructure (estimated millions of servers)
  • Real-time indexing: Minutes to hours delay for most content (news excepted)

Known limitations:

  • Long-tail queries may have poor results (insufficient training data)
  • Adversarial SEO requires constant ranking updates
  • Fresh content from new sites may take weeks to surface
  • Personalization creates filter bubbles

Related concepts:

  • Information retrieval fundamentals (TF-IDF, inverted indexes)
  • Distributed systems concepts (sharding, replication, consensus)
  • Basic machine learning (embeddings, neural networks)
  • Graph algorithms (PageRank, link analysis)

Key terms:

  • Inverted Index — Data structure mapping terms to documents containing them
  • Posting List — List of documents (with positions/frequencies) for a single term
  • PageRank — Algorithm measuring page importance based on link structure
  • BERT — Bidirectional Encoder Representations from Transformers; understands word context
  • RankBrain — Google’s ML system for query-document matching via embeddings
  • Crawl Budget — Maximum pages a crawler will fetch from a domain in a time period
  • robots.txt — File specifying crawler access rules for a website
  • QDF — Query Deserves Freshness; flag indicating time-sensitive queries
  • SERP — Search Engine Results Page
  • Canonical URL — Preferred URL when multiple URLs have duplicate content

Key takeaways:

  • Web search processes 8.5B queries/day across 400B+ indexed pages with sub-second latency
  • Inverted indexes enable O(1) term lookup; sharding distributes load across thousands of machines
  • PageRank measures page authority via link analysis; BERT/RankBrain add semantic understanding
  • Crawl prioritization balances freshness vs. coverage; politeness respects server limits
  • Query processing includes spell correction (680M param DNN), intent understanding, and query expansion
  • Tiered indexing keeps hot data in memory; cold data on disk for cost efficiency
  • Early termination and caching reduce tail latency; hedged requests handle slow shards
