System Design Problems
25 min read

Design Google Search

Google Search is a web-scale search engine that processes 8.5 billion queries daily across 400+ billion indexed pages with sub-second latency. Search engines solve the fundamental information retrieval problem: given a query, return the most relevant documents from a massive corpus, instantly. This design covers crawling (web discovery), indexing (content organization), ranking (relevance scoring), and serving (query processing): the four pillars that make search work at planetary scale.

[Architecture diagram: User Layer (Web Browser, Mobile App, Search API) → Query Processing (Spell Correction: 680M params, <2ms; Intent Understanding: NLP + entity recognition; Query Expansion: synonyms + related terms; Query Router; Query Cache of frequent results) → Serving Layer (Result Aggregator) → Distributed Index (Shards 1..N, each an inverted index) → Ranking Systems (PageRank: link analysis; BERT: semantic understanding; RankBrain: query-result matching) → Crawl Infrastructure (URL Frontier: prioritized queue; Distributed Crawler: Googlebot; Content Parser + deduplication) → Storage Layer (Bigtable: page content; Colossus: distributed file system)]

Google Search architecture: Queries flow through spell correction and intent understanding, then fan out to distributed index shards. Results aggregate through ranking systems (PageRank, BERT, RankBrain) before returning. Crawlers continuously feed fresh content into the index via Bigtable storage.

Web search design revolves around four interconnected systems, each with distinct scale challenges:

  1. Crawling — Discover and fetch the web’s content. The challenge: billions of pages change constantly, but crawl resources are finite. Prioritization (popular pages crawled hourly; obscure pages monthly) and politeness (respecting server limits) determine coverage quality.

  2. Indexing — Transform raw HTML into queryable data structures. Inverted indexes map every term to its posting list (documents containing that term). Sharding distributes the index across thousands of machines; tiered storage keeps hot data in memory.

  3. Ranking — Score document relevance for a given query. PageRank (link analysis) provides baseline authority; modern systems layer BERT (semantic understanding), RankBrain (query-result matching), and 200+ other signals. Ranking quality directly determines user satisfaction.

  4. Serving — Process queries with sub-second latency. Fan out to all index shards in parallel, aggregate results, apply final ranking, and return—all within 200-500ms. Caching frequent queries reduces load; early termination stops when good results are found.

Component | Scale | Key Trade-off
Crawling | 25B URLs discovered/day | Freshness vs. coverage (can't crawl everything)
Indexing | 400B+ documents | Storage cost vs. query speed (compression trade-offs)
Ranking | 200+ signals per query | Latency vs. ranking quality (more signals = slower)
Serving | 100K+ QPS peak | Completeness vs. speed (early termination)

The mental model: crawl → parse → index → rank → serve. Each stage operates independently but feeds the next. Freshness propagates from crawl to index to results over hours to days depending on page importance.

Functional requirements:

Feature | Scope | Notes
Web search | Core | Return ranked results for text queries
Autocomplete | Core | Suggest queries as user types
Spell correction | Core | Fix typos, suggest alternatives
Image search | Extended | Search by image content/metadata
News search | Extended | Time-sensitive, freshness-critical
Local search | Extended | Location-aware results
Knowledge panels | Extended | Direct answers from knowledge graph
Personalization | Core | Location, language, search history
Safe search | Core | Filter explicit content
Pagination | Core | Navigate through result pages

Non-functional requirements:

Requirement | Target | Rationale
Query latency | p50 < 200ms, p99 < 500ms | User abandonment increases 20% per 100ms of delay
Autocomplete latency | p99 < 100ms | Must feel instantaneous while typing
Availability | 99.99% | Revenue-critical; billions of queries daily
Index freshness | Minutes for news, hours for regular pages | Query Deserves Freshness (QDF) for time-sensitive topics
Index coverage | 400B+ pages | Comprehensive web coverage
Crawl politeness | Respect robots.txt, adaptive rate limiting | Avoid overloading origin servers
Result relevance | High precision in top 10 results | Users rarely scroll past the first page

Query Traffic:

Daily queries: 8.5 billion
QPS (average): 8.5B / 86,400 = ~100,000 QPS
QPS (peak): 3x average = ~300,000 QPS
Autocomplete: 10x queries (every keystroke) = 1M+ RPS

Index Size:

Indexed pages: 400+ billion documents
Average page size (compressed): 100KB
Raw page storage: 400B × 100KB = 40 petabytes
With historical versions, the link graph, metadata, and replication: hundreds of petabytes
Index size (inverted index): ~10-20% of raw content per replica

Crawl Volume:

URLs discovered daily: 25+ billion
Pages crawled daily: ~billions (prioritized subset)
Bandwidth: Petabytes per day
Crawl rate per domain: 1-10 requests/second (politeness-limited)

Storage Infrastructure:

Bigtable clusters: Thousands of machines
Colossus clusters: Multiple exabytes each (some exceed 10EB)
Index shards: Thousands across global datacenters
Replication factor: 3x minimum for durability
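
These figures follow from simple arithmetic. A back-of-envelope sketch in TypeScript (the constants are the estimates above, not measured values):

// Back-of-envelope sizing from the estimates above (illustrative only).
const DAILY_QUERIES = 8.5e9
const SECONDS_PER_DAY = 86_400
const PEAK_FACTOR = 3

const avgQps = DAILY_QUERIES / SECONDS_PER_DAY // ≈ 98,000 QPS
const peakQps = avgQps * PEAK_FACTOR // ≈ 295,000 QPS

const INDEXED_PAGES = 400e9
const AVG_PAGE_BYTES = 100 * 1024 // 100KB compressed
const rawStoragePB = (INDEXED_PAGES * AVG_PAGE_BYTES) / 1e15 // ≈ 41 PB of raw page content

console.log({ avgQps: Math.round(avgQps), peakQps: Math.round(peakQps), rawStoragePB: Math.round(rawStoragePB) })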

Path A: Monolithic Index

Best when:

  • Index fits on a single machine cluster
  • Query volume is moderate (<10K QPS)
  • Freshness requirements are relaxed (daily updates acceptable)

Architecture:

[Diagram: Users → Load Balancer → Query Servers → Monolithic Index, with a Crawler feeding the index; everything runs in a single datacenter]

Key characteristics:

  • Single index copy, simpler consistency
  • Vertical scaling (bigger machines)
  • Batch index rebuilds

Trade-offs:

  • Simpler architecture, easier debugging
  • No distributed coordination overhead
  • Strong consistency guaranteed
  • Limited to single-datacenter scale
  • Index rebuild causes downtime or staleness
  • No geographic redundancy

Real-world example: Elasticsearch single-cluster deployments for enterprise search. Works well up to billions of documents and thousands of QPS. Beyond that, coordination overhead becomes prohibitive.

Path B: Distributed Sharded Index

Best when:

  • Web-scale index (hundreds of billions of documents)
  • Global user base requiring low latency
  • Continuous index updates required (no rebuild windows)

Architecture:

[Diagram: Global infrastructure with GeoDNS routing users to Datacenter 1 (US) or Datacenter 2 (EU); each datacenter contains a Load Balancer, Query Processors, and Index Shards 1..N]

Key characteristics:

  • Index partitioned across thousands of machines
  • Each query fans out to all shards in parallel
  • Results aggregated and ranked centrally
  • Index replicated across datacenters for redundancy and latency

Trade-offs:

  • Unlimited horizontal scaling
  • Geographic distribution for low latency
  • Continuous updates (no rebuild windows)
  • Fault tolerance (shard failures don’t affect availability)
  • Distributed coordination complexity
  • Tail latency challenges (slowest shard determines response time)
  • Cross-shard ranking requires careful design

Real-world example: Google Search uses document-based sharding with thousands of shards per datacenter. Index updates propagate continuously; each shard handles a subset of documents independently.

Path C: Tiered Index

Best when:

  • Query distribution is highly skewed (popular queries dominate)
  • Storage costs are a concern
  • Latency requirements vary by query type

Architecture:

[Diagram: Index tiers. A query hits the Hot Tier first (memory/SSD, top 10% of pages); on a miss it falls through to the Warm Tier (SSD, next 30%), then to the Cold Tier (HDD, remaining 60%)]

Key characteristics:

  • Most queries served from hot tier (memory-resident)
  • Warm tier for moderately popular content
  • Cold tier for long-tail queries

Trade-offs:

  • Optimal cost/performance ratio
  • Sub-millisecond latency for popular queries
  • Gradual degradation for rare queries
  • Tiering logic complexity
  • Cache invalidation challenges
  • Cold-start latency spikes

Real-world example: Google combines tiered indexing with sharding. Frequently accessed posting lists stay memory-resident; cold terms live on disk. The system dynamically promotes/demotes based on access patterns.
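
A minimal sketch of what access-based promotion and demotion could look like (tier names and thresholds are illustrative assumptions, not Google's actual policy):

type Tier = "hot" | "warm" | "cold"

interface TermStats {
  term: string
  tier: Tier
  accessesLastHour: number
}

// Illustrative thresholds; a real system would tune these against memory budgets.
const PROMOTE_THRESHOLD = 1_000
const DEMOTE_THRESHOLD = 10

function nextTier(stats: TermStats): Tier {
  if (stats.accessesLastHour >= PROMOTE_THRESHOLD) return "hot"
  if (stats.accessesLastHour <= DEMOTE_THRESHOLD) return "cold"
  return "warm"
}

// Periodically reassign posting lists to tiers based on recent access counts.
function rebalance(allStats: TermStats[]): TermStats[] {
  return allStats.map((s) => ({ ...s, tier: nextTier(s) }))
}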

Factor | Path A (Monolithic) | Path B (Sharded) | Path C (Tiered)
Scale limit | ~Billions of docs | Unlimited | Unlimited
Query latency | Low (no fan-out) | Higher (aggregation) | Varies by tier
Index freshness | Batch updates | Continuous | Continuous
Complexity | Low | High | Medium
Cost efficiency | Low | Medium | High
Best for | Enterprise search | Web-scale search | Cost-sensitive web scale

This article focuses on Path B (Distributed Sharded Index) with Path C (Tiered) optimizations because:

  1. Web-scale search requires horizontal scaling beyond single-datacenter limits
  2. Users expect sub-second latency regardless of location
  3. Modern search combines sharding with tiering for cost efficiency

The design sections show how to build each component (crawler, indexer, ranker, serving layer) for distributed operation while maintaining latency SLOs.

Component | Responsibility | Scale
URL Frontier | Prioritized queue of URLs to crawl | Billions of URLs
Distributed Crawler | Fetch pages, respect politeness | Millions of fetches/hour
Content Parser | Extract text, links, metadata | Process crawled pages
Deduplication | Detect duplicate/near-duplicate pages | Content fingerprinting
Indexer | Build inverted index from documents | Continuous updates
Index Shards | Store and query posting lists | Thousands of shards
Query Processor | Parse, expand, route queries | 100K+ QPS
Ranking Engine | Score and order results | 200+ signals
Result Aggregator | Merge results from shards | Sub-100ms aggregation
Cache Layer | Store frequent query results | 30-40% hit rate

[Query sequence diagram. Participants: User, GeoDNS, Load Balancer, Query Processor, Query Cache, Index Shards (1000s), Ranking Engine, Aggregator. Flow: search query → route to nearest DC → forward query → spell correction (<2ms) → intent understanding → query expansion → check cache. On a cache hit, cached results are returned immediately. On a cache miss, the query fans out to all shards in parallel, each shard returns its top-K, the aggregator merges candidates, the ranking engine applies PageRank, BERT, and RankBrain plus personalization, the result is stored in the cache, and ranked results are returned]

[Crawl pipeline diagram. URL discovery (seed URLs, XML sitemaps, extracted links) feeds the URL Frontier (priority queue, URL deduplication, politeness scheduler). Fetching: DNS resolution → robots.txt check → HTTP fetch → JavaScript rendering. Processing: HTML parsing → link extraction → content extraction → fingerprinting. Storage: Bigtable and the indexer queue]

Search API:

GET /search?q=distributed+systems&num=10&start=0
Authorization: Bearer {api_key}
Accept-Language: en-US
X-Forwarded-For: {client_ip}

Query Parameters:

Parameter | Type | Description
q | string | Search query (URL-encoded)
num | int | Results per page (default: 10, max: 100)
start | int | Offset for pagination
lr | string | Language restriction (e.g., lang_en)
gl | string | Geolocation (country code)
safe | string | Safe search (off, medium, strict)
dateRestrict | string | Time filter (d7, m1, y1)

Response (200 OK):

{
"query": {
"original": "distribted systems",
"corrected": "distributed systems",
"expanded_terms": ["distributed computing", "distributed architecture"]
},
"search_info": {
"total_results": 2340000000,
"search_time_ms": 187,
"spelling_correction_applied": true
},
"results": [
{
"position": 1,
"url": "https://example.com/distributed-systems-guide",
"title": "Distributed Systems: A Comprehensive Guide",
"snippet": "Learn about distributed systems architecture, including consensus algorithms, replication strategies, and fault tolerance...",
"displayed_url": "example.com › guides › distributed-systems",
"cached_url": "https://webcache.example.com/...",
"page_info": {
"last_crawled": "2024-03-15T10:00:00Z",
"language": "en",
"mobile_friendly": true
}
}
],
"related_searches": ["distributed systems design patterns", "distributed systems vs microservices"],
"knowledge_panel": {
"title": "Distributed system",
"description": "A distributed system is a system whose components are located on different networked computers...",
"source": "Wikipedia"
},
"pagination": {
"current_page": 1,
"next_start": 10,
"has_more": true
}
}

Error Responses:

Code | Condition | Response
400 Bad Request | Empty query, invalid parameters | {"error": {"code": "invalid_query"}}
429 Too Many Requests | Rate limit exceeded | {"error": {"code": "rate_limited", "retry_after": 60}}
503 Service Unavailable | System overload | {"error": {"code": "overloaded"}}

Autocomplete API:

GET /complete?q=distrib&client=web

Response (200 OK):

{
"query": "distrib",
"suggestions": [
{ "text": "distributed systems", "score": 0.95 },
{ "text": "distributed computing", "score": 0.87 },
{ "text": "distribution center near me", "score": 0.72 },
{ "text": "distributed database", "score": 0.68 }
],
"latency_ms": 8
}

Design note: Autocomplete must complete in <100ms. Suggestions come from a separate, highly optimized trie-based index of popular queries, not the main document index.
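
A minimal sketch of the kind of trie lookup this implies, with precomputed top-k suggestions stored at each prefix node (structure and scoring are assumptions for illustration):

interface TrieNode {
  children: Map<string, TrieNode>
  topSuggestions: { text: string; score: number }[] // precomputed top-k for this prefix
}

// Built offline from query logs; each node along the path keeps the best suggestions.
function insertSuggestion(root: TrieNode, text: string, score: number, k = 10): void {
  let node = root
  for (const ch of text.toLowerCase()) {
    if (!node.children.has(ch)) {
      node.children.set(ch, { children: new Map(), topSuggestions: [] })
    }
    node = node.children.get(ch)!
    node.topSuggestions.push({ text, score })
    node.topSuggestions.sort((a, b) => b.score - a.score)
    node.topSuggestions.length = Math.min(node.topSuggestions.length, k)
  }
}

// Query-time lookup is O(|prefix|): walk the trie, return the precomputed list.
function complete(root: TrieNode, prefix: string): { text: string; score: number }[] {
  let node = root
  for (const ch of prefix.toLowerCase()) {
    const child = node.children.get(ch)
    if (!child) return []
    node = child
  }
  return node.topSuggestions
}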

Crawl status API (internal):

GET /internal/crawl/status?url=https://example.com/page
Authorization: Internal-Service-Key {key}

Response:

{
"url": "https://example.com/page",
"canonical_url": "https://example.com/page",
"last_crawl": "2024-03-15T08:30:00Z",
"next_scheduled_crawl": "2024-03-16T08:30:00Z",
"crawl_frequency": "daily",
"index_status": "indexed",
"robots_txt_status": "allowed",
"page_quality_score": 0.78
}

Google stores crawled pages in Bigtable with domain-reversed URLs as row keys for efficient range scans of entire domains.

Row Key Design:

com.example.www/page/path → Reversed domain + path

Why reversed domain? Range scans for com.example.* retrieve all pages from example.com efficiently. Forward URLs would scatter domain pages across the keyspace.
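
A minimal sketch of the row-key transformation (simplified; real keys also normalize scheme, port, and query parameters):

// Build a Bigtable-style row key with the hostname reversed,
// e.g. "https://www.example.com/page/path" → "com.example.www/page/path".
function toRowKey(rawUrl: string): string {
  const url = new URL(rawUrl)
  const reversedHost = url.hostname.split(".").reverse().join(".")
  return reversedHost + url.pathname
}

// All pages of example.com now share the prefix "com.example.", so a single
// range scan over [prefix, prefix + "\xff") retrieves the whole domain.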

Column Families:

Column Family | Columns | Description
content | html, text, title, meta | Page content
links | outlinks, inlinks | Link graph
crawl | last_crawl, next_crawl, status | Crawl metadata
index | indexed_at, shard_id | Index status
quality | pagerank, spam_score, mobile_score | Quality signals

Schema (Conceptual):

Row: com.example.www/distributed-systems
├── content:html → "<html>..."
├── content:text → "Distributed systems are..."
├── content:title → "Distributed Systems Guide"
├── links:outlinks → ["com.other.www/page1", "org.wiki.en/dist"]
├── links:inlinks → ["com.blog.www/article", ...]
├── crawl:last_crawl → 1710489600 (timestamp)
├── crawl:status → "success"
├── quality:pagerank → 0.00042
└── quality:spam_score → 0.02

The inverted index maps terms to posting lists—ordered lists of documents containing that term.

Posting List Structure:

Term: "distributed"
├── Document IDs: [doc_123, doc_456, doc_789, ...]
├── Positions: [[5, 23, 107], [12], [3, 45, 89, 201], ...]
├── Frequencies: [3, 1, 4, ...]
└── Quality hints: [0.9, 0.7, 0.85, ...] # PageRank-based ordering

Compression:

  • Document IDs: Delta encoding (store differences, not absolute values)
    • Original: [100, 105, 112, 150] → Deltas: [100, 5, 7, 38]
    • Smaller integers compress better with variable-byte encoding
  • Positions: Delta encoding within each document
  • Frequencies: Variable-byte encoding
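
A minimal sketch of delta plus variable-byte encoding for a sorted doc-ID list (illustrative; production formats add block structure and skip pointers):

// Encode one unsigned integer as variable-byte: 7 payload bits per byte, high bit = continuation.
function writeVarint(value: number, out: number[]): void {
  while (value >= 0x80) {
    out.push((value & 0x7f) | 0x80)
    value >>>= 7
  }
  out.push(value)
}

// Delta-encode a sorted doc-ID list, then varint-encode the gaps (assumes 32-bit IDs).
function encodeDocIds(sortedDocIds: number[]): Uint8Array {
  const bytes: number[] = []
  let prev = 0
  for (const id of sortedDocIds) {
    writeVarint(id - prev, bytes) // gaps are small, so most fit in 1-2 bytes
    prev = id
  }
  return Uint8Array.from(bytes)
}

// encodeDocIds([100, 105, 112, 150]) stores the gaps [100, 5, 7, 38] in just 4 bytes.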

Index Entry (Conceptual Schema):

-- Logical structure (actual implementation uses custom binary format)
term_id: uint64 -- Hashed term
doc_count: uint32 -- Number of documents containing term
posting_list: bytes -- Compressed posting data
├── doc_ids: varint[] -- Delta-encoded document IDs
├── freqs: varint[] -- Term frequencies per doc
└── positions: bytes -- Position data for phrase queries

URL Frontier Schema (SQL):

CREATE TABLE url_frontier (
url_hash BIGINT PRIMARY KEY, -- Hash of normalized URL
url TEXT NOT NULL,
domain_hash BIGINT NOT NULL, -- For politeness grouping
priority FLOAT NOT NULL, -- Crawl priority (0-1)
last_crawl_time TIMESTAMP,
next_crawl_time TIMESTAMP NOT NULL,
crawl_frequency INTERVAL,
retry_count INT DEFAULT 0,
status VARCHAR(20) DEFAULT 'pending',
-- Partitioned by priority for efficient dequeue
INDEX idx_priority (priority DESC, next_crawl_time ASC),
INDEX idx_domain (domain_hash, next_crawl_time ASC)
);

Politeness constraint: Only one outstanding request per domain. The domain_hash index enables efficient per-domain rate limiting.
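
A sketch of how a worker might dequeue the next eligible URL under that constraint (the SQL, the 'in_flight' status value, and the db client are illustrative assumptions, not a specific system's API):

// Pick the highest-priority URL that is due and whose domain has no in-flight request.
const DEQUEUE_SQL = `
  SELECT url_hash, url, domain_hash
  FROM url_frontier
  WHERE status = 'pending'
    AND next_crawl_time <= NOW()
    AND domain_hash NOT IN (SELECT domain_hash FROM url_frontier WHERE status = 'in_flight')
  ORDER BY priority DESC, next_crawl_time ASC
  LIMIT 1
  FOR UPDATE SKIP LOCKED`

async function dequeueNextUrl(db: { query: (sql: string) => Promise<any[]> }) {
  const rows = await db.query(DEQUEUE_SQL)
  return rows[0] ?? null // caller marks the row 'in_flight' in the same transaction
}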

Data | Store | Rationale
Crawled pages | Bigtable | Petabyte scale, row-key range scans
Inverted index | Custom sharded stores | Optimized for posting list access
URL frontier | Distributed queue (Bigtable + Redis) | Priority queue semantics
Query cache | Distributed cache (Memcached-like) | Sub-ms latency, high hit rate
PageRank scores | Bigtable | Updated periodically, read during indexing
Query logs | Columnar store (BigQuery) | Analytics, ML training
robots.txt cache | In-memory cache | Per-domain, TTL-based

Building an inverted index from crawled documents at web scale requires careful batching and distributed coordination.

Index Build Pipeline:

[Index build pipeline diagram. Input: crawled documents from Bigtable. Map phase: tokenize → normalize (lowercase, stem) → emit (term → doc_id, position, frequency). Shuffle: partition by term, sort by doc ID. Reduce phase: merge posting lists → delta encode + compress → write to shard]

Implementation (Conceptual MapReduce):

index-builder.ts
interface Document {
doc_id: string
url: string
content: string
quality_score: number
}
interface Posting {
doc_id: number
frequency: number
positions: number[]
}
// Map phase: emit (term, posting) pairs
function mapDocument(doc: Document): Map<string, Posting> {
const terms = new Map<string, Posting>()
const tokens = tokenize(doc.content)
for (let pos = 0; pos < tokens.length; pos++) {
const term = normalize(tokens[pos]) // lowercase, stem
if (!terms.has(term)) {
terms.set(term, {
doc_id: hashDocId(doc.doc_id),
frequency: 0,
positions: [],
})
}
const posting = terms.get(term)!
posting.frequency++
posting.positions.push(pos)
}
return terms
}
interface PostingList {
term: string
doc_count: number
postings: Buffer
}
// Reduce phase: merge postings for the same term
function reducePostings(term: string, postings: Posting[]): PostingList {
// Sort by doc_id ascending so delta encoding produces small positive gaps
// (impact-ordered lists for early termination would need a different encoding)
postings.sort((a, b) => a.doc_id - b.doc_id)
return {
term,
doc_count: postings.length,
postings: deltaEncode(postings),
}
}
function deltaEncode(postings: Posting[]): Buffer {
const buffer = new CompressedBuffer()
let prevDocId = 0
for (const posting of postings) {
// Store delta instead of absolute doc_id
buffer.writeVarint(posting.doc_id - prevDocId)
buffer.writeVarint(posting.frequency)
buffer.writePositions(posting.positions)
prevDocId = posting.doc_id
}
return buffer.toBuffer()
}

Index update strategy:

Approach | Latency | Complexity | Use Case
Full rebuild | Hours | Low | Initial build, major changes
Incremental merge | Minutes | Medium | Regular updates
Real-time append | Seconds | High | Breaking news, fresh content

Google uses a hybrid: the main index updates incrementally, while a separate “fresh” index handles real-time content with periodic merges.
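
A minimal sketch of what the hybrid read path implies: query both indexes in parallel and keep the fresher copy of each document (the function names are placeholders, not Google internals):

interface IndexHit {
  docId: string
  score: number
  indexedAt: number // epoch millis
}

// Query the large main index and the small real-time index, then deduplicate
// by docId, preferring the more recently indexed version of a document.
async function hybridSearch(
  queryMain: (q: string) => Promise<IndexHit[]>,
  queryFresh: (q: string) => Promise<IndexHit[]>,
  q: string,
): Promise<IndexHit[]> {
  const [main, fresh] = await Promise.all([queryMain(q), queryFresh(q)])
  const byDoc = new Map<string, IndexHit>()
  for (const hit of [...main, ...fresh]) {
    const existing = byDoc.get(hit.docId)
    if (!existing || hit.indexedAt > existing.indexedAt) byDoc.set(hit.docId, hit)
  }
  return [...byDoc.values()].sort((a, b) => b.score - a.score)
}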

[Query pipeline diagram. Query input (raw query) → preprocessing (normalize, spell correct) → query understanding (entity recognition, intent classification, query expansion, query rewriting) → index retrieval (fan out to shards, posting list intersection, top-K per shard) → ranking (merge candidates, feature extraction, ML ranking, personalization)]

Spell Correction Implementation:

Google’s spell corrector uses a deep neural network with 680+ million parameters, executing in under 2ms.

spell-correction.ts
interface SpellResult {
original: string
corrected: string
confidence: number
alternatives: string[]
}
async function correctSpelling(query: string): Promise<SpellResult> {
// 1. Check if query is a known valid phrase
if (await isKnownPhrase(query)) {
return { original: query, corrected: query, confidence: 1.0, alternatives: [] }
}
// 2. Run neural spell correction model
const modelOutput = await spellModel.predict(query)
// 3. Consider context: surrounding words affect correction
// "python" after "monty" → don't correct to "python programming"
const contextualCorrection = applyContextRules(query, modelOutput)
// 4. Check correction against query logs (popular queries)
const popularMatch = await findPopularMatch(contextualCorrection)
return {
original: query,
corrected: popularMatch || contextualCorrection,
confidence: modelOutput.confidence,
alternatives: modelOutput.alternatives.slice(0, 3),
}
}

Design insight: Spell correction uses query logs as ground truth. If millions of users search for “javascript” after initially typing “javasript”, the model learns that correction. This is why spell correction works better for common queries than rare technical terms.

Google combines multiple ranking systems, each contributing different signals:

[Ranking diagram. Ranking signals (200+): query signals (terms, intent, freshness need), document signals (PageRank, content quality, freshness), user signals (location, language, history), context signals (device, time of day). Ranking systems: PageRank (link authority), TF-IDF (term relevance), BERT (semantic matching), RankBrain (query-doc vectors), freshness (time decay). System scores are combined with learned weights into a final score]

PageRank Computation:

PageRank measures page authority based on link structure. The algorithm models a random web surfer following links.

pagerank.ts
interface PageGraph {
pages: Map<string, string[]> // page → outlinks
inlinks: Map<string, string[]> // page → pages linking to it
}
const DAMPING_FACTOR = 0.85
const CONVERGENCE_THRESHOLD = 0.0001
const MAX_ITERATIONS = 100
function computePageRank(graph: PageGraph): Map<string, number> {
const numPages = graph.pages.size
const initialRank = 1.0 / numPages
// Initialize all pages with equal rank
let ranks = new Map<string, number>()
for (const page of graph.pages.keys()) {
ranks.set(page, initialRank)
}
// Iterate until convergence
for (let iter = 0; iter < MAX_ITERATIONS; iter++) {
const newRanks = new Map<string, number>()
let maxDelta = 0
for (const page of graph.pages.keys()) {
// Sum of (rank / outlink_count) for all pages linking to this page
let inlinkSum = 0
const inlinks = graph.inlinks.get(page) || []
for (const inlink of inlinks) {
const inlinkRank = ranks.get(inlink) || 0
const outlinks = graph.pages.get(inlink) || []
if (outlinks.length > 0) {
inlinkSum += inlinkRank / outlinks.length
}
}
// PageRank formula: PR(A) = (1-d)/N + d * sum(PR(Ti)/C(Ti))
const newRank = (1 - DAMPING_FACTOR) / numPages + DAMPING_FACTOR * inlinkSum
newRanks.set(page, newRank)
maxDelta = Math.max(maxDelta, Math.abs(newRank - (ranks.get(page) || 0)))
}
ranks = newRanks
if (maxDelta < CONVERGENCE_THRESHOLD) {
break // Converged
}
}
return ranks
}

PageRank at scale:

  • Full web graph: 400B+ nodes, trillions of edges
  • Computation: Distributed MapReduce across thousands of machines
  • Frequency: Recomputed periodically (historically monthly, now more frequent)
  • Storage: PageRank scores stored with documents in Bigtable

BERT for Ranking:

BERT (Bidirectional Encoder Representations from Transformers) understands semantic meaning, not just keyword matching.

Query: "can you get medicine for someone pharmacy"
Without BERT: Matches pages about "medicine" and "pharmacy" separately
With BERT: Understands intent = "picking up prescription for another person"

RankBrain:

RankBrain converts queries and documents to vectors in a shared embedding space. Semantic similarity is measured by vector distance.

Query vector: [0.23, -0.45, 0.12, ...] (300+ dimensions)
Doc vector: [0.21, -0.42, 0.15, ...]
Similarity: cosine_similarity(query_vec, doc_vec) = 0.94
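
The similarity computation itself is standard cosine similarity; a minimal sketch (the embedding model that produces the vectors is the hard part and is not shown):

// Cosine similarity between a query embedding and a document embedding.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1)
}

// cosineSimilarity([0.23, -0.45, 0.12], [0.21, -0.42, 0.15]) ≈ 0.99 on the truncated vectors above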

Querying a sharded index requires fan-out to all shards, parallel execution, and result aggregation.

query-executor.ts
interface ShardResult {
shard_id: number
results: ScoredDocument[]
latency_ms: number
}
interface QueryPlan {
query: ParsedQuery
shards: ShardConnection[]
timeout_ms: number
top_k_per_shard: number
}
async function executeQuery(plan: QueryPlan): Promise<SearchResult[]> {
const { query, shards, timeout_ms, top_k_per_shard } = plan
// Fan out to all shards in parallel
const shardPromises = shards.map((shard) =>
queryShard(shard, query, top_k_per_shard).catch((err) => ({
shard_id: shard.id,
results: [],
latency_ms: timeout_ms,
error: err,
})),
)
// Wait for all shards with timeout
const shardResults = await Promise.race([Promise.all(shardPromises), sleep(timeout_ms).then(() => "timeout")])
if (shardResults === "timeout") {
// Return partial results from completed shards
return aggregatePartialResults(shardPromises)
}
// Merge results from all shards
return mergeAndRank(shardResults as ShardResult[], query)
}
function mergeAndRank(shardResults: ShardResult[], query: ParsedQuery): SearchResult[] {
// Collect all candidates
const candidates: ScoredDocument[] = []
for (const result of shardResults) {
candidates.push(...result.results)
}
// Global ranking across all shards
// Shard-local scores are comparable because same scoring function
candidates.sort((a, b) => b.score - a.score)
// Apply final ranking (BERT, personalization)
const reranked = applyFinalRanking(candidates.slice(0, 1000), query)
return reranked.slice(0, query.num_results)
}
async function queryShard(shard: ShardConnection, query: ParsedQuery, topK: number): Promise<ShardResult> {
const start = Date.now()
// 1. Retrieve posting lists for query terms
const postingLists = await shard.getPostingLists(query.terms)
// 2. Intersect posting lists (for AND queries)
const candidates = intersectPostingLists(postingLists)
// 3. Score candidates using local signals
const scored = candidates.map((doc) => ({
doc,
score: computeLocalScore(doc, query),
}))
// 4. Return top-K
scored.sort((a, b) => b.score - a.score)
return {
shard_id: shard.id,
results: scored.slice(0, topK),
latency_ms: Date.now() - start,
}
}

Tail latency challenge: With 1000 shards, even 99th percentile shard latency affects median query latency. Mitigations:

Technique | Description
Hedged requests | Send duplicate requests to replica shards, use first response
Partial results | Return results even if some shards time out
Early termination | Stop when enough high-quality results are found
Shard rebalancing | Move hot shards to faster machines
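
A minimal sketch of a hedged request against a primary and a replica shard (the hedge delay would typically sit near the p95 shard latency; this simplified version does not cancel the duplicate request):

// Send to the primary shard; if it has not answered within hedgeAfterMs,
// also send to a replica and take whichever response arrives first.
function hedgedQuery<T>(
  primary: () => Promise<T>,
  replica: () => Promise<T>,
  hedgeAfterMs: number,
): Promise<T> {
  const hedge = new Promise<T>((resolve, reject) => {
    setTimeout(() => replica().then(resolve, reject), hedgeAfterMs)
  })
  // First settled response wins; the slower one is simply ignored.
  return Promise.race([primary(), hedge])
}
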
The crawl scheduler below enforces per-domain politeness (crawl delay and concurrency limits) while draining the prioritized frontier, and adapts the crawl delay based on how the origin server responds.

crawl-scheduler.ts
interface CrawlJob {
url: string
domain: string
priority: number
lastCrawl: Date | null
estimatedChangeRate: number
}
interface DomainState {
lastRequestTime: Date
crawlDelay: number // From robots.txt or adaptive
concurrentRequests: number
maxConcurrent: number
}
class CrawlScheduler {
private domainStates: Map<string, DomainState> = new Map()
private frontier: PriorityQueue<CrawlJob>
async scheduleNext(): Promise<CrawlJob | null> {
const deferred: CrawlJob[] = []
let next: CrawlJob | null = null
while (!this.frontier.isEmpty()) {
const job = this.frontier.pop()
// Check politeness constraints: per-domain crawl delay and concurrency cap
const domainState = this.getDomainState(job.domain)
if (!this.canCrawlNow(domainState) || domainState.concurrentRequests >= domainState.maxConcurrent) {
deferred.push(job) // Domain not ready yet; retry on a later call
continue
}
// Acquire crawl slot for this domain
domainState.concurrentRequests++
domainState.lastRequestTime = new Date()
next = job
break
}
// Re-queue deferred jobs so they are retried once their domains free up
for (const job of deferred) {
this.frontier.push(job)
}
return next
}
private canCrawlNow(state: DomainState): boolean {
const elapsed = Date.now() - state.lastRequestTime.getTime()
return elapsed >= state.crawlDelay * 1000
}
// Adaptive crawl delay based on server response
updateCrawlDelay(domain: string, responseTimeMs: number, statusCode: number): void {
const state = this.getDomainState(domain)
if (statusCode === 429 || statusCode === 503) {
// Server is overloaded, back off exponentially
state.crawlDelay = Math.min(state.crawlDelay * 2, 60)
} else if (responseTimeMs > 2000) {
// Slow response, increase delay
state.crawlDelay = Math.min(state.crawlDelay * 1.5, 30)
} else if (responseTimeMs < 200 && state.crawlDelay > 1) {
// Fast response, can crawl more aggressively
state.crawlDelay = Math.max(state.crawlDelay * 0.9, 1)
}
}
}

Crawl prioritization factors:

Factor | Weight | Rationale
PageRank | High | Important pages should be fresh
Update frequency | High | Pages that change often need frequent crawls
User demand | High | Popular query results need freshness
Sitemap priority | Medium | Webmaster hints
Time since last crawl | Medium | Spread crawl load
robots.txt crawl-delay | Mandatory | Respect server limits
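
A sketch of how these factors might combine into a single frontier priority (the weights are illustrative assumptions, not Google's actual values):

interface CrawlCandidate {
  pageRank: number // 0-1 link authority
  changesPerDay: number // estimated update frequency
  queryDemand: number // 0-1, how often the page surfaces in popular results
  sitemapPriority: number // 0-1 webmaster hint
  hoursSinceLastCrawl: number
}

// Combine prioritization factors into a 0-1 score used to order the URL frontier.
function crawlPriority(c: CrawlCandidate): number {
  // Expected number of missed changes since the last crawl, capped at 1.
  const staleness = Math.min((c.hoursSinceLastCrawl * c.changesPerDay) / 24, 1)
  return 0.35 * c.pageRank + 0.25 * staleness + 0.25 * c.queryDemand + 0.15 * c.sitemapPriority
}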

The Search Engine Results Page (SERP) must render quickly despite complex content (rich snippets, knowledge panels, images).

Critical rendering path:

serp-rendering.ts
interface SearchResultsPage {
query: string
results: SearchResult[]
knowledgePanel?: KnowledgePanel
relatedSearches: string[]
}
// Server-side render critical content
function renderSERP(data: SearchResultsPage): string {
// 1. Inline critical CSS for above-the-fold content
const criticalCSS = extractCriticalCSS()
// 2. Render first 3 results server-side (no JS needed)
const initialResults = data.results.slice(0, 3).map(renderResult).join("")
// 3. Defer non-critical content
const deferredContent = `
<script>
// Hydrate remaining results after initial paint
window.__SERP_DATA__ = ${JSON.stringify(data)};
</script>
`
return `
<html>
<head>
<style>${criticalCSS}</style>
</head>
<body>
<div id="results">${initialResults}</div>
<div id="deferred"></div>
${deferredContent}
<script src="/serp.js" defer></script>
</body>
</html>
`
}

Performance optimizations:

Technique | Impact | Implementation
Server-side rendering | FCP < 500ms | Render first 3 results on server
Critical CSS inlining | No render blocking | Extract above-fold styles
Lazy loading | Reduced initial payload | Load images/rich snippets on scroll
Prefetching | Faster result clicks | Prefetch top result on hover
Service worker | Offline + instant repeat visits | Cache static assets, query history

On the client, autocomplete debounces rapid keystrokes, caches suggestions, and prefetches likely next queries:

autocomplete.ts
class AutocompleteController {
private debounceMs = 100
private minChars = 2
private cache: Map<string, string[]> = new Map()
async handleInput(query: string): Promise<string[]> {
if (query.length < this.minChars) {
return []
}
// Check cache first
const cached = this.cache.get(query)
if (cached) {
return cached
}
// Debounce rapid keystrokes
await this.debounce()
// Fetch suggestions
const suggestions = await this.fetchSuggestions(query)
// Cache for repeat queries
this.cache.set(query, suggestions)
// Prefetch likely next queries
this.prefetchNextCharacter(query)
return suggestions
}
private prefetchNextCharacter(query: string): void {
// Prefetch common next characters
const commonNextChars = ["a", "e", "i", "o", "s", "t", " "]
for (const char of commonNextChars) {
const nextQuery = query + char
if (!this.cache.has(nextQuery)) {
// Low-priority background fetch
requestIdleCallback(() => this.fetchSuggestions(nextQuery))
}
}
}
}

Autocomplete latency budget:

Total: 100ms target
├── Network RTT: 30ms (edge servers)
├── Server processing: 20ms
├── Trie lookup: 5ms
├── Ranking: 10ms
├── Response serialization: 5ms
└── Client rendering: 30ms

Google uses traditional pagination rather than infinite scroll. Design rationale:

Factor | Pagination | Infinite Scroll
User mental model | Clear position in results | Lost context
Sharing results | "Page 2" is meaningful | No way to share position
Back button | Works as expected | Loses scroll position
Performance | Bounded DOM size | Unbounded growth
SEO results | Users evaluate before clicking | Scroll past quickly

Infrastructure components:

Component | Purpose | Requirements
Distributed storage | Page content, index | Petabyte scale, strong consistency
Distributed compute | Index building, ranking | Horizontal scaling, fault tolerance
Message queue | Crawl job distribution | At-least-once delivery, priority queues
Cache layer | Query results, posting lists | Sub-ms latency, high throughput
CDN | Static assets, edge serving | Global distribution
DNS | Geographic routing | Low latency, health checking

Google's internal equivalents:

Component | Google Service | Purpose
Storage | Bigtable + Colossus | Structured data + distributed file system
Compute | Borg | Container orchestration
MapReduce | MapReduce / Flume | Batch processing
RPC | Stubby (gRPC predecessor) | Service communication
Monitoring | Borgmon (Prometheus inspiration) | Metrics and alerting
Consensus | Chubby (ZooKeeper inspiration) | Distributed locking

[AWS reference architecture diagram. Edge layer: Route 53 (GeoDNS), CloudFront, Application Load Balancer. Compute layer: ECS Fargate (query servers), AWS Batch (index building). Storage layer: DynamoDB (URL frontier), S3 (raw pages), OpenSearch (inverted index). Cache layer: ElastiCache (query cache), DAX (DynamoDB cache). Queue layer: SQS (crawl jobs), Kinesis (index updates)]

Service sizing (for ~10K QPS, 1B documents):

Service | Configuration | Cost Estimate
OpenSearch | 20 × i3.2xlarge data nodes | ~$50K/month
ECS Fargate | 50 × 4 vCPU / 8GB tasks | ~$15K/month
ElastiCache | 10 × r6g.xlarge nodes | ~$5K/month
DynamoDB | On-demand, ~100K WCU | ~$10K/month
S3 | 100TB storage | ~$2K/month

Note: This is a simplified reference. Google’s actual infrastructure is 1000x larger and uses custom hardware/software unavailable commercially.

Open-source alternatives:

Component | Technology | Notes
Search engine | Elasticsearch / Solr | Proven at billion-doc scale
Storage | Cassandra / ScyllaDB | Wide-column store like Bigtable
Crawler | Apache Nutch / StormCrawler | Distributed web crawling
Queue | Kafka | Crawl job distribution
Compute | Kubernetes | Container orchestration
Cache | Redis Cluster | Query and posting list cache

News search prioritizes freshness over traditional ranking signals.

news-ranking.ts
function computeNewsScore(doc: NewsDocument, query: Query): number {
const baseRelevance = computeTextRelevance(doc, query)
const authorityScore = doc.sourceAuthority // CNN > random blog
const freshnessScore = computeFreshnessDecay(doc.publishedAt)
// Freshness dominates for news queries
return baseRelevance * 0.3 + authorityScore * 0.2 + freshnessScore * 0.5
}
function computeFreshnessDecay(publishedAt: Date): number {
const ageHours = (Date.now() - publishedAt.getTime()) / (1000 * 60 * 60)
// Exponential decay with an 8-hour time constant (half-life ≈ 5.5 hours) for breaking news
return Math.exp(-ageHours / 8)
}

News-specific infrastructure:

  • Dedicated “fresh” index updated in real-time
  • RSS/Atom feed crawling every few minutes
  • Publisher push APIs for instant indexing
  • Separate ranking model trained on news engagement

Image search combines visual features with text signals.

image-search.ts
interface ImageDocument {
imageUrl: string
pageUrl: string
altText: string
surroundingText: string
visualFeatures: number[] // CNN embeddings
safeSearchScore: number
}
function rankImageResult(image: ImageDocument, query: Query): number {
// Text signals from alt text and page context
const textScore = computeTextRelevance(image.altText + " " + image.surroundingText, query)
// Visual similarity to query (if query has image)
const visualScore = query.hasImage ? cosineSimilarity(image.visualFeatures, query.imageFeatures) : 0
// Page authority
const pageScore = getPageRank(image.pageUrl)
return textScore * 0.4 + visualScore * 0.3 + pageScore * 0.3
}

Location-aware search requires geographic indexing.

local-search.ts
interface LocalBusiness {
id: string
name: string
category: string
location: { lat: number; lng: number }
rating: number
reviewCount: number
}
function rankLocalResult(business: LocalBusiness, query: Query, userLocation: Location): number {
const relevanceScore = computeTextRelevance(business.name + " " + business.category, query)
// Distance decay: closer is better
const distance = haversineDistance(userLocation, business.location)
const distanceScore = 1 / (1 + distance / 5) // 5km reference distance
// Quality signals
const qualityScore = business.rating * Math.log(business.reviewCount + 1)
return relevanceScore * 0.3 + distanceScore * 0.4 + qualityScore * 0.3
}
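
The haversineDistance helper referenced above is not defined in the snippet; a standard implementation (distance in kilometers) looks like this:

interface Location {
  lat: number
  lng: number
}

// Great-circle distance between two points on Earth, in kilometers.
function haversineDistance(a: Location, b: Location): number {
  const R = 6371 // mean Earth radius in km
  const toRad = (deg: number) => (deg * Math.PI) / 180
  const dLat = toRad(b.lat - a.lat)
  const dLng = toRad(b.lng - a.lng)
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2
  return 2 * R * Math.asin(Math.sqrt(h))
}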

Local search infrastructure:

  • Geospatial index (R-tree or geohash-based)
  • Business database integration (Google My Business)
  • Real-time hours/availability from APIs
  • User location from GPS, IP, or explicit setting

Web search design requires solving four interconnected problems at planetary scale:

  1. Crawling — Discovering and fetching content from billions of URLs while respecting server limits. Prioritization determines which pages stay fresh; adaptive politeness prevents overloading origin servers. The crawler is never “done”—the web changes continuously.

  2. Indexing — Building data structures that enable sub-second query response. Inverted indexes map terms to documents; sharding distributes the index across thousands of machines. Compression (delta encoding) reduces storage 5-10x while maintaining query speed.

  3. Ranking — Combining hundreds of signals to surface relevant results. PageRank provides baseline authority from link structure; BERT understands semantic meaning; RankBrain matches queries to documents in embedding space. No single signal dominates—the combination matters.

  4. Serving — Processing 100K+ QPS with sub-second latency. Fan-out to all shards, aggregate results, apply final ranking—all within 200ms. Caching handles the long tail; early termination stops when good results are found.

What this design optimizes for:

  • Query latency: p50 < 200ms through caching, early termination, and parallel shard queries
  • Index freshness: Minutes for news, hours for regular content through tiered crawling
  • Result relevance: Multiple ranking systems (PageRank + BERT + RankBrain) cover different relevance aspects
  • Horizontal scale: Sharded architecture scales to 400B+ documents

What it sacrifices:

  • Simplicity: Thousands of components, multiple ranking systems, complex coordination
  • Cost: Massive infrastructure (estimated millions of servers)
  • Real-time indexing: Minutes to hours delay for most content (news excepted)

Known limitations:

  • Long-tail queries may have poor results (insufficient training data)
  • Adversarial SEO requires constant ranking updates
  • Fresh content from new sites may take weeks to surface
  • Personalization creates filter bubbles

Related concepts:

  • Information retrieval fundamentals (TF-IDF, inverted indexes)
  • Distributed systems concepts (sharding, replication, consensus)
  • Basic machine learning (embeddings, neural networks)
  • Graph algorithms (PageRank, link analysis)

Key terms:

  • Inverted Index — Data structure mapping terms to documents containing them
  • Posting List — List of documents (with positions/frequencies) for a single term
  • PageRank — Algorithm measuring page importance based on link structure
  • BERT — Bidirectional Encoder Representations from Transformers; understands word context
  • RankBrain — Google’s ML system for query-document matching via embeddings
  • Crawl Budget — Maximum pages a crawler will fetch from a domain in a time period
  • robots.txt — File specifying crawler access rules for a website
  • QDF — Query Deserves Freshness; flag indicating time-sensitive queries
  • SERP — Search Engine Results Page
  • Canonical URL — Preferred URL when multiple URLs have duplicate content

Key takeaways:

  • Web search processes 8.5B queries/day across 400B+ indexed pages with sub-second latency
  • Inverted indexes enable O(1) term lookup; sharding distributes load across thousands of machines
  • PageRank measures page authority via link analysis; BERT/RankBrain add semantic understanding
  • Crawl prioritization balances freshness vs. coverage; politeness respects server limits
  • Query processing includes spell correction (680M param DNN), intent understanding, and query expansion
  • Tiered indexing keeps hot data in memory; cold data on disk for cost efficiency
  • Early termination and caching reduce tail latency; hedged requests handle slow shards
