
Design Google Search: A Web Search Engine

Google Search handles over 8.5 billion queries per day, indexing hundreds of billions of web pages and returning results in under 500 milliseconds. Designing a search engine is one of the most complex system design problems, covering web crawling, indexing, ranking algorithms, query processing, and global-scale serving infrastructure. This guide breaks down the core architecture.

1. Requirements

Functional Requirements

  • Users submit text queries and receive a ranked list of relevant web pages.
  • Results include title, URL, and text snippet with query terms highlighted.
  • Support for advanced operators: exact phrase (""), exclusion (-), site-specific (site:).
  • Autocomplete and spelling correction (did you mean?).
  • Support for different content types: web pages, images, news, videos.
  • Fresh results: newly published or updated pages appear quickly.
  • Personalization based on location, language, and search history.

Non-Functional Requirements

  • Low latency: Results returned in under 500ms (Google targets 200ms).
  • High availability: 99.999% uptime.
  • Scalability: Index hundreds of billions of pages, serve 100K+ queries/sec.
  • Freshness: Breaking news indexed within minutes.
  • Relevance: The most useful results appear at the top.

2. Capacity Estimation

  • Queries per day: 8.5 billion
  • Queries per second (avg): ~100,000 QPS
  • Peak QPS: ~300,000 QPS
  • Indexed web pages: ~400 billion (with deduplication)
  • Average page size (compressed): ~50 KB
  • Raw page storage: 400B × 50 KB = ~20 PB
  • Inverted index size: ~100+ PB (term → document postings)
  • Pages crawled per day: hundreds of billions of URLs checked
  • Crawl bandwidth: multiple Tbps
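These figures follow from quick back-of-envelope arithmetic, which is worth being able to reproduce:

```python
# Sanity-check the estimates above with back-of-envelope arithmetic.
pages = 400e9          # ~400 billion indexed pages
page_bytes = 50e3      # ~50 KB average compressed page

raw_storage_pb = pages * page_bytes / 1e15         # bytes -> petabytes
print(f"raw page storage: ~{raw_storage_pb:.0f} PB")   # ~20 PB

avg_qps = 8.5e9 / 86_400   # queries per day -> queries per second
print(f"average load: ~{avg_qps:,.0f} QPS")            # ~98,380 QPS
```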

3. High-Level Design

The search engine has two major subsystems: the offline pipeline (crawl, index, rank) and the online serving system (query processing, result retrieval, ranking).

Offline pipeline:

  • Web Crawler: discover and download web pages
  • Document Processor: parse HTML, extract text, detect language, dedup
  • Indexer: build the inverted index (term → document list)
  • PageRank Computer: compute page authority from the link graph

Online serving:

  • Query Processor: parse query, expand terms, correct spelling
  • Index Servers: retrieve candidate documents from the inverted index
  • Ranking Service: score and rank candidate documents
  • Snippet Generator: extract relevant text snippets for display
  • Result Cache: cache results for popular/repeated queries

4. Detailed Component Design

4.1 Web Crawler

The crawler discovers and downloads web pages. It is a distributed system running thousands of crawler instances.

Crawling algorithm:

  1. Start with a seed list of known URLs.
  2. Fetch each URL, parse the HTML, and extract all hyperlinks.
  3. Add new URLs to the crawl frontier (priority queue).
  4. Repeat, prioritizing pages by importance (PageRank), freshness, and crawl frequency.
class WebCrawler:
    def __init__(self):
        self.frontier = PriorityQueue()  # URL frontier
        self.seen_urls = BloomFilter(capacity=100_000_000_000)
        self.dns_cache = {}
        self.robots_cache = {}

    def crawl(self, seed_urls):
        for url in seed_urls:
            self.frontier.put(url, priority=1.0)

        while not self.frontier.empty():
            url = self.frontier.get()

            # Politeness: respect robots.txt and crawl delay
            if not self.is_allowed(url):
                continue

            # Rate limit per domain (1 req/sec per domain)
            self.wait_for_politeness(url.domain)

            html = self.fetch(url)
            if html is None:
                continue

            # Process the page
            document = self.parse(html)
            self.store_document(url, document)

            # Extract and enqueue new URLs
            for link in document.links:
                normalized = self.normalize_url(link)
                if not self.seen_urls.contains(normalized):
                    self.seen_urls.add(normalized)
                    priority = self.compute_priority(normalized)
                    self.frontier.put(normalized, priority)

    def is_allowed(self, url):
        robots = self.get_robots_txt(url.domain)
        return robots.is_allowed(url.path, user_agent="GoogleBot")

Key design considerations:

  • Politeness: Respect robots.txt rules and limit crawl rate per domain (typically 1 request per second per domain).
  • URL deduplication: Use a Bloom filter to avoid re-crawling the same URL. At 400B URLs, the Bloom filter needs careful sizing.
  • Freshness: Re-crawl important pages frequently (news sites every few minutes, others daily/weekly).
  • Distributed crawling: Partition URLs by domain across crawler instances using consistent hashing to ensure politeness per domain.
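The sizing concern in the deduplication bullet can be made concrete with the standard Bloom filter formulas (m = -n·ln p / (ln 2)² bits, k = (m/n)·ln 2 hash functions); the 1% false-positive rate below is an illustrative assumption:

```python
import math

def bloom_filter_sizing(n_items, fp_rate):
    """Standard Bloom filter sizing: m = -n*ln(p)/ln(2)^2 bits of
    storage and k = (m/n)*ln(2) hash functions."""
    bits = -n_items * math.log(fp_rate) / (math.log(2) ** 2)
    hashes = math.ceil((bits / n_items) * math.log(2))
    return bits, hashes

bits, k = bloom_filter_sizing(400e9, 0.01)
print(f"~{bits / 8 / 1e12:.2f} TB, {k} hash functions")  # ~0.48 TB, 7 hash functions
```

Even at a 1% false-positive rate, 400B URLs need roughly half a terabyte of filter, which is one reason the seen-URL set, like the frontier itself, is partitioned across machines.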

4.2 Inverted Index

The inverted index is the core data structure that makes search fast. It maps each term to a list of documents containing that term.

// Inverted index structure
{
    "distributed": [
        { "doc_id": 1001, "tf": 3, "positions": [12, 45, 102] },
        { "doc_id": 2045, "tf": 1, "positions": [7] },
        { "doc_id": 3099, "tf": 5, "positions": [1, 23, 56, 89, 134] }
    ],
    "systems": [
        { "doc_id": 1001, "tf": 2, "positions": [13, 46] },
        { "doc_id": 5002, "tf": 4, "positions": [3, 18, 44, 90] }
    ]
}

// For query "distributed systems":
// 1. Look up posting lists for "distributed" and "systems"
// 2. Intersect the two lists to find documents containing BOTH terms
// 3. Use position data for phrase matching (positions must be adjacent)
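Steps 1-3 can be sketched directly on that toy structure (plain Python dicts standing in for the real on-disk postings format):

```python
def intersect(postings_a, postings_b):
    """Merge-intersect two posting lists sorted by doc_id."""
    i = j = 0
    matches = []
    while i < len(postings_a) and j < len(postings_b):
        a, b = postings_a[i], postings_b[j]
        if a["doc_id"] == b["doc_id"]:
            matches.append((a, b))
            i += 1
            j += 1
        elif a["doc_id"] < b["doc_id"]:
            i += 1
        else:
            j += 1
    return matches

def is_phrase(a, b):
    """True if some occurrence of term B directly follows term A."""
    next_positions = set(b["positions"])
    return any(p + 1 in next_positions for p in a["positions"])

index = {
    "distributed": [{"doc_id": 1001, "tf": 3, "positions": [12, 45, 102]},
                    {"doc_id": 2045, "tf": 1, "positions": [7]}],
    "systems":     [{"doc_id": 1001, "tf": 2, "positions": [13, 46]},
                    {"doc_id": 5002, "tf": 4, "positions": [3, 18, 44, 90]}],
}
hits = intersect(index["distributed"], index["systems"])
print([a["doc_id"] for a, b in hits if is_phrase(a, b)])  # [1001]
```

Because posting lists are kept sorted by doc_id, the intersection is a linear merge rather than repeated lookups.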

Index construction: The indexer processes crawled documents in batch (MapReduce/Spark). For each document: tokenize text, remove stop words, apply stemming (running → run), and add entries to the inverted index. The index is sharded across thousands of machines.
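In miniature, that per-document step might look like this; the crude suffix-stripping `stem` is a stand-in for a real Porter/Snowball stemmer:

```python
from collections import defaultdict

STOP_WORDS = {"a", "an", "and", "is", "of", "the"}

def stem(token):
    # Toy stemming; a real pipeline uses Porter/Snowball ("running" -> "run").
    if token.endswith("ing"):
        token = token[:-3]
        if len(token) > 2 and token[-1] == token[-2]:
            token = token[:-1]  # collapse doubled consonant: "runn" -> "run"
        return token
    return token[:-1] if token.endswith("s") else token

def index_document(inverted, doc_id, text):
    term_positions = defaultdict(list)
    for position, raw in enumerate(text.lower().split()):
        token = raw.strip(".,!?")
        if token and token not in STOP_WORDS:
            term_positions[stem(token)].append(position)
    for term, positions in term_positions.items():
        inverted[term].append(
            {"doc_id": doc_id, "tf": len(positions), "positions": positions})

inverted = defaultdict(list)
index_document(inverted, 1001, "Running distributed systems is hard.")
print(sorted(inverted))  # ['distributed', 'hard', 'run', 'system']
```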

Index sharding strategies:

  • Document-partitioned: each shard holds the complete index for a subset of documents. Trade-off: a query must hit ALL shards (scatter-gather), but it is easy to add documents.
  • Term-partitioned: each shard holds the complete posting list for a subset of terms. Trade-off: a query hits only the relevant shards, but load is uneven (popular terms are hot).

Google uses document-partitioned sharding with replication. Each query is sent to all shards in parallel, and results are merged.
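The gather/merge step is a k-way merge of per-shard top-K lists; the `(score, doc_id)` tuples here are an illustrative shape, not Google's wire format:

```python
import heapq
from itertools import islice

def merge_shard_results(shard_results, k=10):
    """Merge per-shard top-K lists (each sorted by descending score)
    into one global top-K: the gather step of scatter-gather."""
    merged = heapq.merge(*shard_results, reverse=True)
    return list(islice(merged, k))

shard_a = [(0.92, "doc-17"), (0.55, "doc-3")]
shard_b = [(0.88, "doc-42"), (0.61, "doc-9")]
print(merge_shard_results([shard_a, shard_b], k=3))
# [(0.92, 'doc-17'), (0.88, 'doc-42'), (0.61, 'doc-9')]
```

Because each shard's list is already sorted, the merge is linear in the number of results rather than requiring a global re-sort.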

4.3 PageRank (Simplified)

PageRank computes the importance of a page based on the link structure of the web. The intuition: a page is important if many important pages link to it.

def compute_pagerank(graph, damping=0.85, iterations=20):
    """
    graph: dict mapping page -> list of pages it links to
    Returns: dict mapping page -> PageRank score
    """
    N = len(graph)
    pagerank = {page: 1.0 / N for page in graph}

    for _ in range(iterations):
        new_rank = {}
        for page in graph:
            rank_sum = 0
            for other_page in graph:
                if page in graph[other_page]:
                    num_outlinks = len(graph[other_page])
                    rank_sum += pagerank[other_page] / num_outlinks

            new_rank[page] = (1 - damping) / N + damping * rank_sum

        pagerank = new_rank

    return pagerank

# At web scale, this runs as a distributed MapReduce job
# over the entire link graph (hundreds of billions of edges)

Modern Google Search uses many more signals than PageRank alone, but PageRank remains a foundational concept for understanding link-based authority.

4.4 Query Processing Pipeline

When a user submits a query, the following happens in under 200ms:

  1. Query parsing: Tokenize, detect language, handle operators ("", -, site:).
  2. Spelling correction: Suggest corrections using edit distance, n-gram models, and query logs.
  3. Query expansion: Add synonyms, related terms to improve recall.
  4. Cache check: Look up the query in the result cache. Popular queries are served entirely from cache.
  5. Index lookup: Send the query to all index shards in parallel (scatter). Each shard returns its top-K candidate documents.
  6. Merge and rank: Gather results from all shards, apply a global ranking model considering 200+ signals.
  7. Snippet generation: For the top results, generate text snippets with query terms bolded.
  8. Return results: Send the final ranked list to the user.
async function processQuery(rawQuery, userContext) {
    // Step 1-3: Parse and enhance
    const query = parseQuery(rawQuery);
    const corrected = spellCheck(query);
    const expanded = expandQuery(corrected);

    // Step 4: Cache check
    const cacheKey = buildCacheKey(expanded, userContext.location);
    const cached = await resultCache.get(cacheKey);
    if (cached) return cached;

    // Step 5: Scatter to all index shards
    const shardResults = await Promise.all(
        indexShards.map(shard => shard.search(expanded, { limit: 100 }))
    );

    // Step 6: Merge and global ranking
    const candidates = mergeResults(shardResults);
    const ranked = rankingModel.score(candidates, {
        query: expanded,
        userLocation: userContext.location,
        userLanguage: userContext.language,
        device: userContext.device
    });

    // Step 7: Generate snippets for top 10
    const results = ranked.slice(0, 10).map(doc => ({
        title: doc.title,
        url: doc.url,
        snippet: generateSnippet(doc.content, expanded.terms)
    }));

    // Cache and return
    await resultCache.set(cacheKey, results, { ttl: 3600 });
    return results;
}

4.5 Ranking Signals

Google uses 200+ signals to rank results. Key categories:

  • Relevance: TF-IDF score, BM25, term proximity, title match
  • Authority: PageRank, domain authority, backlink quality
  • Freshness: page publish date, last modified date, content change frequency
  • User engagement: click-through rate, time on page, bounce rate
  • Page quality: content depth, mobile-friendliness, page speed, HTTPS
  • Personalization: user location, language, search history
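Of these, BM25 is concrete enough to sketch; k1 = 1.5 and b = 0.75 are conventional default parameters, and the example numbers are illustrative:

```python
import math

def bm25_term(tf, df, doc_len, avg_doc_len, n_docs, k1=1.5, b=0.75):
    """One query term's BM25 contribution to one document's score."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
    return idf * tf_norm

# Hypothetical term: appears 3x in an average-length page, in 1M of 400B docs.
score = bm25_term(tf=3, df=1_000_000, doc_len=500, avg_doc_len=500,
                  n_docs=400e9)
```

A document's first-pass score is the sum of this over the query's terms. Rare terms (high IDF) dominate, and the tf component saturates, so repeating a keyword yields diminishing returns.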

5. Key Trade-offs

  • Index freshness vs completeness: maintaining a real-time index for all pages is prohibitively expensive. Google uses a tiered approach: a real-time index for breaking news, a frequently refreshed index for popular pages, and a slower deep-crawl index for the long tail.
  • Precision vs recall: showing only highly relevant results (precision) may miss useful pages; showing everything (recall) overwhelms users. Google optimizes for precision on the first page while maintaining high recall overall.
  • Serving latency vs ranking quality: more sophisticated ranking models (deep learning) produce better results but take longer to evaluate. Google uses a multi-stage ranking pipeline: fast first-pass filtering with BM25, then expensive neural re-ranking on top candidates only.

6. Scaling Considerations

6.1 Index Serving

The index is sharded across thousands of machines. Each query triggers a scatter-gather across all shards. With replicas for availability and load distribution, Google operates millions of index serving nodes. Load balancing routes queries to the least-loaded replica of each shard.

6.2 Result Caching

A significant percentage of queries are repeated (e.g., "weather", "facebook", "amazon"). Caching query results eliminates the need for expensive index lookups. Google caches at multiple levels: full result pages, individual shard results, and document features.
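A minimal in-process version of such a cache might look as follows; production systems use a distributed cache tier, but the interface is similar:

```python
import time

class TTLResultCache:
    """Toy in-process result cache with per-entry expiry."""
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # cache_key -> (expires_at, results)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self.store.pop(key, None)   # evict expired entry lazily
            return None
        return entry[1]

    def set(self, key, results):
        self.store[key] = (time.monotonic() + self.ttl, results)

cache = TTLResultCache(ttl_seconds=3600)
cache.set("weather|en|US", ["result1", "result2"])
assert cache.get("weather|en|US") == ["result1", "result2"]
```

The cache key must include everything the results depend on (query, language, location), which is why the earlier pipeline builds it from the expanded query plus user context.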

6.3 Geographic Distribution

Google operates data centers worldwide. Queries are routed to the nearest data center. The full index is replicated across all major data centers. This ensures low latency globally. The trade-off is consistency: index updates take time to propagate between replicas, but eventual consistency is acceptable for search results.


7. Frequently Asked Questions

Q1: Why does search need an inverted index?

Without an index, searching would require scanning every document for query terms (a full scan of 400B pages). The inverted index pre-computes a mapping from each term to the list of documents containing it. For a query like "distributed systems," the engine looks up the posting lists for "distributed" and "systems" and intersects them. This reduces the search space from hundreds of billions of documents to a few million candidates in milliseconds.

Q2: How does Google crawl the entire web efficiently?

Google uses a distributed crawler with thousands of crawler instances. URLs are partitioned by domain using consistent hashing, ensuring each domain is crawled by one instance (maintaining politeness). The crawler uses a priority queue (frontier) where important pages (high PageRank, frequently updated) are crawled more often. Bloom filters prevent re-crawling duplicate URLs. The entire crawl frontier is distributed across many machines.

Q3: How does PageRank work at web scale?

PageRank is computed as a distributed MapReduce job over the link graph. Each iteration, every page distributes its rank equally among its outbound links. After 20-50 iterations, the ranks converge. At web scale (hundreds of billions of pages and links), this requires a massive distributed compute cluster and takes hours to complete. It runs periodically (not in real-time).

Q4: How does Google achieve sub-200ms latency for search results?

Several techniques: (1) Heavy caching of popular queries. (2) Scatter-gather across index shards executes in parallel. (3) Each shard holds its portion of the index in memory or on SSDs. (4) Multi-stage ranking: fast BM25 scoring first, then expensive neural ranking only on top candidates. (5) Geographic routing to the nearest data center. (6) Aggressive timeout: if a shard is slow, skip it and use partial results.
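Technique (6) amounts to a bounded wait over parallel shard calls. A sketch, with plain functions standing in for RPCs to index servers and an illustrative 120ms deadline:

```python
from concurrent.futures import ThreadPoolExecutor, wait

def query_with_deadline(shards, query, deadline_s=0.12):
    """Fan a query out to every shard; when the deadline expires,
    merge whatever has arrived and skip the stragglers."""
    pool = ThreadPoolExecutor(max_workers=len(shards))
    try:
        futures = [pool.submit(shard, query) for shard in shards]
        done, _not_done = wait(futures, timeout=deadline_s)
        # Partial results beat a slow page: drop shards that missed the cut.
        return [hit for f in done for hit in f.result()]
    finally:
        pool.shutdown(wait=False, cancel_futures=True)

fast_shard = lambda q: [f"{q}:doc-1"]
print(query_with_deadline([fast_shard, fast_shard], "pizza"))
```

Returning partial results slightly degrades recall for that one query, which is an acceptable price for bounding tail latency.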
