Database Indexing: The Complete Guide to Faster Queries
Database indexing is the single most impactful technique for improving query performance. A well-designed index can reduce query time from minutes to milliseconds, while a poorly designed one can slow down writes and waste storage. This guide covers every index type, shows real SQL examples, and teaches you how to think about indexing for system design.
How Database Indexes Work
An index is a separate data structure that maintains a sorted reference to rows in a table. Without an index, the database must perform a full table scan — reading every row to find matches. With an index, the database uses the index structure to jump directly to matching rows, dramatically reducing I/O.
Think of it like a book index: instead of reading every page to find a topic, you look it up in the index and go directly to the right page. The trade-off is that indexes consume extra storage and must be updated on every write operation.
B-Tree Indexes
B-tree (balanced tree) indexes are the default and most common index type in all major SQL databases. They maintain data in a sorted, balanced tree structure where each node contains multiple keys and pointers to child nodes.
How B-Trees Work Internally
A B-tree index has three levels: the root node, internal (branch) nodes, and leaf nodes. The leaf nodes contain the actual index entries (key values and pointers to table rows). For a table with millions of rows, a B-tree typically has only 3-4 levels deep, meaning any lookup requires at most 3-4 disk reads.
B-trees support equality lookups (WHERE id = 5), range queries (WHERE price BETWEEN 10 AND 50), sorting (ORDER BY), and prefix matching (WHERE name LIKE 'John%'). This versatility is why they are the default.
-- Creating a B-tree index (default type)
CREATE INDEX idx_users_email ON users(email);
-- PostgreSQL: B-tree is explicit
CREATE INDEX idx_orders_date ON orders USING btree(created_at);
-- Verify the index is used with EXPLAIN
EXPLAIN ANALYZE
SELECT * FROM users WHERE email = 'alice@example.com';
-- Output shows Index Scan instead of Seq Scan:
-- Index Scan using idx_users_email on users
-- Index Cond: (email = 'alice@example.com')
-- Execution Time: 0.052 ms (vs 245ms without index)
Hash Indexes
Hash indexes use a hash function to map keys directly to their locations. They are extremely fast for exact equality lookups (WHERE id = 42) but cannot support range queries, sorting, or partial matching.
-- PostgreSQL hash index
CREATE INDEX idx_sessions_token ON sessions USING hash(session_token);
-- This query uses the hash index (equality only)
SELECT * FROM sessions WHERE session_token = 'abc123def456';
-- This query CANNOT use the hash index (range query)
SELECT * FROM sessions WHERE session_token > 'abc';
Hash indexes are rarely used in practice because B-trees handle equality lookups nearly as fast while also supporting range queries. PostgreSQL only made hash indexes crash-safe in version 10. Use them only when you exclusively need equality lookups and need the marginal performance gain.
Bitmap Indexes
Bitmap indexes store a bitmap (array of bits) for each distinct value in a column. Each bit represents a row — 1 if the row has that value, 0 otherwise. They are highly efficient for columns with low cardinality (few distinct values) like status, gender, or boolean flags.
Bitmap indexes excel at combining multiple conditions using bitwise AND/OR operations. Oracle supports explicit bitmap indexes. PostgreSQL uses bitmap index scans dynamically by combining multiple B-tree indexes at query time.
-- Oracle: Explicit bitmap index
CREATE BITMAP INDEX idx_orders_status ON orders(status);
-- PostgreSQL: Bitmap scan combines multiple indexes automatically
-- Given these two B-tree indexes:
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_region ON orders(region);
-- This query may use a BitmapAnd of both indexes:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE status = 'shipped' AND region = 'US';
-- BitmapAnd
-- Bitmap Index Scan on idx_orders_status
-- Bitmap Index Scan on idx_orders_region
Composite (Compound) Indexes
Composite indexes include multiple columns in a single index. The column order matters significantly — the index is sorted by the first column, then by the second column within each first-column value, and so on. This is known as the leftmost prefix rule.
-- Composite index on (last_name, first_name, age)
CREATE INDEX idx_name_age ON employees(last_name, first_name, age);
-- These queries CAN use the index:
SELECT * FROM employees WHERE last_name = 'Smith'; -- uses first column
SELECT * FROM employees WHERE last_name = 'Smith' AND first_name = 'John'; -- uses first two
SELECT * FROM employees WHERE last_name = 'Smith' AND first_name = 'John' AND age = 30; -- uses all three
-- These queries CANNOT efficiently use the index:
SELECT * FROM employees WHERE first_name = 'John'; -- skips first column
SELECT * FROM employees WHERE age = 30; -- skips first two columns
SELECT * FROM employees WHERE first_name = 'John' AND age = 30; -- skips first column
The rule of thumb for column ordering in composite indexes: place equality conditions first, then range conditions last. High-selectivity columns (more distinct values) should come before low-selectivity ones.
Covering Indexes
A covering index contains all the columns needed to satisfy a query, eliminating the need to access the table itself. This is called an "index-only scan" and is the fastest possible query execution path.
-- Covering index for a common query
CREATE INDEX idx_covering_orders ON orders(user_id, status, total);
-- This query is fully satisfied by the index (index-only scan)
SELECT status, total FROM orders WHERE user_id = 42;
-- PostgreSQL INCLUDE clause (non-key columns in the index)
CREATE INDEX idx_orders_user ON orders(user_id) INCLUDE (status, total, created_at);
-- The INCLUDE columns are stored in the index but not used for sorting/searching
-- They enable index-only scans without bloating the B-tree structure
Partial Indexes
Partial indexes only index rows that satisfy a condition. They are smaller and more efficient when you frequently query a subset of data.
-- Index only active users (90% of queries target active users)
CREATE INDEX idx_active_users ON users(email) WHERE status = 'active';
-- Index only recent orders
CREATE INDEX idx_recent_orders ON orders(created_at)
WHERE created_at > '2024-01-01';
-- Index only non-null values
CREATE INDEX idx_phone ON users(phone) WHERE phone IS NOT NULL;
Unique Indexes
Unique indexes enforce uniqueness constraints while also serving as performance indexes. When you create a PRIMARY KEY or UNIQUE constraint, the database automatically creates a unique index.
-- Unique index
CREATE UNIQUE INDEX idx_users_email ON users(email);
-- Unique composite index (combination must be unique)
CREATE UNIQUE INDEX idx_enrollment ON enrollments(student_id, course_id);
Index Trade-Offs
| Benefit | Cost |
|---|---|
| Faster SELECT queries | Slower INSERT, UPDATE, DELETE operations |
| Efficient sorting (ORDER BY) | Additional disk storage (10-30% of table size) |
| Quick lookups on indexed columns | Index maintenance overhead during writes |
| Constraint enforcement (UNIQUE) | Query planner complexity with many indexes |
When NOT to Use Indexes
- Small tables: If a table has fewer than a few thousand rows, a full table scan is often faster than an index lookup.
- Write-heavy tables with few reads: Log tables, event streams, and audit tables that are rarely queried should minimize indexes.
- Low-selectivity columns: A boolean column with only true/false values gives a B-tree index little benefit (bitmap indexes may help).
- Frequently updated columns: Columns that change constantly cause excessive index rebuilding.
Query Optimization with EXPLAIN
The EXPLAIN command is your primary tool for understanding how the database executes a query and whether it uses your indexes. Always use EXPLAIN ANALYZE to see actual execution times.
-- PostgreSQL: EXPLAIN ANALYZE with detailed output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.status = 'completed'
AND o.created_at > '2024-01-01'
GROUP BY u.name;
-- Key things to look for:
-- 1. Seq Scan = full table scan (usually bad for large tables)
-- 2. Index Scan = using an index (good)
-- 3. Index Only Scan = covering index (best)
-- 4. Bitmap Index Scan = combining multiple indexes
-- 5. Actual rows vs estimated rows (large differences = stale statistics)
Index Design Strategy
Follow this process when designing indexes for your application:
- Identify your most frequent and critical queries — focus on queries that run thousands of times per second or back user-facing features.
- Check the WHERE clause — columns in WHERE conditions are the primary candidates for indexing.
- Check JOIN conditions — foreign key columns used in JOINs should almost always be indexed.
- Check ORDER BY and GROUP BY — indexes can eliminate the need for sorting operations.
- Consider composite indexes — if multiple columns always appear together in queries, a composite index is more efficient than separate indexes.
- Use EXPLAIN — validate that the database actually uses your index. Sometimes the query planner chooses a different path.
For more on optimizing database performance at scale, see our guides on database sharding, partitioning, and read replicas.
Frequently Asked Questions
How many indexes should a table have?
There is no fixed number, but a good rule of thumb is 3-5 indexes per table for OLTP workloads. Each additional index slows down writes. For read-heavy analytics tables, you can have more. Monitor your write performance and only add indexes that your queries actually use.
Does the order of columns in a composite index matter?
Yes, it matters critically. The leftmost prefix rule means the index can only be used efficiently if the query filters on columns starting from the left. Place equality conditions first, then range conditions. For example, an index on (status, created_at) supports WHERE status = 'active' AND created_at > '2024-01-01' but not WHERE created_at > '2024-01-01' alone.
Should I index foreign key columns?
Almost always yes. Foreign keys are used in JOIN operations, and without an index, the database performs a full table scan on the child table for every join. PostgreSQL does not automatically create indexes on foreign keys (unlike MySQL), so you must create them explicitly.
How do I find unused indexes?
In PostgreSQL, query the pg_stat_user_indexes view to find indexes with zero or low idx_scan counts. These indexes consume storage and slow writes without providing read benefits. Remove them after confirming they are not needed for constraint enforcement.
-- Find unused indexes in PostgreSQL
SELECT schemaname, tablename, indexname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
What is index bloat and how do I fix it?
Index bloat occurs when dead tuples accumulate in an index after many updates and deletes. In PostgreSQL, VACUUM reclaims dead tuples, but heavy write workloads can cause bloat to outpace vacuuming. Use REINDEX to rebuild bloated indexes, or pg_repack for online rebuilds without locking.