Database Partitioning: Optimizing Performance with Data Segmentation
Database partitioning divides a large table into smaller, more manageable pieces within a single database server. Unlike sharding which distributes data across multiple servers, partitioning keeps all data on one server but organizes it for faster queries and easier maintenance. This guide covers horizontal and vertical partitioning, strategies, partition pruning, and real SQL examples.
Why Partition Your Database?
As tables grow to millions or billions of rows, performance degrades. Full table scans become slow, index maintenance becomes expensive, and operations like VACUUM or backups take increasingly longer. Partitioning addresses these problems by allowing the database to work with smaller chunks of data.
- Query performance: Partition pruning skips irrelevant partitions entirely
- Maintenance efficiency: VACUUM, REINDEX, and backups operate on individual partitions
- Data lifecycle management: Drop old partitions instead of deleting rows
- Parallel query execution: Multiple partitions can be scanned simultaneously
Horizontal vs Vertical Partitioning
Horizontal Partitioning
Horizontal partitioning splits a table by rows. Each partition contains a subset of rows with the same schema. For example, an orders table might be partitioned by month — January orders in one partition, February orders in another.
-- PostgreSQL: Range partitioning by date
CREATE TABLE orders (
id BIGSERIAL,
user_id INT NOT NULL,
total DECIMAL(10,2) NOT NULL,
status VARCHAR(20),
created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);
-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');
CREATE TABLE orders_2024_02 PARTITION OF orders
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');
CREATE TABLE orders_2024_03 PARTITION OF orders
FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');
-- Queries automatically route to the correct partition
SELECT * FROM orders WHERE created_at BETWEEN '2024-02-01' AND '2024-02-28';
-- Only scans orders_2024_02 (partition pruning)
Vertical Partitioning
Vertical partitioning splits a table by columns. Frequently accessed columns stay in one table, while rarely used or large columns (BLOBs, TEXT) go to another. This keeps the hot table smaller and more cache-friendly.
-- Vertical partitioning: Split a wide products table
-- Hot columns (frequently queried)
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
price DECIMAL(10,2) NOT NULL,
category_id INT NOT NULL,
stock_count INT NOT NULL
);
-- Cold columns (rarely accessed, large data)
CREATE TABLE product_details (
product_id INT PRIMARY KEY REFERENCES products(id),
description TEXT,
specifications JSONB,
manual_pdf BYTEA,
seo_metadata TEXT
);
-- Most queries only need the hot table
SELECT name, price FROM products WHERE category_id = 5;
-- Small table, fast scan, fits in memory
Partitioning Strategies
Range Partitioning
Range partitioning divides data based on a continuous range of values. It is ideal for time-series data, date-based queries, and sequential data.
| Partition | Range | Example Data |
|---|---|---|
| orders_q1_2024 | Jan 1 - Mar 31 | 2.3M rows |
| orders_q2_2024 | Apr 1 - Jun 30 | 2.8M rows |
| orders_q3_2024 | Jul 1 - Sep 30 | 3.1M rows |
| orders_q4_2024 | Oct 1 - Dec 31 | 2.6M rows |
Hash Partitioning
Hash partitioning applies a hash function to the partition key to distribute rows evenly across partitions. It is useful when there is no natural range and you want even distribution.
-- PostgreSQL: Hash partitioning
CREATE TABLE sessions (
id UUID NOT NULL,
user_id INT NOT NULL,
data JSONB,
created_at TIMESTAMP NOT NULL
) PARTITION BY HASH (user_id);
-- Create 4 hash partitions
CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);
-- Queries with user_id hit only one partition
SELECT * FROM sessions WHERE user_id = 42;
-- Scans only the partition where hash(42) % 4 matches
List Partitioning
List partitioning assigns specific values to each partition. Ideal for categorical data like region, status, or country.
-- PostgreSQL: List partitioning by region
CREATE TABLE customers (
id SERIAL,
name VARCHAR(200),
email VARCHAR(255),
region VARCHAR(20) NOT NULL
) PARTITION BY LIST (region);
CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('US', 'CA');
CREATE TABLE customers_eu PARTITION OF customers FOR VALUES IN ('UK', 'DE', 'FR');
CREATE TABLE customers_asia PARTITION OF customers FOR VALUES IN ('JP', 'IN', 'SG');
-- MySQL: List partitioning
CREATE TABLE orders (
id INT NOT NULL,
status ENUM('pending','processing','shipped','delivered','cancelled'),
total DECIMAL(10,2)
)
PARTITION BY LIST COLUMNS (status) (
PARTITION p_active VALUES IN ('pending', 'processing'),
PARTITION p_shipped VALUES IN ('shipped'),
PARTITION p_completed VALUES IN ('delivered'),
PARTITION p_cancelled VALUES IN ('cancelled')
);
Composite Partitioning
Composite (sub-partitioning) combines two strategies. For example, range partition by date, then hash sub-partition by user_id within each date range. This is useful for very large datasets that benefit from two levels of organization.
Partition Pruning
Partition pruning is the database optimizer's ability to skip partitions that cannot contain matching rows. This is the primary performance benefit of partitioning. When your query includes the partition key in the WHERE clause, the optimizer eliminates irrelevant partitions before scanning.
-- Verify partition pruning with EXPLAIN
EXPLAIN (ANALYZE)
SELECT * FROM orders
WHERE created_at BETWEEN '2024-03-01' AND '2024-03-31';
-- Output shows only orders_2024_03 is scanned:
-- Append
-- Subplans Removed: 11 -- 11 partitions pruned!
-- -> Seq Scan on orders_2024_03
-- Filter: (created_at >= '2024-03-01' AND created_at <= '2024-03-31')
-- Rows Removed by Filter: 0
-- Actual Rows: 245000
Partition pruning only works when the query includes the partition key. Queries without the partition key must scan all partitions, negating the benefit.
The Hot Partition Problem
A hot partition occurs when one partition receives disproportionately more traffic than others. Common causes include:
- Time-based partitioning: The current month's partition handles all new writes
- Skewed data: A popular region or category has far more data than others
- Celebrity problem: A few power users generate most of the traffic
Solutions:
- Use smaller time intervals (daily instead of monthly partitions)
- Combine range partitioning with hash sub-partitioning to spread writes within a time range
- Add a random suffix to the partition key to distribute hot keys
- Monitor partition sizes and rebalance when skew exceeds thresholds
Partitioning vs Sharding
| Feature | Partitioning | Sharding |
|---|---|---|
| Scope | Within a single database server | Across multiple database servers |
| Transparency | Handled by database engine | Requires application-level routing |
| Joins | Cross-partition joins supported | Cross-shard joins are expensive |
| Transactions | Normal ACID transactions | Distributed transactions needed |
| Scalability limit | Single server capacity | Nearly unlimited |
Start with partitioning. Only move to sharding when a single server cannot handle your workload. Partitioning gives you most of the query performance benefits without the operational complexity of a distributed system.
Best Practices for Partitioning
- Choose the partition key based on query patterns — the key should appear in the WHERE clause of your most frequent queries
- Keep partitions roughly equal in size — uneven partitions lead to hot spots
- Create partitions proactively — for time-based partitioning, create future partitions ahead of time (automate with pg_partman in PostgreSQL)
- Index each partition appropriately — partition-local indexes are smaller and more efficient than global indexes
- Monitor partition sizes — track growth and plan capacity accordingly
- Use partition-wise aggregation — PostgreSQL 11+ can aggregate within each partition in parallel
For deeper optimization, combine partitioning with proper indexing strategies and consider read replicas for scaling reads.
Frequently Asked Questions
How many partitions should I create?
There is no universal answer, but aim for partitions between 100MB and 10GB each. Too many small partitions increase planning overhead (the query optimizer must evaluate each one). Too few large partitions reduce the benefit of partition pruning. For time-based data, monthly or weekly partitions are common starting points.
Can I add partitions to an existing table?
In PostgreSQL 10+, you can add partitions with zero downtime using ALTER TABLE ... ATTACH PARTITION. However, converting an existing non-partitioned table to a partitioned one requires creating a new partitioned table and migrating data. pg_partman automates partition creation and maintenance.
Does partitioning replace indexing?
No. Partitioning and indexing are complementary. Partitioning reduces the amount of data the database needs to consider. Indexes speed up lookups within each partition. The ideal setup is partitioned tables with appropriate indexes on each partition.
Should I partition every large table?
Not necessarily. Partition tables that have clear access patterns aligned with a partition key. If your queries do not consistently filter on a specific column, partitioning may not help and could even hurt performance due to optimizer overhead. Benchmark before and after partitioning.
What happens when I query without the partition key?
The database must scan all partitions — a partition-wise sequential scan. This can be slower than scanning a single non-partitioned table due to overhead. Always design queries to include the partition key in the WHERE clause for optimal performance.