Database Partitioning: Optimizing Performance with Data Segmentation

Database partitioning divides a large table into smaller, more manageable pieces within a single database server. Unlike sharding which distributes data across multiple servers, partitioning keeps all data on one server but organizes it for faster queries and easier maintenance. This guide covers horizontal and vertical partitioning, strategies, partition pruning, and real SQL examples.

Why Partition Your Database?

As tables grow to millions or billions of rows, performance degrades. Full table scans become slow, index maintenance becomes expensive, and operations like VACUUM or backups take increasingly longer. Partitioning addresses these problems by allowing the database to work with smaller chunks of data.

Query performance: Partition pruning skips irrelevant partitions entirely
Maintenance efficiency: VACUUM, REINDEX, and backups operate on individual partitions
Data lifecycle management: Drop old partitions instead of deleting rows
Parallel query execution: Multiple partitions can be scanned simultaneously

Horizontal vs Vertical Partitioning

Horizontal Partitioning

Horizontal partitioning splits a table by rows. Each partition contains a subset of rows with the same schema. For example, an orders table might be partitioned by month — January orders in one partition, February orders in another.

-- PostgreSQL: Range partitioning by date
CREATE TABLE orders (
    id BIGSERIAL,
    user_id INT NOT NULL,
    total DECIMAL(10,2) NOT NULL,
    status VARCHAR(20),
    created_at TIMESTAMP NOT NULL
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE orders_2024_01 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE orders_2024_02 PARTITION OF orders
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE orders_2024_03 PARTITION OF orders
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Queries automatically route to the correct partition
SELECT * FROM orders WHERE created_at BETWEEN '2024-02-01' AND '2024-02-28';
-- Only scans orders_2024_02 (partition pruning)

Vertical Partitioning

Vertical partitioning splits a table by columns. Frequently accessed columns stay in one table, while rarely used or large columns (BLOBs, TEXT) go to another. This keeps the hot table smaller and more cache-friendly.

-- Vertical partitioning: Split a wide products table
-- Hot columns (frequently queried)
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(200) NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    category_id INT NOT NULL,
    stock_count INT NOT NULL
);

-- Cold columns (rarely accessed, large data)
CREATE TABLE product_details (
    product_id INT PRIMARY KEY REFERENCES products(id),
    description TEXT,
    specifications JSONB,
    manual_pdf BYTEA,
    seo_metadata TEXT
);

-- Most queries only need the hot table
SELECT name, price FROM products WHERE category_id = 5;
-- Small table, fast scan, fits in memory

Partitioning Strategies

Range Partitioning

Range partitioning divides data based on a continuous range of values. It is ideal for time-series data, date-based queries, and sequential data.

Partition	Range	Example Data
orders_q1_2024	Jan 1 - Mar 31	2.3M rows
orders_q2_2024	Apr 1 - Jun 30	2.8M rows
orders_q3_2024	Jul 1 - Sep 30	3.1M rows
orders_q4_2024	Oct 1 - Dec 31	2.6M rows

Hash Partitioning

Hash partitioning applies a hash function to the partition key to distribute rows evenly across partitions. It is useful when there is no natural range and you want even distribution.

-- PostgreSQL: Hash partitioning
CREATE TABLE sessions (
    id UUID NOT NULL,
    user_id INT NOT NULL,
    data JSONB,
    created_at TIMESTAMP NOT NULL
) PARTITION BY HASH (user_id);

-- Create 4 hash partitions
CREATE TABLE sessions_p0 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE sessions_p1 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE sessions_p2 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE sessions_p3 PARTITION OF sessions FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Queries with user_id hit only one partition
SELECT * FROM sessions WHERE user_id = 42;
-- Scans only the partition where hash(42) % 4 matches

List Partitioning

List partitioning assigns specific values to each partition. Ideal for categorical data like region, status, or country.

-- PostgreSQL: List partitioning by region
CREATE TABLE customers (
    id SERIAL,
    name VARCHAR(200),
    email VARCHAR(255),
    region VARCHAR(20) NOT NULL
) PARTITION BY LIST (region);

CREATE TABLE customers_us PARTITION OF customers FOR VALUES IN ('US', 'CA');
CREATE TABLE customers_eu PARTITION OF customers FOR VALUES IN ('UK', 'DE', 'FR');
CREATE TABLE customers_asia PARTITION OF customers FOR VALUES IN ('JP', 'IN', 'SG');

-- MySQL: List partitioning
CREATE TABLE orders (
    id INT NOT NULL,
    status ENUM('pending','processing','shipped','delivered','cancelled'),
    total DECIMAL(10,2)
)
PARTITION BY LIST COLUMNS (status) (
    PARTITION p_active VALUES IN ('pending', 'processing'),
    PARTITION p_shipped VALUES IN ('shipped'),
    PARTITION p_completed VALUES IN ('delivered'),
    PARTITION p_cancelled VALUES IN ('cancelled')
);

Composite Partitioning

Composite (sub-partitioning) combines two strategies. For example, range partition by date, then hash sub-partition by user_id within each date range. This is useful for very large datasets that benefit from two levels of organization.

Partition Pruning

Partition pruning is the database optimizer's ability to skip partitions that cannot contain matching rows. This is the primary performance benefit of partitioning. When your query includes the partition key in the WHERE clause, the optimizer eliminates irrelevant partitions before scanning.

-- Verify partition pruning with EXPLAIN
EXPLAIN (ANALYZE)
SELECT * FROM orders
WHERE created_at BETWEEN '2024-03-01' AND '2024-03-31';

-- Output shows only orders_2024_03 is scanned:
-- Append
--   Subplans Removed: 11    -- 11 partitions pruned!
--   ->  Seq Scan on orders_2024_03
--         Filter: (created_at >= '2024-03-01' AND created_at <= '2024-03-31')
--         Rows Removed by Filter: 0
--         Actual Rows: 245000

Partition pruning only works when the query includes the partition key. Queries without the partition key must scan all partitions, negating the benefit.

The Hot Partition Problem

A hot partition occurs when one partition receives disproportionately more traffic than others. Common causes include:

Time-based partitioning: The current month's partition handles all new writes
Skewed data: A popular region or category has far more data than others
Celebrity problem: A few power users generate most of the traffic

Solutions:

Use smaller time intervals (daily instead of monthly partitions)
Combine range partitioning with hash sub-partitioning to spread writes within a time range
Add a random suffix to the partition key to distribute hot keys
Monitor partition sizes and rebalance when skew exceeds thresholds

Partitioning vs Sharding

Feature	Partitioning	Sharding
Scope	Within a single database server	Across multiple database servers
Transparency	Handled by database engine	Requires application-level routing
Joins	Cross-partition joins supported	Cross-shard joins are expensive
Transactions	Normal ACID transactions	Distributed transactions needed
Scalability limit	Single server capacity	Nearly unlimited

Start with partitioning. Only move to sharding when a single server cannot handle your workload. Partitioning gives you most of the query performance benefits without the operational complexity of a distributed system.

Best Practices for Partitioning

Choose the partition key based on query patterns — the key should appear in the WHERE clause of your most frequent queries
Keep partitions roughly equal in size — uneven partitions lead to hot spots
Create partitions proactively — for time-based partitioning, create future partitions ahead of time (automate with pg_partman in PostgreSQL)
Index each partition appropriately — partition-local indexes are smaller and more efficient than global indexes
Monitor partition sizes — track growth and plan capacity accordingly
Use partition-wise aggregation — PostgreSQL 11+ can aggregate within each partition in parallel

For deeper optimization, combine partitioning with proper indexing strategies and consider read replicas for scaling reads.

Frequently Asked Questions

How many partitions should I create?

There is no universal answer, but aim for partitions between 100MB and 10GB each. Too many small partitions increase planning overhead (the query optimizer must evaluate each one). Too few large partitions reduce the benefit of partition pruning. For time-based data, monthly or weekly partitions are common starting points.

Can I add partitions to an existing table?

In PostgreSQL 10+, you can add partitions with zero downtime using ALTER TABLE ... ATTACH PARTITION. However, converting an existing non-partitioned table to a partitioned one requires creating a new partitioned table and migrating data. pg_partman automates partition creation and maintenance.

Does partitioning replace indexing?

No. Partitioning and indexing are complementary. Partitioning reduces the amount of data the database needs to consider. Indexes speed up lookups within each partition. The ideal setup is partitioned tables with appropriate indexes on each partition.

Should I partition every large table?

Not necessarily. Partition tables that have clear access patterns aligned with a partition key. If your queries do not consistently filter on a specific column, partitioning may not help and could even hurt performance due to optimizer overhead. Benchmark before and after partitioning.

What happens when I query without the partition key?

The database must scan all partitions — a partition-wise sequential scan. This can be slower than scanning a single non-partitioned table due to overhead. Always design queries to include the partition key in the WHERE clause for optimal performance.

Database Partitioning: Optimizing Performance with Data Segmentation