Time-Series Databases: Storing and Querying Temporal Data at Scale
Time-series databases are purpose-built for data that is indexed by time — sensor readings, application metrics, financial prices, and event logs. They optimize for high write throughput, efficient time-range queries, and automatic data lifecycle management. This guide covers the leading time-series databases, data modeling patterns, and real-world use cases.
What Makes Time-Series Data Special?
Time-series data has characteristics that differ significantly from typical relational data:
- Write-heavy: Data arrives continuously at high volume (millions of data points per second for IoT)
- Append-only: Data is almost never updated — new data points are always appended
- Time-ordered: Queries almost always filter by time range
- Recent data is hot: Recent data is queried frequently; old data is rarely accessed
- Downsampling is acceptable: Old data can be aggregated (e.g., minute-level to hourly averages)
- High cardinality tags: Data is often tagged with metadata (server name, region, sensor ID)
Traditional SQL databases can store time-series data, but they are not optimized for these patterns. A generic PostgreSQL table with timestamps works at small scale but struggles with millions of writes per second and time-range queries spanning billions of rows.
Time-Series Data Model
Time-series data typically consists of three components:
| Component | Description | Example |
|---|---|---|
| Timestamp | When the measurement was taken | 2024-03-15T10:30:00Z |
| Tags (dimensions) | Metadata for filtering/grouping (indexed) | host=web-01, region=us-east |
| Fields (values) | The actual measurements (not indexed) | cpu_usage=72.5, memory_used=8192 |
InfluxDB
InfluxDB is one of the most popular purpose-built time-series databases. It features a custom storage engine optimized for time-series workloads, a built-in HTTP API, and, in InfluxDB 2.x, the Flux query language for data transformation.
Data Model and Line Protocol
// InfluxDB Line Protocol: measurement,tags fields timestamp
cpu,host=web-01,region=us-east usage=72.5,idle=27.5 1710499800000000000
cpu,host=web-02,region=us-east usage=45.2,idle=54.8 1710499800000000000
memory,host=web-01,region=us-east used=8192,total=16384 1710499800000000000
// Write via HTTP API
// POST http://localhost:8086/api/v2/write?org=myorg&bucket=metrics
// Body: cpu,host=web-01 usage=72.5 1710499800000000000
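Before posting line protocol to the write endpoint, a client has to serialize measurements, tags, and fields into that format. A minimal sketch in Python (the `to_line` helper and its escaping rules are an illustration of the line-protocol spec, not a real client library):

```python
# Sketch: build an InfluxDB line-protocol string before POSTing it to
# /api/v2/write. Per the spec, commas, spaces, and equals signs inside
# measurement names and tag keys/values must be backslash-escaped.

def escape_tag(s: str) -> str:
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")

def to_line(measurement, tags, fields, timestamp_ns):
    # sorting tag keys is recommended for write performance
    tag_part = ",".join(f"{escape_tag(k)}={escape_tag(v)}"
                        for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{escape_tag(measurement)},{tag_part} {field_part} {timestamp_ns}"

line = to_line("cpu", {"host": "web-01", "region": "us-east"},
               {"usage": 72.5, "idle": 27.5}, 1710499800000000000)
print(line)
# cpu,host=web-01,region=us-east usage=72.5,idle=27.5 1710499800000000000
```

The resulting string would go in the request body of the POST shown above, with an `Authorization: Token ...` header.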
Flux Query Language
// Flux: Query CPU usage for the last hour, aggregated per 5 minutes
from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage")
|> filter(fn: (r) => r.region == "us-east")
|> aggregateWindow(every: 5m, fn: mean, createEmpty: false)
|> yield(name: "mean_cpu")
// Flux: Find hosts with CPU usage over 90% in the last 30 minutes
from(bucket: "metrics")
|> range(start: -30m)
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage")
|> max()
|> filter(fn: (r) => r._value > 90.0)
Retention Policies
// InfluxDB: Create buckets with retention policies
// Raw data: keep for 7 days
// Downsampled (hourly): keep for 90 days
// Downsampled (daily): keep forever
// Create bucket with 7-day retention
influx bucket create --name metrics-raw --retention 7d
// Create downsampling task
option task = {name: "downsample_hourly", every: 1h}
from(bucket: "metrics-raw")
|> range(start: -task.every)
|> filter(fn: (r) => r._measurement == "cpu")
|> aggregateWindow(every: 1h, fn: mean)
|> to(bucket: "metrics-hourly")
TimescaleDB
TimescaleDB is a time-series database built as a PostgreSQL extension. It gives you the power of time-series optimizations while keeping full PostgreSQL compatibility — standard SQL, joins, indexes, extensions, and the entire PostgreSQL ecosystem.
Hypertables
TimescaleDB's core abstraction is the hypertable — a virtual table that automatically partitions data into chunks based on time. You interact with it as a single table, but internally it manages partitions for optimal performance.
-- TimescaleDB: Create a hypertable
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
host TEXT NOT NULL,
region TEXT NOT NULL,
cpu_usage DOUBLE PRECISION,
memory_used BIGINT,
disk_io DOUBLE PRECISION
);
-- Convert to a hypertable (auto-partitions by time)
SELECT create_hypertable('metrics', 'time');
-- Optional: Add space partitioning by host for parallel queries
SELECT add_dimension('metrics', 'host', number_partitions => 4);
-- Standard SQL works perfectly
SELECT host,
time_bucket('5 minutes', time) AS bucket,
AVG(cpu_usage) AS avg_cpu,
MAX(cpu_usage) AS max_cpu
FROM metrics
WHERE time > NOW() - INTERVAL '1 hour'
AND region = 'us-east'
GROUP BY host, bucket
ORDER BY bucket DESC;
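The `time_bucket` function above does the same thing a hand-rolled aggregation would: truncate each timestamp to the start of its window, then group and average. A small Python sketch of that semantics (the sample rows are made up for illustration):

```python
# Sketch of what time_bucket('5 minutes', time) + AVG does: floor each
# timestamp to its 5-minute window boundary, then average per (host, bucket).
from collections import defaultdict

BUCKET = 300  # 5 minutes in seconds

def bucket_avg(rows):
    """rows: iterable of (epoch_seconds, host, cpu_usage)."""
    groups = defaultdict(list)
    for ts, host, cpu in rows:
        bucket = ts - ts % BUCKET          # floor to bucket boundary
        groups[(host, bucket)].append(cpu)
    return {k: sum(v) / len(v) for k, v in groups.items()}

rows = [(1710499800, "web-01", 70.0),   # same 5-minute window...
        (1710499860, "web-01", 75.0),
        (1710500110, "web-01", 50.0)]   # ...next window
print(bucket_avg(rows))
# {('web-01', 1710499800): 72.5, ('web-01', 1710500100): 50.0}
```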
-- Continuous Aggregates (materialized views that auto-update)
CREATE MATERIALIZED VIEW hourly_metrics
WITH (timescaledb.continuous) AS
SELECT host,
time_bucket('1 hour', time) AS bucket,
AVG(cpu_usage) AS avg_cpu,
MAX(cpu_usage) AS max_cpu,
AVG(memory_used) AS avg_memory
FROM metrics
GROUP BY host, bucket;
-- Add a refresh policy (automatically update every hour)
SELECT add_continuous_aggregate_policy('hourly_metrics',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
-- Compression policy (compress chunks older than 7 days)
ALTER TABLE metrics SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'host',
timescaledb.compress_orderby = 'time DESC'
);
SELECT add_compression_policy('metrics', INTERVAL '7 days');
-- Retention policy (drop data older than 90 days)
SELECT add_retention_policy('metrics', INTERVAL '90 days');
Prometheus
Prometheus is an open-source monitoring and alerting system with a built-in time-series database. Unlike InfluxDB and TimescaleDB (which are push-based), Prometheus uses a pull model — it scrapes metrics from targets at regular intervals.
PromQL
# PromQL: Query CPU usage rate over 5 minutes
rate(node_cpu_seconds_total{mode="user"}[5m])
# 95th percentile HTTP request duration for the api job
histogram_quantile(0.95,
rate(http_request_duration_seconds_bucket{job="api"}[5m])
)
# Alert: High error rate
sum(rate(http_requests_total{status=~"5.."}[5m]))
/
sum(rate(http_requests_total[5m]))
> 0.05 # More than 5% errors
# Memory usage percentage
(node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)
/ node_memory_MemTotal_bytes * 100
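The `rate()` function used throughout these queries computes the per-second increase of a counter over a window, treating a drop in value as a counter reset. A simplified Python sketch of that semantics (real PromQL `rate()` additionally extrapolates to the window boundaries; the sample data is hypothetical):

```python
# Simplified rate() semantics: per-second increase of a monotonic counter,
# where a sample lower than its predecessor means the counter reset to zero.
def counter_rate(samples):
    """samples: time-ordered list of (timestamp_seconds, counter_value)."""
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # after a reset, the whole current value counts as new increase
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

samples = [(0, 100), (60, 160), (120, 220), (180, 20)]  # reset after t=120
print(counter_rate(samples))  # (60 + 60 + 20) / 180 seconds
```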
Prometheus Architecture
Prometheus stores data locally on disk in a custom format optimized for time-series. For long-term storage, it integrates with remote storage backends like Thanos or Cortex that provide horizontal scalability and long-term retention. Prometheus itself is designed for reliability over durability — it is a monitoring system first, not a general-purpose database.
Comparison of Time-Series Databases
| Feature | InfluxDB | TimescaleDB | Prometheus |
|---|---|---|---|
| Query Language | Flux | Standard SQL | PromQL |
| Foundation | Custom engine | PostgreSQL extension | Custom engine |
| Data Collection | Push (HTTP API) | Push (SQL INSERT) | Pull (scraping) |
| JOINs | Limited | Full SQL JOINs | Limited |
| Best For | IoT, custom metrics | SQL teams, complex analysis | Infrastructure monitoring |
Downsampling and Retention Strategies
Raw time-series data grows rapidly. A system monitoring 1,000 hosts with 100 metrics each at 10-second intervals generates roughly 864 million data points per day. Downsampling and retention policies manage this growth.
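A quick back-of-envelope script makes the growth concrete (the bytes-per-point figure is an assumption for illustration; actual storage depends heavily on compression):

```python
# Ingest arithmetic: hosts x metrics / scrape interval -> points per day.
hosts, metrics_per_host, interval_s = 1_000, 100, 10

points_per_second = hosts * metrics_per_host / interval_s
points_per_day = points_per_second * 86_400

print(f"{points_per_second:,.0f} points/s, {points_per_day:,.0f} points/day")
# 10,000 points/s, 864,000,000 points/day

# assuming ~16 bytes per uncompressed point (timestamp + float value):
gb_per_day = points_per_day * 16 / 1e9
print(f"~{gb_per_day:.1f} GB/day before compression")
```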
A typical retention strategy involves three tiers: raw data retained for 7-30 days, minute-level aggregates for 90 days, and hourly aggregates for 1-2 years. This provides detailed data for recent troubleshooting and long-term trends for capacity planning.
Compression Techniques
Time-series databases achieve 10-20x compression through techniques like delta encoding (storing differences between consecutive values), run-length encoding (compressing repeated values), dictionary encoding (for tags), and Gorilla compression (Facebook's algorithm for floating-point time-series).
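Delta encoding, the first technique above, is easy to see in miniature: timestamps arriving at a regular interval compress to a stream of small, mostly identical deltas, which run-length encoding then collapses further. A toy sketch:

```python
# Toy delta encoding for timestamps: store the first absolute value, then
# only the differences. Regular intervals yield repeated small deltas that
# a run-length encoder can then collapse.
def delta_encode(values):
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

timestamps = [1710499800, 1710499810, 1710499820, 1710499830]
encoded = delta_encode(timestamps)
print(encoded)  # [1710499800, 10, 10, 10]
assert delta_decode(encoded) == timestamps  # lossless round trip
```

Gorilla compression applies the same idea to floating-point values, XOR-ing each value with its predecessor so that slowly changing series produce mostly zero bits.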
Use Cases
IoT Sensor Data
Thousands of sensors reporting temperature, humidity, pressure, and location every few seconds. TimescaleDB or InfluxDB handles the ingestion volume while allowing complex queries across sensor dimensions.
Application Performance Monitoring
Prometheus scrapes request latency, error rates, and throughput from application instances. Combined with Grafana dashboards, it provides real-time visibility into system health.
Financial Data
Stock prices, trading volumes, and order book data at tick-level granularity. TimescaleDB's SQL compatibility makes it ideal for financial analysts who already know SQL.
For related topics, see our guides on distributed databases, data modeling patterns, and monitoring systems.
Frequently Asked Questions
Can I use PostgreSQL for time-series data instead of a specialized database?
Yes, for small to medium scale. PostgreSQL with table partitioning and proper indexing handles time-series well up to a few billion rows. Beyond that, TimescaleDB (a PostgreSQL extension) adds hypertables, continuous aggregates, and compression that dramatically improve performance while keeping SQL compatibility.
What is the difference between push and pull metrics collection?
Push-based systems (InfluxDB) receive data sent by applications. Pull-based systems (Prometheus) actively fetch data from targets. Pull is better for monitoring (the monitor controls the schedule and detects down targets). Push is better for event-driven data (IoT sensors, application logs) where the source controls when data is generated.
How do I handle high cardinality in time-series databases?
High cardinality (many unique tag combinations) is the biggest performance challenge. Avoid using unbounded values (user IDs, IP addresses) as tags. Instead, use them as fields (not indexed). InfluxDB's IOx engine and TimescaleDB handle high cardinality better than older InfluxDB versions or Prometheus.
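The cardinality blow-up is multiplicative, which is why one unbounded tag dominates everything else. A rough worst-case estimate (the tag counts below are hypothetical):

```python
# Rough upper bound on series cardinality: the product of distinct values
# per tag. One unbounded tag (user ID, IP) multiplies the whole index.
from math import prod

def worst_case_series(tag_value_counts):
    return prod(tag_value_counts.values())

safe = {"host": 1_000, "region": 5, "service": 50}
risky = dict(safe, user_id=1_000_000)  # hypothetical unbounded tag

print(worst_case_series(safe))   # 250,000 possible series
print(worst_case_series(risky))  # 250,000,000,000 possible series
```

In practice only observed tag combinations create series, but an unbounded tag guarantees the observed count keeps growing without limit.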
Should I use InfluxDB or TimescaleDB?
Choose TimescaleDB if your team knows SQL, you need JOINs with relational data, or you want PostgreSQL ecosystem tools. Choose InfluxDB if you want a purpose-built time-series experience, use Flux for data transformation pipelines, or need the InfluxDB/Telegraf/Grafana stack. Both are excellent choices.