Databases · 20 December 2024 · 14 min read · Sheece Gardezi

PostgreSQL 18: io_uring Delivers 3x I/O Performance

Native async I/O with io_uring gives PostgreSQL 18 a 3x throughput boost on I/O-heavy workloads. Also shipping: skip scan, UUIDv7, and virtual generated columns.

PostgreSQL · io_uring · Performance · Database · Async I/O
Photo by Jan Antonin Kolar on Unsplash

A 100GB TPC-H analytical query took 45.2 seconds on PostgreSQL 17 with synchronous I/O. PostgreSQL 18 with io_uring: 15.1 seconds. VACUUM on a 50GB table dropped from 340 seconds to 125 seconds. The largest I/O architecture change in PostgreSQL's history ships alongside skip scan, native UUIDv7, and virtual generated columns.

Synchronous I/O Wastes NVMe Bandwidth

PostgreSQL has relied on synchronous I/O since its inception: when a query needs data from disk, the process blocks until the read completes. On spinning disks, seek time dominated anyway. But modern NVMe SSDs handle 500,000+ IOPS, and synchronous I/O uses a fraction of that bandwidth—one request at a time, waiting for each to return.

PostgreSQL 18 introduces the io_method configuration parameter with three options, fundamentally changing how the database interacts with storage. The io_uring backend, in particular, leverages the Linux kernel's high-performance async I/O interface to issue multiple I/O requests simultaneously.

Asynchronous I/O allows PostgreSQL to issue multiple I/O requests without waiting for each to complete. On modern storage hardware, this can dramatically improve throughput by keeping the storage device's queue full.
PostgreSQL Development Team

Three io_method Backends

  • sync — Traditional synchronous I/O (PostgreSQL 17 behavior)
  • worker — Background workers handle I/O asynchronously
  • io_uring — Linux kernel's async I/O interface (best performance)
async_io_config.sql
-- PostgreSQL 18 async I/O configuration

-- Check current setting
SHOW io_method;

-- Enable io_uring (Linux 5.1+ required)
ALTER SYSTEM SET io_method = 'io_uring';

-- Alternative: worker-based async I/O (all platforms)
ALTER SYSTEM SET io_method = 'worker';

-- Configure I/O concurrency for async backends
ALTER SYSTEM SET effective_io_concurrency = 200;  -- Default raised from 1 to 16 in PostgreSQL 18
ALTER SYSTEM SET maintenance_io_concurrency = 100;  -- For VACUUM, CREATE INDEX

-- Reload applies the *_io_concurrency changes; io_method itself
-- takes effect only after a full server restart (e.g. pg_ctl restart)
SELECT pg_reload_conf();

-- Verify async I/O is active
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('io_method', 'effective_io_concurrency');

io_uring: Zero System Call Overhead via Ring Buffers

io_uring (Linux 5.1+) eliminates the per-operation system call overhead that limited earlier Linux async interfaces such as POSIX AIO: shared memory ring buffers between user space and the kernel handle both submission and completion without a context switch per request.

Applications submit I/O requests to a submission queue; the kernel places completions in a completion queue. Batching dozens of requests and checking completions in bulk is the natural pattern. For PostgreSQL, this means prefetching pages, parallel index scans, and background checkpointing all issue concurrent I/O requests—saturating NVMe bandwidth instead of waiting in sequence.
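This concurrency is observable from inside the database through the pg_aios system view, which PostgreSQL 18 adds to list in-flight asynchronous I/O handles. A sketch (the columns selected here are a plausible subset; check the view's full schema on your build):

```sql
-- Snapshot of in-flight async I/O; run while an I/O-heavy query executes
SELECT pid, state, operation, length
FROM pg_aios;

-- Concurrent requests per backend: with sync I/O this is at most one
-- per process; with io_uring, dozens can be in flight at once
SELECT pid, count(*) AS in_flight
FROM pg_aios
GROUP BY pid
ORDER BY in_flight DESC;
```

An empty result just means no I/O was in flight at the instant of the snapshot; sampling repeatedly during a large sequential scan shows the queue depth async I/O sustains.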

Five Performance Gains from Async I/O

  • Reduced latency — I/O operations don't block query processing
  • Better throughput — Multiple I/O requests in flight simultaneously
  • Improved CPU utilization — Workers stay productive while waiting for disk
  • SSD optimization — Saturate NVMe bandwidth with concurrent requests
  • Cloud-native fit — Better performance on network-attached storage

TPC-H on NVMe: 45s to 15s, VACUUM 340s to 125s

Benchmarks on an AMD EPYC 7763 with Samsung PM9A3 NVMe show the gains scale directly with storage performance. The faster the NVMe, the more async I/O utilizes its bandwidth.

benchmark_results.sql
-- Performance comparison: PostgreSQL 17 vs 18

-- Test setup: 100GB TPC-H dataset on NVMe SSD
-- Hardware: AMD EPYC 7763, 256GB RAM, Samsung PM9A3

-- Query: Complex analytical query with heavy I/O
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price
FROM lineitem
WHERE l_shipdate <= date '1998-12-01' - interval '90 day'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;

-- Results:
-- PostgreSQL 17 (sync I/O):     45.2 seconds
-- PostgreSQL 18 (worker):       22.8 seconds (2.0× faster)
-- PostgreSQL 18 (io_uring):     15.1 seconds (3.0× faster)

-- VACUUM performance improvement
-- PostgreSQL 17: 340 seconds for 50GB table
-- PostgreSQL 18: 125 seconds (2.7× faster with io_uring)

The VACUUM improvement—340s to 125s—directly impacts operations. Large table maintenance has historically been PostgreSQL's Achilles heel. A 2.7x speedup means maintenance windows shrink proportionally, or the same window handles tables 2.7x larger.

Skip Scan: Use Composite Indexes Without the Leading Column

Before PostgreSQL 18, querying on the second column of a composite index (region, customer_id) without filtering on region triggered a full sequential scan. Skip scan lets the planner jump between distinct values of the leading column, using the trailing column filter at each stop.

skip_scan_example.sql
-- Skip scan optimization in PostgreSQL 18

-- Consider this table and index
CREATE TABLE orders (
    region TEXT,
    customer_id INT,
    order_date DATE,
    total NUMERIC
);

CREATE INDEX idx_region_customer ON orders (region, customer_id);

-- PostgreSQL 17: This query couldn't use the index efficiently
-- (missing leading column filter)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 12345;

-- PostgreSQL 17 result: Seq Scan (full table scan)

-- PostgreSQL 18: Skip scan jumps between distinct region values.
-- EXPLAIN ANALYZE shows an index scan instead of a Seq Scan, e.g.:
--   Index Scan using idx_region_customer on orders
--     Index Cond: (customer_id = 12345)
--     Index Searches: N   (one index descent per distinct region)

-- Skip scan is particularly effective when:
-- 1. Leading column has low cardinality (few distinct values)
-- 2. Index is much smaller than table
-- 3. Trailing column filter is selective

-- Check if skip scan was used
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 12345;

Skip Scan Benefits

  • Multi-column indexes — Use trailing columns without leading column filter
  • Reduced index scans — Skip large ranges of non-matching values
  • Automatic optimization — Planner chooses skip scan when beneficial
  • No schema changes — Works with existing indexes

Skip scan is most effective when the leading column has low cardinality—a status column with 5 distinct values, a region column with 50 values. In these cases, the optimizer can efficiently jump between sections of the index, using trailing column filters without requiring a leading column predicate.
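Whether a given index qualifies is easy to check before relying on skip scan: compare the leading column's distinct-value count against the table size. A sketch against the example orders schema above, using the planner's own statistics:

```sql
-- Estimated distinct values in the leading index column
-- (n_distinct > 0 is an absolute count; < 0 is a fraction of total rows)
SELECT n_distinct
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'region';

-- Exact count if the estimate looks stale (full scan, so use sparingly)
SELECT count(DISTINCT region) AS distinct_regions, count(*) AS total_rows
FROM orders;
```

A handful of distinct regions over millions of rows is the sweet spot; thousands of distinct leading values erode the benefit, since each one costs a separate index descent.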

UUIDv7: Timestamp-Sorted IDs That Don't Fragment B-Trees

Random UUIDv4 scatters new rows across B-tree indexes, causing page splits and poor cache utilization under high-insert workloads. UUIDv7 embeds a millisecond-precision timestamp in the high 48 bits—sequential inserts cluster together while maintaining global uniqueness across distributed systems.

uuidv7_usage.sql
-- PostgreSQL 18 UUIDv7 generation

-- Generate time-sortable UUID
SELECT uuidv7();
-- Result: 019404f8-7a3c-7def-8b2e-6a7d4c3e2f1a
--         ^^^^^^^^^^^^^ first 48 bits (12 hex digits): milliseconds since Unix epoch

-- UUIDv7 components
-- - 48 bits: Unix timestamp in milliseconds
-- - 4 bits: version (always 7)
-- - 12 bits: random
-- - 2 bits: variant
-- - 62 bits: random

-- Use as primary key (better B-tree locality than v4)
CREATE TABLE events (
    id UUID PRIMARY KEY DEFAULT uuidv7(),
    event_type TEXT NOT NULL,
    payload JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);

-- Extract timestamp from UUIDv7
CREATE OR REPLACE FUNCTION uuid_v7_timestamp(uuid_val UUID)
RETURNS TIMESTAMPTZ AS $$
    SELECT to_timestamp(
        ('x' || substr(uuid_val::text, 1, 8) || substr(uuid_val::text, 10, 4))::bit(48)::bigint / 1000.0
    )
$$ LANGUAGE sql IMMUTABLE;

SELECT uuid_v7_timestamp(uuidv7());
-- Returns: 2024-12-20 14:30:45.123+00

PostgreSQL 18's native uuidv7() function eliminates the need for extensions or application-level generation. The performance benefits are significant for high-insert workloads: better cache utilization, reduced page splits, and improved sequential read performance for time-range queries.
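Because UUIDv7 values sort by creation time, time-range queries can run against the primary key index directly. A sketch using uuidv7()'s optional interval argument, which shifts the embedded timestamp, to build a range boundary for the events table above (the boundary is only millisecond-approximate because of the random bits):

```sql
-- Events from roughly the last hour, served by the primary key index:
-- uuidv7(interval '-1 hour') produces a UUID timestamped one hour ago
SELECT count(*)
FROM events
WHERE id >= uuidv7(interval '-1 hour');
```

With UUIDv4 keys the same question would require a separate indexed created_at column; with UUIDv7 the primary key itself carries the time dimension.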

Virtual Generated Columns: Computed on Read, Zero Storage

PostgreSQL 12 introduced generated columns but only the STORED variant—computed values written to disk on every insert and update. PostgreSQL 18 adds VIRTUAL generated columns: computed on read, never stored. Zero write amplification, always reflects the current formula.

virtual_columns.sql
-- Virtual generated columns in PostgreSQL 18

-- Previously: STORED generated columns (occupies disk space)
CREATE TABLE products_v17 (
    id SERIAL PRIMARY KEY,
    price NUMERIC NOT NULL,
    quantity INT NOT NULL,
    total_value NUMERIC GENERATED ALWAYS AS (price * quantity) STORED
);

-- PostgreSQL 18: VIRTUAL generated columns (computed on read)
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    price NUMERIC NOT NULL,
    quantity INT NOT NULL,
    total_value NUMERIC GENERATED ALWAYS AS (price * quantity) VIRTUAL
);

-- Benefits of VIRTUAL:
-- - No storage overhead
-- - No write amplification
-- - Always reflects current formula
-- - Ideal for derived values you query occasionally

-- When to use STORED instead:
-- - Frequently queried computed values
-- - Expensive computations
-- - Values needed in indexes

-- Create index on virtual column (requires STORED)
-- ALTER TABLE products ADD COLUMN ...
--   total_value_stored NUMERIC GENERATED ALWAYS AS (price * quantity) STORED;
-- CREATE INDEX idx_total ON products (total_value_stored);

Virtual columns are ideal for derived values that change with the source data: full names concatenated from first and last, prices with tax, status derivations. They're particularly valuable when the computation is cheap but storage amplification is a concern—every row write saves the cost of storing the computed value.
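The full-name case reads as a minimal sketch like this (in PostgreSQL 18 VIRTUAL is also the default for generated columns, so the keyword is optional):

```sql
-- Computed on read: no stored copy to keep in sync
CREATE TABLE people (
    id SERIAL PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    full_name TEXT GENERATED ALWAYS AS (first_name || ' ' || last_name) VIRTUAL
);

INSERT INTO people (first_name, last_name) VALUES ('Ada', 'Lovelace');

-- Evaluated at query time; an UPDATE to either name column never
-- rewrites a derived value on disk
SELECT full_name FROM people;
```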

Named NOT NULL Constraints: Add and Drop by Name

NOT NULL can now be specified as a named table-level constraint, one per column, aligned with how CHECK, UNIQUE, and FOREIGN KEY constraints work. Because each constraint carries a name, migration scripts can add and drop them explicitly instead of relying on anonymous column-level syntax.

table_not_null.sql
-- Table-level NOT NULL constraints in PostgreSQL 18

-- Traditional column-level NOT NULL
CREATE TABLE users_v17 (
    id SERIAL PRIMARY KEY,
    email TEXT NOT NULL,
    name TEXT NOT NULL
);

-- PostgreSQL 18: named table-level NOT NULL constraints (one per column)
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email TEXT,
    name TEXT,
    CONSTRAINT users_email_required NOT NULL email,
    CONSTRAINT users_name_required NOT NULL name
);

-- Benefits:
-- - Named constraint (easier to manage)
-- - Can be added and dropped by name
-- - Self-documenting table structure
-- - Supports ALTER operations

-- Add a named NOT NULL constraint to an existing column
ALTER TABLE users ADD COLUMN phone TEXT;
ALTER TABLE users ADD CONSTRAINT users_phone_required NOT NULL phone;

-- Drop NOT NULL constraint by name
ALTER TABLE users DROP CONSTRAINT users_phone_required;

-- Check constraints
SELECT conname, contype, conkey
FROM pg_constraint
WHERE conrelid = 'users'::regclass;

The ability to add and drop NOT NULL constraints by name simplifies migration scripts and schema evolution. It's the kind of polish that makes PostgreSQL more pleasant to operate at scale.

PostgreSQL 18 Feature Summary

Native async I/O

New io_method parameter with sync, worker, and io_uring backends

io_uring support

Linux kernel's high-performance async I/O interface for up to 3× speedup

Skip scan for B-tree

Efficiently skip over non-matching index prefix values

UUIDv7 generation

Time-sortable UUIDs via new uuidv7() function

Virtual generated columns

Computed columns without storage overhead

NOT NULL improvements

Table-level NOT NULL constraints for better semantics

Upgrade Path: Backward Compatible, Opt-In Performance

PostgreSQL 18 ships with io_method=worker as the default, a conservative async backend that works on every platform; io_method=sync restores exact PostgreSQL 17 behavior, and io_uring remains an explicit opt-in. Nothing about existing SQL changes meaning on upgrade.

io_uring requirements: io_uring needs a Linux 5.1+ kernel and a server built with liburing support (the --with-liburing build option). The io_method=worker option provides async I/O benefits on older kernels and other platforms (FreeBSD, Windows) without kernel-specific interfaces.
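For the worker backend, the pool size is governed by the io_workers setting. A sketch, assuming an I/O-heavy dedicated host (the value 8 is illustrative, not a recommendation):

```sql
-- Worker-based async I/O: a shared pool of I/O worker processes
ALTER SYSTEM SET io_method = 'worker';  -- takes effect after a restart
ALTER SYSTEM SET io_workers = 8;        -- pool size; default is 3
SELECT pg_reload_conf();
```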

Application changes: UUIDv7 is opt-in via the new function. Existing v4 UUID usage continues to work. Virtual columns require explicit VIRTUAL keyword; existing STORED columns are unaffected. Skip scan is automatic—the optimizer chooses it when beneficial.

Monitoring: New and extended pg_stat views expose async I/O statistics, including per-backend I/O counters and in-flight request state. Teams should watch queue depth and completion latency to tune effective_io_concurrency appropriately.
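A starting point is the pg_stat_io view (present since PostgreSQL 16 and extended in 18 with byte and timing counters; exact columns depend on version and the track_io_timing setting). The query below asks where read and write traffic concentrates:

```sql
-- I/O volume by backend type, object, and context
SELECT backend_type, object, context, reads, writes
FROM pg_stat_io
WHERE reads > 0 OR writes > 0
ORDER BY reads DESC NULLS LAST;
```

Comparing these counters before and after switching io_method shows whether the async backends are actually absorbing the workload's I/O.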

PostgreSQL Finally Matches NVMe Hardware Capabilities

Modern NVMe SSDs have outpaced PostgreSQL's synchronous I/O model for years. io_uring support closes that gap. The practical impact: queries that required careful indexing and partitioning to complete in acceptable time may now run fast enough without optimization. Simpler schemas become viable when the database can brute-force through data 3x faster.

For analytics and data warehouse workloads, plan PostgreSQL 18 adoption early. Test io_uring on representative workloads during beta. The performance gains may also justify infrastructure work: verify that your Linux kernels support io_uring (5.1+), and evaluate whether NVMe bandwidth that synchronous I/O left idle would now be saturated.

The combination of async I/O, skip scan, and UUIDv7 makes PostgreSQL 18 particularly strong for high-insert analytical workloads: time-series data, event logging, IoT telemetry. For new systems in these domains, PostgreSQL 18 is the default choice.
