Sequences are a robust PostgreSQL feature for generating unique identifiers for database rows. Unlike basic auto-increments, PostgreSQL sequences have advanced functionalities that allow extensive customization and sharing across tables.

In this comprehensive guide, we will dive deep into PostgreSQL sequence capabilities, usage patterns, internals, optimizations and industry best practices.

What Sets PostgreSQL Sequences Apart?

The key advantages that set PostgreSQL sequences apart from auto-increments are:

1. Flexible Generation Rules

Granular control over increments, upper and lower bounds, and cycling lets numbers be generated to match application logic, which goes beyond what a basic auto-increment column offers.

2. Shareable Across Tables

As sequences are independent objects, they can be defined once and leveraged across multiple tables requiring identifiers.

3. Advanced Caching & Prefetching

Sequences support optimized data access by generating ids in batches through configurable caches. This minimizes disk writes.

4. Independent State Storage

Each sequence stores its state in its own single-row relation, and changes to that state are written to the Write-Ahead Log. The counter therefore survives crashes without having to be recomputed from the maximum value in a base table.

5. Batch Allocation

PostgreSQL has no dedicated function for reserving a block of values, but a block can be obtained in a single statement by calling nextval over generate_series, or by setting INCREMENT BY to the block size.

These capabilities expand the horizons of sequential number generation and enable sequences to handle more complex database patterns.
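As an illustration of the sharing point above, here is a minimal sketch in which two tables draw their keys from one sequence. The names shared_id_seq, invoices and credit_notes are hypothetical, and the commented ids assume a fresh sequence used by a single session.

```sql
-- Hypothetical names; one sequence feeds the primary keys of two tables.
CREATE SEQUENCE shared_id_seq;

CREATE TABLE invoices (
    id     bigint PRIMARY KEY DEFAULT nextval('shared_id_seq'),
    amount numeric NOT NULL
);

CREATE TABLE credit_notes (
    id     bigint PRIMARY KEY DEFAULT nextval('shared_id_seq'),
    amount numeric NOT NULL
);

INSERT INTO invoices (amount) VALUES (100);     -- id = 1
INSERT INTO credit_notes (amount) VALUES (25);  -- id = 2
INSERT INTO invoices (amount) VALUES (40);      -- id = 3
```

Because both defaults call nextval on the same sequence, an id never appears in both tables, which can be handy when rows from the two tables are later merged into one report.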

SQL Syntax & Options

Sequences are created using the CREATE SEQUENCE syntax:

CREATE SEQUENCE sequence_name
    INCREMENT BY 1
    MINVALUE 1 
    MAXVALUE 9223372036854775807
    START WITH 1
    CACHE 1;

The key sequence configuration options are:

Option         Description                          Default
INCREMENT BY   Step between successive values       1
MINVALUE       Minimum value                        1 (ascending sequences)
MAXVALUE       Maximum value                        Maximum of the data type (ascending sequences)
START WITH     Initial sequence value               MINVALUE (ascending sequences)
CACHE          Values preallocated per session      1
CYCLE          Wrap around when a limit is reached  NO CYCLE

These parameters allow sequences to be tailored as per application patterns.
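The same options can be changed later with ALTER SEQUENCE and read back from the pg_sequences view (available in PostgreSQL 10 and later). The sketch below uses a hypothetical sequence named demo_seq.

```sql
-- demo_seq is a hypothetical name used only for illustration.
CREATE SEQUENCE demo_seq START WITH 100 INCREMENT BY 10;

-- Tighten the upper bound and enlarge the per-session cache.
ALTER SEQUENCE demo_seq MAXVALUE 1000 CACHE 20;

-- Move the counter; the next nextval('demo_seq') returns 500.
ALTER SEQUENCE demo_seq RESTART WITH 500;

-- Inspect the resulting configuration.
SELECT start_value, min_value, max_value, increment_by, cycle, cache_size
FROM pg_sequences
WHERE sequencename = 'demo_seq';
```

RESTART changes the current position only; start_value still reports the original START WITH setting.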

Incrementing Sequences

The increment controls the difference between subsequent numbers in a sequence.

For sequences used in primary keys, an increment higher than 1 results in skipped numbers:

CREATE SEQUENCE id_seq INCREMENT BY 5;

CREATE TABLE users (
   id INTEGER PRIMARY KEY DEFAULT nextval('id_seq')
);

INSERT INTO users DEFAULT VALUES; -- id = 1
INSERT INTO users DEFAULT VALUES; -- id = 6

This leaves gaps which may be undesirable. Hence an increment of 1 is commonly used.

However, larger increments are useful when reserving blocks of IDs beforehand for batched allocation.
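A minimal sketch of that block-reservation idea follows, assuming a hypothetical sequence block_seq and a block size of 100. Each nextval call returns the first id of a block, and the caller is then free to hand out the rest of the block on its own.

```sql
-- block_seq is hypothetical; the increment doubles as the block size.
CREATE SEQUENCE block_seq INCREMENT BY 100;

-- First reservation: returns 1, so the caller owns ids 1..100.
SELECT nextval('block_seq') AS block_start;

-- Second reservation: returns 101 and 200 as the block boundaries.
SELECT b.block_start,
       b.block_start + 99 AS block_end
FROM (SELECT nextval('block_seq') AS block_start) AS b;
```

The database only guarantees that block starts are unique; keeping track of which ids inside a block have been used is the application's job in this scheme.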

Upper & Lower Limits

The MINVALUE and MAXVALUE bounds define the valid range for the sequence:

CREATE SEQUENCE cyclic_seq
  INCREMENT BY 1
  MINVALUE 1
  MAXVALUE 5
  CYCLE; 

Hitting these limits will cause errors unless CYCLE is used to wrap the range.
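To see the non-cycling behavior, the following sketch uses a deliberately tiny range. The sequence name bounded_seq is an assumption made for the example.

```sql
-- bounded_seq is a hypothetical sequence with a three-value range.
CREATE SEQUENCE bounded_seq MINVALUE 1 MAXVALUE 3 NO CYCLE;

SELECT nextval('bounded_seq');  -- 1
SELECT nextval('bounded_seq');  -- 2
SELECT nextval('bounded_seq');  -- 3
SELECT nextval('bounded_seq');  -- fails: the maximum value has been reached

-- Raising the limit lets allocation continue from where it stopped.
ALTER SEQUENCE bounded_seq MAXVALUE 1000;
SELECT nextval('bounded_seq');  -- 4
```

The failed call does not advance the sequence, so no value is skipped once the limit is raised.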

Caching Sequence Numbers

Setting CACHE lets each session preallocate a block of numbers in memory for faster access:

CREATE SEQUENCE cache_seq CACHE 100;

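Each session can now hand out its next 100 numbers without touching the shared sequence state. However, with a cache setting higher than 1, values that a session has preallocated but not used are discarded when the session ends or the server crashes, which leaves holes. Sessions also draw from different blocks, so values are not necessarily handed out in strict numeric order across sessions.

Caches trade performance against gaps and ordering. Higher caches suit sequences whose values only need to be unique, not consecutive or strictly ordered.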
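The per-session nature of the cache is easiest to see with two connections open at once. The sketch below assumes a hypothetical sequence cached_seq and two separate psql sessions, labeled A and B in the comments.

```sql
-- Run once, in either session. cached_seq is a hypothetical name.
CREATE SEQUENCE cached_seq CACHE 100;

-- Session A
SELECT nextval('cached_seq');  -- 1; session A now holds 1..100 in memory

-- Session B (a separate connection)
SELECT nextval('cached_seq');  -- 101; session B now holds 101..200

-- Session A again
SELECT nextval('cached_seq');  -- 2; issued after 101, so ordering is not global
```

If session A disconnects at this point, values 3 through 100 are never issued.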

Functions for Sequence Values

PostgreSQL provides special functions to operate on sequences:

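SELECT nextval('seq'); -- Advance & return next number
SELECT currval('seq'); -- Value last returned by nextval('seq') in this session
SELECT setval('seq', 10); -- Set the current value; the next nextval returns 11
SELECT lastval(); -- Value last returned by nextval in this session, for any sequence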

These allow the current state and values from a sequence to be obtained.

The nextval function plays a crucial role in extracting the subsequent identifier from the sequence.

lastval() vs currval()

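currval('seq') returns the value most recently obtained by nextval for the named sequence in the current session. lastval() returns the value most recently obtained by nextval in the current session for any sequence. Both are session-local, so neither is affected by what other sessions do, and both raise an error if nextval has not yet been called in the session.

This matters when accessing sequences through pooled connections: if the pooler can hand consecutive statements to different server connections, call nextval and currval within the same transaction, or use INSERT ... RETURNING to get the generated value directly.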
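A short sketch of how the two functions behave inside one session, using two hypothetical sequences a_seq and b_seq:

```sql
-- a_seq and b_seq are hypothetical; run all statements in one session.
CREATE SEQUENCE a_seq;
CREATE SEQUENCE b_seq START WITH 100;

SELECT nextval('a_seq');  -- 1
SELECT nextval('b_seq');  -- 100

SELECT currval('a_seq');  -- 1: last value this session got from a_seq
SELECT currval('b_seq');  -- 100: last value this session got from b_seq
SELECT lastval();         -- 100: last nextval result in this session, any sequence
```

A second session that has not yet called nextval gets an error from both currval and lastval, regardless of what the first session has done.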

Typical Usage Patterns

Sequences lend themselves well to some classic use cases:

Auto-incrementing Keys

The foremost usage of sequences is generating auto-incrementing primary keys:

CREATE TABLE users (
  id INT PRIMARY KEY DEFAULT nextval('user_id'),
  name TEXT
);

This removes the need to manually handle cumbersome primary key allocation.

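However, values consumed by a transaction that later rolls back are never returned to the sequence, which leaves gaps. Treat the keys as unique rather than consecutive. Techniques like HiLo algorithms and sequence preallocation reduce round trips but do not remove gaps.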
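The effect of a rollback can be reproduced in a few statements. The names gap_seq and gap_demo below are hypothetical.

```sql
-- gap_seq and gap_demo are hypothetical names for this demonstration.
CREATE SEQUENCE gap_seq;

CREATE TABLE gap_demo (
    id   bigint PRIMARY KEY DEFAULT nextval('gap_seq'),
    note text
);

BEGIN;
INSERT INTO gap_demo (note) VALUES ('discarded');  -- consumes value 1
ROLLBACK;                                          -- row is gone, value 1 is not reused

INSERT INTO gap_demo (note) VALUES ('kept');

SELECT id, note FROM gap_demo;  -- one row: id = 2, note = 'kept'
```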

Universally Unique Identifiers (UUIDs)

Sequences can be combined with the UUID data type to derive name-based identifiers from sequence values:

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE SEQUENCE order_uid;

SELECT uuid_generate_v5(uuid_ns_dns(), nextval('order_uid')::text);

This feeds a UUID namespace plus a distinct sequence value into the generator, so each call yields a different version 5 UUID. Because version 5 UUIDs are hashes, they do not sort in sequence order; keep the sequence value in its own column if ordering matters.

Look-ahead Allocation

We can reserve blocks of future sequence numbers by calling nextval once per row of generate_series, either on its own or inside a multi-row insert (items is a placeholder table name):

SELECT nextval('seq') FROM generate_series(1, 10);

BEGIN;
INSERT INTO items (id) SELECT nextval('seq') FROM generate_series(1, 1000);
COMMIT;

This reduces round trips while allocating in batches. However, gaps can arise on rollbacks, and under concurrent use the values in one batch are not guaranteed to be contiguous.

PostgreSQL has no built-in function that hands unused values back to a sequence, so these gaps are permanent.
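When the application needs to know which ids a batch received, the reservation and the insert can be combined in one statement. This is a sketch with hypothetical names batch_seq and batch_items, not the only way to do it.

```sql
-- batch_seq and batch_items are hypothetical names.
CREATE SEQUENCE batch_seq;

CREATE TABLE batch_items (
    id      bigint PRIMARY KEY,
    payload text
);

-- Reserve three values, insert the rows, and report the ids in one round trip.
WITH reserved AS (
    SELECT nextval('batch_seq') AS id, n
    FROM generate_series(1, 3) AS g(n)
)
INSERT INTO batch_items (id, payload)
SELECT id, 'item ' || n
FROM reserved
RETURNING id;
```

The RETURNING clause sends the generated ids back with the insert result, so no follow-up currval or SELECT is needed.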

Circular Numbering

The CYCLE option allows recycling sequence values once limits are reached:

CREATE SEQUENCE cyclic_seq
  INCREMENT BY 1
  MINVALUE 1
  MAXVALUE 5 
  CYCLE;

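SELECT nextval('cyclic_seq'); -- Repeated calls return 1, 2, 3, 4, 5, 1, 2, 3...

This builds circular sequences suitable for cases where values may legitimately repeat, such as ticket or queue numbers that restart after a fixed range. A cycling sequence should not feed a primary key or unique column unless old rows are removed before their values come around again.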

Sequence Performance & Optimization

Let's analyze some key performance factors around sequences:

1. Metadata Storage Overhead

Each sequence is a relation of its own and adds rows to several system catalogs: pg_class, pg_sequence and others. A handful of sequences costs almost nothing, but creating them by the thousands inflates the catalogs.

Hence sequences should be designed around application patterns rather than created arbitrarily. Adjusting an existing sequence with ALTER SEQUENCE is preferable to proliferating new ones.

2. Transaction Overheads

Calls to nextval() are not transactional: the sequence advances immediately and is never rolled back. Each call takes a short-lived lock on the sequence, and changes to sequence state are written to the Write-Ahead Log so that values are not reissued after a crash. Under heavy concurrency this adds measurable coordination overhead.

These constraints mean that obtaining sequence IDs in batches is faster than issuing a separate nextval round trip for every row. Client-side allocation helps amortize these expenses.

3. Cache Settings

The cache size controls how often a session has to go back to the shared sequence state. Each session keeps its preallocated values in memory, and the sequence itself records the end of the most recently allocated block.

Higher caches reduce contention and I/O, but any cached values a session has not used are lost when it disconnects or the server crashes. A cache of 50 to 100 is a reasonable starting point if gaps are acceptable; measure under your own workload before settling on a value.

4. Gaps on Rollbacks

Like cache loss, rollbacks also create holes because the sequence has already advanced. Batching inserts into fewer transactions reduces how often this happens, but nothing returns the skipped values. If strictly gapless numbering is a requirement, a sequence is the wrong tool.

Overall when used judiciously, sequences impose minimal overheads and deliver optimized data access.
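To review these factors on a live database, the pg_sequences view (PostgreSQL 10 and later) lists every sequence with its settings. The query below is a sketch; the remaining column is only meaningful for ascending sequences, and order_id_seq in the ALTER statement is a hypothetical name.

```sql
-- List all visible sequences with their cache setting and rough headroom.
SELECT schemaname,
       sequencename,
       data_type,
       increment_by,
       cache_size,
       cycle,
       last_value,  -- NULL until the sequence has been used
       max_value - COALESCE(last_value, start_value) AS remaining
FROM pg_sequences
ORDER BY schemaname, sequencename;

-- Raise the cache on a busy sequence (hypothetical name).
ALTER SEQUENCE order_id_seq CACHE 50;
```

Sessions that already hold cached values keep using them, so a new cache size takes effect gradually as sessions request fresh blocks.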

Sequences vs Serial Columns

SERIAL columns are a convenience wrapper that use sequences implicitly:

CREATE TABLE users (
  id SERIAL, -- Implicit sequence + default  
  name TEXT
); 

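The sequence created for a SERIAL column gets default settings, since the SERIAL shorthand has no place for sequence options; changing them takes a separate ALTER SEQUENCE afterwards. The sequence is also owned by the column and is dropped along with it.

Hence explicit sequences are preferable when you need custom settings from the start or want to share one generator across tables. Serials work best for vanilla key columns.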
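The sequence behind a SERIAL column is an ordinary sequence, so it can be located and adjusted after the table exists. The sketch below uses hypothetical tables serial_demo and explicit_demo and assumes they live in the public schema.

```sql
-- serial_demo and explicit_demo are hypothetical tables.
CREATE TABLE serial_demo (
    id   SERIAL PRIMARY KEY,
    name TEXT
);

-- Find the implicitly created sequence.
SELECT pg_get_serial_sequence('serial_demo', 'id');  -- public.serial_demo_id_seq

-- Tune it like any other sequence.
ALTER SEQUENCE serial_demo_id_seq INCREMENT BY 10 CACHE 20;

-- Roughly what SERIAL expands to, written out explicitly.
CREATE SEQUENCE explicit_demo_id_seq;

CREATE TABLE explicit_demo (
    id   integer PRIMARY KEY DEFAULT nextval('explicit_demo_id_seq'),
    name text
);

-- Tie the sequence's lifetime to the column, as SERIAL does.
ALTER SEQUENCE explicit_demo_id_seq OWNED BY explicit_demo.id;
```

Writing the expansion out by hand is what makes it possible to pass options to CREATE SEQUENCE up front or to point a second table at the same sequence.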

Sequences in Other Databases

MySQL's AUTO_INCREMENT serves the same purpose as sequences but is tied to a single table column and offers fewer options; it has no equivalent of CYCLE, for example.

SQL Server provides IDENTITY columns, which closely match serial behavior with seed and increment options similar to PostgreSQL parameters. Since SQL Server 2012 it also supports standalone sequence objects through CREATE SEQUENCE.

Oracle sequences are also highly configurable, standalone objects that can be shared across tables, much like PostgreSQL's.

Thus PostgreSQL sequences strike the right balance of power, customization and ease of use in identifier generation.

Conclusion & Best Practices

We explored how PostgreSQL sequences enable robust generation of unique IDs with versatile controls compared to standard auto-increments.

Here are some key guidelines for optimal use of sequences:

  • Prefer explicit sequences over serials when you need custom settings or shared generators.
  • Set cache above 1 for performance, after confirming the application tolerates gaps and out-of-order values.
  • Define MAXVALUE boundaries to plan sequence cycles and prevent errors.
  • Allocate in batches, via caches or multi-row nextval queries, for efficiency.
  • Share sequences judiciously instead of proliferating them.
  • Use bigint key columns so the full 64-bit sequence range is usable.

By mastering these sequence capabilities, PostgreSQL developers can tackle complex numbering schemes in their applications with flair.

Sequences lend a flexible helping hand to the intricate world of managing identifiers across ever-growing databases. Their versatility makes them indispensable Swiss army knives for any PostgreSQL architect and a feature that sets PostgreSQL apart in the database landscape.
