The Lister Learning and Teaching Centre at the University of Edinburgh. Photo by Paul Zanre - COPYRIGHT: PAUL ZANRE PHOTOGRAPHY.
I'm thrilled to announce that the PostgreSQL Edinburgh meetup is back! 🐘
After a fantastic launch, we are gathering again on Thursday, February 12th. We'll be meeting at the University of Edinburgh's Lister Learning and Teaching Centre for another evening to talk tech, eat pizza, and get to know each other.
Whether you're a seasoned DBA, an app developer just getting started with databases, or simply curious about high availability and search integrations, this meetup is for you. All levels are welcome!
Here's the schedule for our second outing, featuring two insightful technical talks.
Follow Your Leader
Alastair Turner from Percona joins us to dive deep into high availability. He will be sharing the critical mechanics of how to ensure availability after a failover occurs—a vital topic for anyone running Postgres in production.
Postgres to Elasticsearch Syncing
Sean Hammond will be sharing a "war story" about a legendary outage. He'll walk us through the complexities and challenges encountered when syncing data between Postgres and Elasticsearch, offering lessons learned.
This event is completely free, but registration is required so we can get the numbers right for the food & refreshments!
Register here 👇
The meetup is bound by the PostgreSQL Code of Conduct. We are also looking for volunteer help (greeting attendees, logistics, etc.) for this and future meetups. If you can help, please drop me an email at vyruss000 (at) gmail.com.
Follow our community u[...]
Google Summer of Code is back for 2026! We’re celebrating the 22nd year of this incredible program that has brought countless talented developers into the open-source world. Please take a moment to review Google’s announcement and familiarize yourself with what makes this year special.
Now is the time to work on getting together a strong set of projects we’d like to have GSoC contributors work on over the summer. Like previous years, we must have an excellent Ideas list ready before the deadline for mentoring organizations. This list is the primary criterion Google uses to evaluate and accept us into the program.
The deadline for mentoring organizations to apply is February 3, 2026, at 18:00 UTC. The list of accepted organizations will be published on February 19. Here’s the complete timeline for your planning:
GSoC 2026 Timeline:
I’ve already created our GSoC 2026 Ideas page as our central hub for project submissions. The quality of this Ideas list will define the tone of our participation this year, so let’s make it outstanding!
What Makes a Great Project Idea?
Each project on our Ideas page should include these essential elements:
Critical Guidelines to Remember:
Never link to just a bug tracker as your project description. T[...]

© Laurenz Albe 2025
In a previous article, I recommended using bigint for sequence-generated primary keys (but I make an exception for lookup tables!). If you didn't heed that warning, you might experience integer overflow. That causes downtime and pain. So I thought it would be a good idea to show you how to monitor for the problem and how to keep the worst from happening.
There are two ways in which people typically create sequence-generated primary keys. The “traditional” way is to use serial:
CREATE TABLE tab ( id serial PRIMARY KEY, ... );
That will actually create a (four-byte) integer column with a DEFAULT value:
\d tab
                            Table "public.tab"
 Column |  Type   | Collation | Nullable |             Default
--------+---------+-----------+----------+---------------------------------
 id     | integer |           | not null | nextval('tab_id_seq'::regclass)
...
The smarter, SQL standard compliant way of creating an auto-incrementing column is to use an identity column:
CREATE TABLE tab ( id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY, ... );
An identity column will also create a sequence behind the scenes. No matter which of the two techniques you choose: if you insert a lot of rows, you will eventually get the dreaded error message
ERROR: nextval: reached maximum value of sequence "tab_id_seq" (2147483647)
Note that you don't need a large table to encounter this problem: if you INSERT and DELETE rows frequently, you will eventually hit the same error.
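To see how close you are to that limit before it bites, a query along these lines (my sketch, not from the article; it uses the pg_sequences view available since PostgreSQL 10) shows how much of each sequence's range is already consumed:

-- Sketch: percentage of each sequence's range already used up
SELECT schemaname,
       sequencename,
       last_value,
       max_value,
       round(100.0 * last_value / max_value, 2) AS percent_used
FROM pg_sequences
WHERE last_value IS NOT NULL
ORDER BY percent_used DESC;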
The solution to the problem is to convert the column to bigint, which is an eight-byte integer:
ALTER TABLE tab ALTER id TYPE bigint;
That solution is simple enough, but modifying a table like that requires PostgreSQL to rewrite the entire table. The reason for rewriting the table is that the storage format of bigint is different from that of a four-byte integer. However, such a t
Starting with arrays in PostgreSQL is as simple as declaring a column as integer[], inserting some values, and you are done.
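For instance, a minimal sketch (table and column names are mine, not from the article):

-- A column declared as integer[], with array literals inserted directly
CREATE TABLE playlists (
    id        serial PRIMARY KEY,
    track_ids integer[]
);

INSERT INTO playlists (track_ids) VALUES ('{1,2,3}'), ('{4,5}');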
Or building the array on the fly.
SELECT '{1,2,3}'::int[];

  int4
---------
 {1,2,3}
(1 row)

SELECT array[1,2,3];

  array
---------
 {1,2,3}
(1 row)
The official documentation provides a good introduction. But beneath this straightforward interface lies a set of more complex properties than most of us realise. Arrays in PostgreSQL are not just "lists" in a field. They have their own memory management strategy, their own index logic, and a lot of edge-case scenarios.
As it goes with boringSQL deep-dives, this article will explore the corners of array functionality that might break your production.
Wait? Are we going to talk about JSONB arrays? Not at all. The whole concept of arrays in RDBMSs is actually document storage in disguise.
In database design, locality ensures faster retrieval times by keeping related data close on physical storage. Whether you use a distinct integer[] type or a JSON list [1, 2, 3], you are making the exact same architectural decision: you are prioritising locality over normalisation.
When you store tag_ids in an array, you are embedding related data directly into a row - just like a NoSQL database might embed subdocuments. This is not inherently wrong. Document databases exist for good reasons: they eliminate joins, simplify reads, and map naturally to application objects.
But PostgreSQL is a relational database. It was designed around the relational model, where:
Arrays give you document-model convenience, but you lose relational promises. There are no foreign keys and no ON DELETE referential_action (like CASCADE) for array elements. If you delete a tags entry, the orphaned ID will remain in your array forever.
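A quick sketch of that failure mode (the tags/posts schema here is hypothetical, just to illustrate the point):

CREATE TABLE tags  (id int PRIMARY KEY, name text);
CREATE TABLE posts (id int PRIMARY KEY, tag_ids int[]);

INSERT INTO tags  VALUES (1, 'postgres'), (2, 'arrays');
INSERT INTO posts VALUES (10, ARRAY[1, 2]);

-- No foreign key protects the array elements:
DELETE FROM tags WHERE id = 2;

-- The deleted id is still referenced; nothing cleaned it up
SELECT id, tag_ids FROM posts;   -- {1,2}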
The rule of thumb is tha
[...]
Happy New Year! I'm excited to announce release 0.9 of pg_statviz, the minimalist extension and utility pair for time series analysis and visualization of PostgreSQL internal statistics.
This is a significant feature release that expands the scope of analysis to include several new modules and a visualization update:
oldest_xact_age, oldest_query_age, and oldest_backend_age, helping you quickly identify long-running transactions and idle connections that may be holding resources or preventing vacuum cleanup.
pg_stat_replication and replication slot statistics from pg_stat_replication_slots.
multixact or pg_xact contention).
pg_stat_database to assist in early detection of data corruption issues.
This release also includes important maintenance and optimizations:
snapshot_conf() has been optimized to only store configuration changes rather than saving values for every snapshot. The upgrade path automatically compacts existing duplicate configuration rows to reclaim space.
argh has been pinned to <0.30 for CLI argument compatibility.
pg_statviz takes the view that everything should be light and minimal. Unlike commercial observability platforms, it doesn't require invasive agents or open connections to the database —
At first glance, the idea that an idle session could lead to a transaction wraparound might seem counterintuitive. Most PostgreSQL users are familiar with how long-running queries or open transactions can cause table bloat and wraparound risks by pinning the xmin horizon, which prevents autovacuum from reclaiming dead tuples and Transaction IDs.
The scenario described here indeed involves a temporary table, and that is the real culprit: if you create a temp table and leave the session idle indefinitely, you risk exhausting your Transaction IDs.
The Root Cause: Why Autovacuum Can’t Help
Temporary tables reside in backend-local memory rather than shared_buffers. Because of this isolation, autovacuum cannot access or process them. As other database traffic continues to burn through transaction IDs, the “age” of the temporary table increases. Resolution in this specific case is faster, however, because xmin continues to move forward and does not obstruct vacuum across the entire instance. You can simply perform a vacuum freeze on the temporary table, drop it entirely, or terminate the backend process to rectify the situation.
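To spot temporary tables that are accumulating age, a catalog query like the following (my sketch, not from the post) can help; temporary relations have relpersistence = 't' and live in pg_temp_N schemas:

-- Sketch: temporary tables ordered by transaction ID age
SELECT n.nspname AS temp_schema,
       c.relname,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relpersistence = 't'
  AND c.relkind = 'r'
ORDER BY xid_age DESC;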
While temporary table data is invisible to the autovacuum process, autovacuum will intervene to clean up orphaned temporary tables in the pg_catalog if a backend process or the entire instance crashes before it can clean up the temp table.
The Challenge of Mitigation
This behavior is well-documented in the PostgreSQL manuals, yet it’s an unfortunate edge case where autovacuum is aware of the rising Transaction IDs age but lacks the authority to intervene.
Final Thoughts
While autovacuum cannot be faulted here, the core issue is that DML operations on temporary tables, like those on persistent tables, utilize standard Transaction IDs (xmin, xmax). Recycling these IDs necessitates a vacuum freeze, despite the temporary table’s scope being restricted to a single session. This mechanism is also why temporary tables are incom[...]
In development work, there is a common challenge: how to attach custom metadata to database objects without modifying Postgres's core code. In this article, I briefly demonstrate a practical solution using Postgres's SECURITY LABELS mechanism to implement custom properties that are transactional, properly linked to database objects, and work with standard Postgres operations.
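As a rough illustration of the mechanism (this is my sketch, not the article's code, and it assumes a label provider is available, for example the dummy_seclabel test module shipped with PostgreSQL):

-- Requires a security label provider; dummy_seclabel accepts only a few
-- fixed labels such as 'classified' and must be loaded into the server.
CREATE TABLE invoices (id bigint PRIMARY KEY, amount numeric);

SECURITY LABEL FOR dummy ON TABLE invoices IS 'classified';

-- Labels end up in the pg_seclabel catalog and can be read back:
SELECT objoid::regclass AS object, provider, label
FROM pg_seclabel
WHERE provider = 'dummy';

The labels survive in the catalog like any other object property, which is what makes the approach attractive for custom metadata.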
I am writing this post over the weekend but scheduling it to be published on Tuesday, after the PG DATA CfP closes, because I do not want to distract anyone, including myself, from the submission process.
A couple of months ago, I created a placeholder in my GitHub, promising to publish pg_acm before the end of the year. The actual day I pushed the initial commit was January 3, but it still counts, right? At least, it happened before the first Monday of 2026!
It has been about two years since I first spoke publicly about additional options I would love to see in PostgreSQL privileges management. Now I know it was not the most brilliant idea to frame it as “what’s wrong with Postgres permissions,” and this time I am much better with naming.
pg_acm stands for “Postgres Access Control Management.” The key feature of this framework is that each schema is created with a set of predefined roles and default privileges, which makes it easy to achieve complete isolation between different projects sharing the same database, and allows authorized users to manage access to their data without having superuser privileges.
Please take a look, give it a try, and let me know what’s wrong with my framework.
I do a fair amount of benchmarking as part of development, both on my own patches and while reviewing patches by others. That often requires dealing with noise, particularly for small optimizations. Here’s an overview of the approaches I use to filter out random variation and noise.
Most of the time it's easy - the benefits are large and obvious. Great! But sometimes we need to care about cases where the changes are small (think less than 5%).
PostgreSQL 18 made one very important change – data block checksums are now enabled by default for new clusters at cluster initialization time. I already wrote about it in my previous article. I also mentioned that there are still many existing PostgreSQL installations without data checksums enabled, because this was the default in previous versions. In those installations, data corruption can sometimes cause mysterious errors and prevent normal operation. In this post, I want to dissect common PostgreSQL data corruption modes, show how to diagnose them, and sketch how to recover from them.
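If you are not sure whether a given cluster falls into that category, a quick check (my sketch, not from the article) is:

-- Is this cluster running with data checksums?
SHOW data_checksums;                         -- "on" or "off"
SELECT current_setting('data_checksums');    -- same information as a query

Where this returns off, the pg_checksums utility can enable checksums, but only while the cluster is cleanly shut down.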
Corruption in PostgreSQL relations without data checksums surfaces as low-level errors like “invalid page in block xxx”, transaction ID errors, TOAST chunk inconsistencies, or even backend crashes. Unfortunately, some backup strategies can mask the corruption. If the cluster does not use checksums, then tools like pg_basebackup, which copy data files as they are, cannot perform any validation of data, so corrupted pages can quietly end up in a base backup. If checksums are enabled, pg_basebackup verifies them by default unless --no-verify-checksums is used. In practice, these low-level errors often become visible only when we directly access the corrupted data. Some data is rarely touched, which means corruption often surfaces only during an attempt to run pg_dump — because pg_dump must read all data.
Typical errors include:
-- invalid page in a table:
pg_dump: error: query failed: ERROR: invalid page in block 0 of relation base/16384/66427
pg_dump: error: query was: SELECT last_value, is_called FROM public.test_table_bytea_id_seq

-- damaged system columns in a tuple:
pg_dump: error: Dumping the contents of table "test_table_bytea" failed: PQgetResult() failed.
pg_dump: error: Error message from server: ERROR: could not access status of transaction 3353862211
DETAIL: Could not open file "pg_xact/0C7E": No such file or directory.
pg_dump: error: The command was: COPY publ[...]
PostgreSQL has built-in support for logical replication. Unlike streaming replication, which works at the block level, logical replication replicates data changes based on replica identities, usually primary keys, rather than exact block addresses or byte-by-byte copies.
PostgreSQL logical replication follows a publish–subscribe model. One or more subscribers can subscribe to one or more publications defined on a publisher node. Subscribers pull data changes from the publications they are subscribed to, and they can also act as publishers themselves, enabling cascading logical replication.
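In its simplest form (object names and the connection string below are placeholders, not taken from this walkthrough), the model looks like this:

-- On the publisher (requires wal_level = logical):
CREATE PUBLICATION pub_app FOR TABLE pgbench_accounts;

-- On the subscriber (connection string is a placeholder):
CREATE SUBSCRIPTION sub_app
    CONNECTION 'host=source-host dbname=appdb user=repuser password=secret'
    PUBLICATION pub_app;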
Logical replication has many use cases, such as:
Other than these cases, if your PostgreSQL instance is running on RHEL and you want to migrate to CNPG, streaming replication could be used. However, since CNPG images are based on Debian, the locale configuration must be compatible, and when using libc-based collations, differences in glibc versions can affect collation behavior. That is why we will walk through a logical replication setup to avoid these limitations.
The specifications of the source PostgreSQL instance:
cat /etc/os-release | head -5
NAME="Rocky Linux"
VERSION="10.1 (Red Quartz)"
ID="rocky"
ID_LIKE="rhel centos fedora"
VERSION_ID="10.1"
psql -c "select version();"
version
----------------------------------------------------------------------------------------------------------
PostgreSQL 18.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 14.3.1 20250617 (Red Hat 14.3.1-2), 64-bit
(1 row)
Populating some test data using pgbench and checking the pgbench_accounts table:
/usr/pgsql-18/bin/pgbench -i -s 10
dropping old tables...
[...]
PostgreSQL 18 has arrived with some fantastic improvements, and among them, the RETURNING clause enhancements stand out as a feature that every PostgreSQL developer and DBA should be excited about. In this blog, I'll explore these enhancements, with particular focus on the MERGE RETURNING clause enhancement, and demonstrate how they can simplify your application architecture and improve data tracking capabilities.
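As a flavour of what the enhancement enables (a sketch with hypothetical inventory/incoming tables, not code from this post), PostgreSQL 18 lets RETURNING expose both old and new values, which combines nicely with MERGE:

-- Sketch: MERGE with RETURNING of old and new values (PostgreSQL 18)
MERGE INTO inventory AS i
USING incoming AS s ON i.item_id = s.item_id
WHEN MATCHED THEN
    UPDATE SET qty = i.qty + s.qty
WHEN NOT MATCHED THEN
    INSERT (item_id, qty) VALUES (s.item_id, s.qty)
RETURNING merge_action(), old.qty AS old_qty, new.qty AS new_qty;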
In this post, I describe experiments on the write-versus-read costs of PostgreSQL's temporary buffers. For the sake of accuracy, the set of PostgreSQL functions is extended with tools to measure buffer flush operations. The measurements show that writes are approximately 30% slower than reads. Based on these results, the following cost estimation formula for the optimiser has been proposed:
flush_cost = 1.30 × dirtied_bufs + 0.01 × allocated_bufs.
Temporary tables in PostgreSQL have always been parallel restricted. From my perspective, the reasoning is straightforward: temporary tables exist primarily to compensate for the absence of relational variables, and for performance reasons, they should remain as simple as possible. Since PostgreSQL parallel workers behave like separate backends, they don't have access to the leader process's local state, where temporary tables reside. Supporting parallel operations on temporary tables would significantly increase the complexity of this machinery.
However, we now have at least two working implementations of parallel temporary table support: Postgres Pro and Tantor. One more reason: identification of temporary tables within a UTILITY command is an essential step toward auto DDL in logical replication. So, maybe it is time to propose such a feature for PostgreSQL core.
After numerous code improvements over the years, AFAICS, only one fundamental problem remains: temporary buffer pages are local to the leader process. If these pages don't match the on-disk table state, parallel workers cannot access the data.
A comment in the code (80558c1) made by Robert Haas in 2015 clarifies the state of the art:
/*
* Currently, parallel workers can't access the leader's temporary
* tables. We could possibly relax this if we wrote all of its
* local buffers at the start of the query and made no changes
* thereafter (maybe we could allow hint bit changes), and if we
* taught the workers to read them. Writing a large number of
* temporary buffers could be [...]
Recently during a post-migration activity, we had to populate a very large table with a new UUID column (NOT NULL with a default) and backfill it for all existing rows.
Instead of doing a straight:
ALTER TABLE ... ADD COLUMN ... DEFAULT ... NOT NULL;
we chose the commonly recommended performance approach:
This approach is widely used to avoid long-running locks and table rewrites, but it comes with hidden gotchas. This post is about one such gotcha: object dependencies, especially views, and how PostgreSQL tracks them internally using OIDs.
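To make the dependency point concrete, here is a small sketch (hypothetical table and view names) showing that a view keeps following the original table's OID across renames:

CREATE TABLE t_old (id int);
CREATE VIEW  v AS SELECT * FROM t_old;

CREATE TABLE t_new (id int);
ALTER TABLE t_old RENAME TO t_retired;
ALTER TABLE t_new RENAME TO t_old;

-- The view still reads from the original table, now called t_retired:
SELECT definition FROM pg_views WHERE viewname = 'v';
-- => SELECT t_retired.id FROM t_retired;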
On PostgreSQL 11+, adding a column with a constant default is a metadata-only operation and does not rewrite the table. However, with a volatile default (such as uuidv7()) the new value must be computed and stored for every existing row, and the NOT NULL constraint must hold for all of them, so the metadata-only shortcut does not apply. The rename approach is therefore still valid, but only when truly needed.
Picture this: you’ve got a massive table with millions of rows, and you need to add a column with a unique UUID default value and a NOT NULL constraint. The naive approach? Just run ALTER TABLE ADD COLUMN. But for large tables, this can lock your table while PostgreSQL rewrites every single row, and that can take considerable time.
So what do we do? We get clever. We use an intermediate table (we could also use an unlogged table) with a rename trick; below is a sample created to show the scenario.
drop table test1;
create table test1
(col1 integer, col2 text, col3 timestamp(0));
insert into test1
select col1, col1::text , (now() - (col1||' hour')::interval)
from generate_series(1,1000000) as col1;
create view vw_test1 as
select * from test1;
CREATE TABLE test1_new
(like test1 including all);
alte[...]
Get in touch with the Planet PostgreSQL administrators at planet at postgresql.org.