PostgreSQL Table Rename and Views – An OID Story

Recently, during a post-migration activity, we had to add a new UUID column (NOT NULL, with a default) to a very large table and backfill it for all existing rows.

Instead of doing a straight:

ALTER TABLE ... ADD COLUMN ... DEFAULT ... NOT NULL;

we chose the commonly recommended performance approach:

  • Create a new table (optionally UNLOGGED),
  • Copy the data,
  • Rename/swap the tables.

This approach is widely used to avoid long-running locks and full-table rewrites, but it comes with hidden gotchas. This post is about one such gotcha: object dependencies, especially views, and how PostgreSQL tracks them internally using OIDs.

A quick but important note

On PostgreSQL 11+, adding a column with a constant default is a metadata-only operation and does not rewrite the table. However:

  • This is still relevant when the default is volatile (like uuidv7()),
  • Or when you must immediately enforce NOT NULL,
  • Or when working on older PostgreSQL versions,
  • Or when rewriting is unavoidable for other reasons.

So the rename approach is still valid, but only when truly needed.

The Scenario: A Common Performance Optimization

Picture this: You’ve got a massive table with millions of rows, and you need to add a column with a unique UUID default value and a NOT NULL constraint. The naive approach? Just run ALTER TABLE ADD COLUMN. But for large tables, this can lock your table while PostgreSQL rewrites every single row, which can take considerable time.

So what do we do? We get clever. We use the intermediate-table-and-rename trick (the intermediate table can also be UNLOGGED). Below is a sample created to show the scenario.

drop table if exists test1;
create table test1
(col1 integer, col2 text, col3 timestamp(0));

insert into test1 
select col1, col1::text , (now() - (col1||' hour')::interval) 
from generate_series(1,1000000) as col1;

create view vw_test1 as 
select * from test1;


CREATE TABLE test1_new 
(like test1 including all);
alter table test1_new 
add column col_uuid uuid default uuidv7() not null;
insert into test1_new 
select * , uuidv7() from test1;

BEGIN;
ALTER TABLE test1 RENAME TO test1_old;
ALTER TABLE test1_new RENAME TO test1;
COMMIT;

Looks perfect, right? Fast, efficient, minimal downtime.
But take note of what happens when we want to drop the old table.

demo=# drop table test1_old;
ERROR:  cannot drop table test1_old because other objects depend on it
DETAIL:  view vw_test1 depends on table test1_old
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

The Silent Killer: Views Still Point to the Old Table

Even though test1 was renamed, the view still references test1_old:

demo=# \dv vw_test1
             List of views
┌────────┬──────────┬──────┬──────────┐
│ Schema │   Name   │ Type │  Owner   │
├────────┼──────────┼──────┼──────────┤
│ public │ vw_test1 │ view │ postgres │
└────────┴──────────┴──────┴──────────┘
(1 row)

demo=# \sv vw_test1
CREATE OR REPLACE VIEW public.vw_test1 AS
 SELECT col1,
    col2,
    col3
   FROM test1_old

Why?

Understanding PostgreSQL’s OIDs and how they are used for dependency tracking.

PostgreSQL does not track dependencies by object names. It uses internal object identifiers (OIDs).

When a view is created:

  • The SQL is parsed,
  • * is expanded,
  • Table and column references are stored by OID and attribute number in pg_rewrite.

Renaming a table only changes the human-readable name in pg_class. The OID remains the same.
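A quick way to see this with the demo objects above (at this point test1_old still carries the OID the view was built against):

-- Names changed, OIDs did not: the renamed table keeps its OID,
-- while the swapped-in table is a brand new object with a new OID.
select 'test1_old'::regclass::oid;
select 'test1'::regclass::oid;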

But when you swap tables, you are not renaming — you are introducing a new object with a new OID. The view still points to the old one.

-- Check what your view is actually referencing
SELECT DISTINCT
    source_ns.nspname AS source_schema,
    source_table.relname AS source_table,
    source_table.relkind AS object_type
FROM pg_depend d
JOIN pg_rewrite r ON r.oid = d.objid
JOIN pg_class view_class ON view_class.oid = r.ev_class
JOIN pg_class source_table ON source_table.oid = d.refobjid
JOIN pg_namespace source_ns ON source_ns.oid = source_table.relnamespace
WHERE view_class.relname = 'vw_test1'
    AND d.deptype = 'n'  -- normal dependency
    AND source_table.relkind IN ('r', 'v', 'm', 'f')  -- tables, views, mat views, foreign tables
ORDER BY source_schema, source_table;

And it’s not just views; they are just the visible symptom. This approach also affects:

  • Grants and privileges
  • RLS policies
  • Triggers
  • Foreign keys
  • Logical replication publications
  • Statistics objects
  • Atomic functions (SQL-standard BEGIN ATOMIC bodies) that reference the table

All of these are bound to the old table’s OID.
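Before dropping the old table, a quick catalog check along these lines (a sketch, not an exhaustive audit) lists every pg_depend entry that still points at its OID:

-- Everything still recorded against the old table's OID
SELECT classid::regclass AS dependent_catalog,
       objid,
       deptype
FROM pg_depend
WHERE refobjid = 'test1_old'::regclass;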

Ready to enhance your PostgreSQL development skills? My course on PL/pgSQL will help you excel as a Database Developer. Click here to start your journey!!


The Right Way to Handle It

So how do we fix this? We have a few options:

Option 1: Recreate the Views (Safest)

-- After the rename, drop and recreate all dependent views
demo=# drop view vw_test1;create view vw_test1 as select * from test1;
DROP VIEW
CREATE VIEW
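After recreating the view, it is worth confirming that the stored definition now references the new table; pg_get_viewdef shows the definition as PostgreSQL stores it:

-- Should now reference test1 rather than test1_old
SELECT pg_get_viewdef('vw_test1'::regclass, true);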

Option 2: Avoid table swapping if you can; batch updates are often safer:

Sharing a sample just for reference.

-- Add column without default (fast)
ALTER TABLE users ADD COLUMN status TEXT;

-- Update in batches (no long lock)
DO $$
DECLARE
    batch_size INT := 10000;
    last_id    BIGINT := 0;
    max_id     BIGINT;
BEGIN
    -- Iterate id ranges up to the current maximum; relying on FOUND alone
    -- could stop early on a range that happens to contain no matching rows.
    SELECT max(id) INTO max_id FROM users;

    WHILE last_id < coalesce(max_id, 0) LOOP
        UPDATE users
        SET status = 'active'
        WHERE id > last_id
          AND id <= last_id + batch_size
          AND status IS NULL;

        last_id := last_id + batch_size;
        COMMIT;  -- allowed in DO blocks (PostgreSQL 11+) when not run inside an outer transaction
    END LOOP;
END $$;

-- Then add the default for future rows
ALTER TABLE users ALTER COLUMN status SET DEFAULT 'active';
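If the column must eventually be NOT NULL, as in the original scenario, one hedged follow-up (PostgreSQL 12+) is to prove the absence of NULLs with a NOT VALID check constraint first, so that SET NOT NULL can skip its full-table scan:

-- Brief lock, no table scan
ALTER TABLE users ADD CONSTRAINT users_status_not_null CHECK (status IS NOT NULL) NOT VALID;
-- Scans the table without blocking normal reads and writes
ALTER TABLE users VALIDATE CONSTRAINT users_status_not_null;
-- Uses the validated constraint as proof, so no second scan is needed
ALTER TABLE users ALTER COLUMN status SET NOT NULL;
-- Optional cleanup once NOT NULL is in place
ALTER TABLE users DROP CONSTRAINT users_status_not_null;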

Always Check Dependencies First

SELECT 
    dependent_view.relname as view_name
FROM pg_depend 
JOIN pg_rewrite ON pg_depend.objid = pg_rewrite.oid 
JOIN pg_class as dependent_view ON pg_rewrite.ev_class = dependent_view.oid 
JOIN pg_class as source_table ON pg_depend.refobjid = source_table.oid 
WHERE source_table.relname = '<<table_name>>'
AND dependent_view.relkind = 'v';

If you see views, you know you’ll need to recreate them.

The Takeaway

PostgreSQL’s OID-based dependency system is elegant and efficient — but it also means:

Names are for humans. OIDs are what PostgreSQL actually uses.

The rename/swap trick can be a powerful optimization, but it buys execution-time safety at the cost of operational complexity. If you use it:

  • Audit dependencies,
  • Script object recreation,
  • Test in staging,
  • And document everything.

Otherwise, the optimization meant to avoid a lock may quietly introduce correctness, maintenance, and operational risks that cost far more later.


PostgreSQL Partition Pruning: The Role of Function Volatility

In one of our earlier blogs, we explored how improper volatility settings in PL/pgSQL functions — namely using IMMUTABLE, STABLE, or VOLATILE — can lead to unexpected behavior and performance issues during migrations.

Today, let’s revisit that topic from a slightly different lens. This time, we’re not talking about your user-defined functions, but the ones PostgreSQL itself provides — and how their volatility can quietly shape your query performance, especially when you’re using partitioned tables.

⚙️ When Partitioning Doesn’t Perform

When designing partitioning strategies, performance is usually the first thing that comes to mind.
But what if, despite all your tuning, the queries still fail to leverage partition pruning effectively?

The answer sometimes lies not in your schema or indexes, but in how your filters are evaluated.

Every function you use in a filter has its own volatility behavior that determines how it’s planned and executed. You might pick the right filter logically — but if the function is volatile, it can completely change how PostgreSQL builds and optimizes the plan.

Let’s say you’ve designed a table with daily partitions, used date functions as partition filters, and want to check the underlying execution plans.

CREATE TABLE IF NOT EXISTS events_daily (
    id bigint generated always as identity ,
    created_at timestamptz NOT NULL,
    created_date date NOT NULL,      
    user_id int NOT NULL,
    amount numeric(10,2) NOT NULL,
    status text,
    payload text
) PARTITION BY RANGE (created_date);

--create a year of partitions as daily
DO
$$
DECLARE
    start_date date := current_date - INTERVAL '364 days';
    d date;
    part_name text;
BEGIN
    d := start_date;
    WHILE d <= current_date LOOP
        part_name := format('events_p_%s', to_char(d, 'YYYYMMDD'));
        EXECUTE format($sql$
            CREATE TABLE IF NOT EXISTS %I PARTITION OF events_daily
            FOR VALUES FROM (%L) TO (%L);
        $sql$, part_name, d::text, (d + INTERVAL '1 day')::date::text);
        d := d + INTERVAL '1 day';
    END LOOP;
    EXECUTE 'CREATE TABLE IF NOT EXISTS events_default PARTITION OF events_daily DEFAULT';
END;
$$;

--Populate Partitions
WITH params AS (
  SELECT
    (current_date - INTERVAL '364 days')::date AS start_date,
    365 AS ndays
)
INSERT INTO events_daily (created_at, created_date, user_id, amount, status, payload)
SELECT
  (params.start_date + ((s.i - 1) % params.ndays) * INTERVAL '1 day')
    + ( (random() * 86399)::int * INTERVAL '1 second' ) AS created_at,
  (params.start_date + ((s.i - 1) % params.ndays) * INTERVAL '1 day')::date AS created_date,
  (1 + floor(random() * 50000))::int AS user_id,
  round(random() * 10000::numeric) AS amount,
  (array['new','processing','done','failed'])[1 + (floor(random() * 4))::int]::text AS status,
  md5((random()::text || clock_timestamp()::text || s.i::text)) AS payload
FROM
  generate_series(1, 1000000) AS s(i),params
;

analyze events_daily;

When working with partitioned tables, always ensure filters are applied on the partition key. Otherwise, PostgreSQL will scan every partition — even if indexes exist.

In most date-based partitioning strategies, filters are based on the system date. Let’s assume we’re querying data for the last two days.

Sharing some samples across date functions to get the last three days of partition data.

--Filtering using clock_timestamp(), now(), and statement_timestamp()
select count(1) from events_daily 
where 
created_date between clock_timestamp() - interval '2 days' 
and clock_timestamp();

select count(1) from events_daily 
where created_date between now() - interval '2 days' 
and now();

select count(1) from events_daily where 
created_date between statement_timestamp() - interval '2 days' 
and statement_timestamp();

🔍 Same Logic, Different Performance

At first glance, all these queries look functionally identical — they fetch data for the last few days.
But there’s one subtle, yet crucial difference: the volatility of the functions used in the filter against the partition column.

Here are the volatility characteristics of the date functions used in these queries: clock_timestamp() is VOLATILE, while now() and statement_timestamp() are both STABLE.

From PostgreSQL Docs on Volatile functions.

The optimizer makes no assumptions about the behavior of such functions. A query using a volatile function will re-evaluate the function at every row where its value is needed.

⚙️ Why Volatility Matters for Partition Pruning

Partition pruning depends on whether PostgreSQL can evaluate the filter expression early enough, either at plan time or at the start of execution.
If it can, only the relevant partitions are scanned.
If not, PostgreSQL must assume any partition could match — and ends up scanning all of them.

  • With statement_timestamp() or now() (both STABLE), PostgreSQL can safely compute their values once, at the start of execution. It knows exactly which partitions fall within range — so it prunes the rest efficiently.

  • But with clock_timestamp() (VOLATILE), the optimizer can’t predict its value in advance. So PostgreSQL plays it safe — scanning every partition because pruning can’t be guaranteed. The EXPLAIN sketch below makes this visible.
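A hedged way to verify this on the demo table above (exact output varies by version and data volume, but the STABLE variant should report pruned partitions, typically as a "Subplans Removed" line, while the VOLATILE variant lists every partition):

--STABLE filter: partitions outside the range are pruned before the scans start
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(1) FROM events_daily
WHERE created_date BETWEEN now() - interval '2 days' AND now();

--VOLATILE filter: no pruning is possible, so every daily partition is scanned
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(1) FROM events_daily
WHERE created_date BETWEEN clock_timestamp() - interval '2 days' AND clock_timestamp();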

Putting It All Together: Volatility and Query Planning

Relying on volatile functions in partition filters doesn’t break correctness — but it quietly breaks performance.
What should have been a quick lookup across a few recent partitions can turn into a full partition scan, wasting CPU and I/O.

The fix isn’t complex:
✅ Prefer STABLE functions like now() and statement_timestamp() when filtering partitions
❌ Avoid VOLATILE ones like clock_timestamp() in partition predicates

In performance-sensitive environments, understanding function volatility isn’t optional — it’s a key part of writing partition-friendly SQL.


Exploring PostgreSQL 18: A Developer’s Guide to New Features – Part 1: The COPY Command Gets More User-Friendly

PostgreSQL 18 was released on September 25, 2025, introducing a host of exciting new features across all major areas. In this blog series, we’ll delve into these features and explore how they can benefit database developers and migration engineers transitioning to PostgreSQL.

Part 1 of the series focuses on enhancements to the COPY command, particularly for use cases involving loading external files into the database.

PostgreSQL 18 – Enhancements with the COPY Command

The COPY command is the default native option for loading flat files, primarily in text or CSV format, into PostgreSQL. If you are a data engineer or database developer looking for ways to load flat files into PostgreSQL, you must take care of quality checks and encapsulate a mechanism to report or discard failed records.

Until PostgreSQL 16, if quality checks were planned to be done on the database side, one option was to load flat files into staging tables with all columns as text or a generic data type, and later move legitimate and discarded data to the concerned tables. From PostgreSQL 17 onward, the COPY command provides options for handling data type incompatibilities and logging failed rows or records using the ON_ERROR option.

In PostgreSQL 18, this capability is further enhanced with the REJECT_LIMIT option: if the number of discarded (error) rows goes beyond the reject limit, the COPY command is aborted as a whole.

PostgreSQL 18 – New REJECT_LIMIT option.

Let’s start by creating a sample target table and a CSV file with intentional data type mismatch issues. The sample table also has CHECK and NOT NULL constraints defined on it.

CREATE TABLE copy18(
    col1 integer,
    col2 text DEFAULT 'x'::text,
    col3 text NOT NULL,
    col4 text,
    col5 boolean,
    col6 date,
    col7 timestamp without time zone,
    CONSTRAINT copy18_col4_check CHECK (length(col4) > 2)
);

The CSV file contains data type incompatibilities in the following lines:

  • Timestamp (4th line)
  • Integer (5th and 6th lines)
  • Boolean type (7th line)
  • Date (8th line)

Here is the content of the test.csv file:

--test.csv file.
1,a,a,aaaa,yes,20240101,2024-01-01 12:00:00
2,a,a,aaaa,no,20240102,2024-01-01 12:00:00
3,a,a,aaaa,Y,20240103,2024-01-01 12:00:00
4,a,a,aaaa,N,20240104,2024-01-01 12:00:61
a,a,a,aaaa,1,20240103,20240101010101
999999999999999999999999,1,1,1,0,20241201,20240101010101
5,,a,aaaa,adhoc,20241201,2024-01-01 12:00:51
6,,a,aaaa,false,20241301,2024-01-01 12:00:51

In our sample, we will be using the client-side copy command available as part of the psql command line. It is one of the preferred options when working with managed PostgreSQL on cloud platforms like RDS or Cloud SQL. If you are wondering about the difference between COPY and \copy, do check out here.

Controlling Data Quality During Loads with Reject Limits in PostgreSQL 18

If we try to load the same CSV into PostgreSQL 17 with the ON_ERROR option, it processes the data and discards all error rows without any limit on the overall number of errors.

postgres=# \copy copy18 from 'test.csv' (on_error ignore, format csv, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 4 for column "col7": "2024-01-01 12:00:61"
NOTICE: skipping row due to data type incompatibility at line 5 for column "col1": "a"
NOTICE: skipping row due to data type incompatibility at line 6 for column "col1": "999999999999999999999999"
NOTICE: skipping row due to data type incompatibility at line 7 for column "col5": "adhoc"
NOTICE: skipping row due to data type incompatibility at line 8 for column "col6": "20241301"
NOTICE: 5 rows were skipped due to data type incompatibility
COPY 3
postgres=# table copy18;
┌──────┬──────┬──────┬──────┬──────┬────────────┬─────────────────────┐
│ col1 │ col2 │ col3 │ col4 │ col5 │    col6    │        col7         │
├──────┼──────┼──────┼──────┼──────┼────────────┼─────────────────────┤
│    1 │ a    │ a    │ aaaa │ t    │ 2024-01-01 │ 2024-01-01 12:00:00 │
│    2 │ a    │ a    │ aaaa │ f    │ 2024-01-02 │ 2024-01-01 12:00:00 │
│    3 │ a    │ a    │ aaaa │ t    │ 2024-01-03 │ 2024-01-01 12:00:00 │
└──────┴──────┴──────┴──────┴──────┴────────────┴─────────────────────┘
(3 rows)

If we need to control the number of error rows and accept the load only when it stays within a limit, REJECT_LIMIT helps achieve that. It is applicable together with the ON_ERROR option.

postgres=# \copy copy18 from 'test.csv' (on_error ignore, reject_limit 3, format csv, log_verbosity verbose);
NOTICE: skipping row due to data type incompatibility at line 4 for column "col7": "2024-01-01 12:00:61"
NOTICE: skipping row due to data type incompatibility at line 5 for column "col1": "a"
NOTICE: skipping row due to data type incompatibility at line 6 for column "col1": "999999999999999999999999"
NOTICE: skipping row due to data type incompatibility at line 7 for column "col5": "adhoc"
ERROR: skipped more than REJECT_LIMIT (3) rows due to data type incompatibility
CONTEXT: COPY copy18, line 7, column col5: "adhoc"

With REJECT_LIMIT set to 3, the fourth error caused the COPY command to fail.

Additional LOG_VERBOSITY silent option

LOG_VERBOSITY also provides an additional option, silent, to limit the information printed or logged.

postgres=# \copy copy18 from 'test.csv' (on_error ignore, reject_limit 5, format csv, log_verbosity silent);
COPY 3

Conclusion

Database developers working on PostgreSQL 18 can benefit from the newly introduced COPY features:

  • Rejecting the entire load when the number of data-type-incompatibility errors exceeds REJECT_LIMIT (used together with ON_ERROR)
  • Improved logging control with the LOG_VERBOSITY silent option

These features streamline data loading and ensure higher data quality.


PostgreSQL Case-Insensitive Search: Handling LIKE with Nondeterministic Collations

Case-insensitive search is one of the most common issues I encounter when helping customers migrate from Oracle or SQL Server to PostgreSQL. Unlike Oracle (with NLS_SORT) or SQL Server (with case-insensitive collations), PostgreSQL does not natively support case-insensitive search.

Developers typically rely on workarounds such as:

  • Wrapping comparisons with LOWER() or UPPER()
  • Leveraging the citext data type (not recommended)
  • Leveraging ICU-based nondeterministic collations

Why citext is not recommended.
Reference – https://www.postgresql.org/docs/current/citext.html

In this post, I’ll focus on nondeterministic collations and their behavior with LIKE or ILIKE operators.

Understanding Nondeterministic Collations

In PostgreSQL, a nondeterministic collation lets text comparisons ignore case/accents, enabling flexible searches like case-insensitive equality.

Creating a sample collation to support case-insensitive search:

CREATE COLLATION collate_ci (provider = icu, deterministic = false, locale = 'en-US-u-ks-level2');

Let’s create a sample table with columns using nondeterministic collation:

create table testcollation(col1 text collate collate_ci); 
insert into testcollation values ('a');
insert into testcollation values ('A');

select * from testcollation where col1 = 'a';
 col1 
------
 a
 A
(2 rows)

Using nondeterministic collations helps us implement case-insensitive comparison with equality operators. However, let’s examine how this works with LIKE or ILIKE operators for wildcard-based searches.


The Problem: Wildcard Operator Limitations

While nondeterministic collations enable case-insensitive searches, they fail when used with wildcard operators (LIKE or ILIKE) because this functionality is not supported in PostgreSQL versions prior to 18:
postgres=> select * from testcollation where col1 like 'a%';
ERROR:  nondeterministic collations are not supported for LIKE

PostgreSQL 17 does not support LIKE/ILIKE with nondeterministic collations.
Let’s explore two workarounds.

Solution Approaches

We will examine two alternative approaches to mitigate the limitation of nondeterministic collations when using wildcard operators.

Option 1: Using the Built-in “C” Collation with ILIKE

PostgreSQL offers multiple built-in collations for different use cases. You can view available collations using the following psql command:

postgres=# \dOS+

                                List of collations
┌────────────┬───────┬──────────┬─────────┬───────┬────────┬───────────┬────────────────┐
│   Schema   │ Name  │ Provider │ Collate │ Ctype │ Locale │ ICU Rules │ Deterministic? │
├────────────┼───────┼──────────┼─────────┼───────┼────────┼───────────┼────────────────┤
│ pg_catalog │ C     │ libc     │ C       │ C     │        │           │ yes            │
│ pg_catalog │ POSIX │ libc     │ POSIX   │ POSIX │        │           │ yes            │
........


In PostgreSQL, the C collation is one of the built-in collations that comes from the underlying C library (libc).

The Workaround: Override the collation with “C” and use it in SQL statements that require the LIKE operator. Such occurrences can be identified in views or procedural code and fixed proactively as part of the migration process.

To support case-insensitive searches, map the LIKE operator to ILIKE while applying the “C” collation:

postgres=# select * from testcollation where col1 collate "C" ilike 'a%';
┌──────┐
│ col1 │
├──────┤
│ a    │
│ A    │
└──────┘
(2 rows)


Option 2: Using Custom Operators with Internal “C” Collation

PostgreSQL’s extensibility allows us to create custom operators. By defining our own, we can extend the behavior of LIKE and ILIKE to overcome this limitation.

Understanding Internal Operators: When we use the LIKE operator, it is internally transformed to ~~, as shown in the execution plan. Similarly, the ILIKE operator uses ~~* internally.
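One way to confirm this mapping is to look up the operators and their underlying functions in pg_operator (a small catalog sketch):

-- ~~ backs LIKE and ~~* backs ILIKE for text operands
SELECT oprname, oprcode
FROM pg_operator
WHERE oprname IN ('~~', '~~*')
  AND oprleft = 'text'::regtype
  AND oprright = 'text'::regtype;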

Creating Custom Functions: To create a custom operator in PostgreSQL, we first need supporting functions. PostgreSQL provides built-in functions that we can leverage to implement our own comparison logic.

Below are sample functions using texticlike to provide case-insensitive search.

To avoid code changes while preserving case-insensitive functionality, we can create a custom schema with operators that override ~~, the internal operator used by LIKE:

Sample code for creating custom operators; for more information, check out here.

create schema customoperator;

CREATE OR REPLACE FUNCTION customoperator.custom_texticlike(text, text) RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT
AS $_$SELECT pg_catalog.texticlike($1 collate "C",$2 collate "C");$_$;

CREATE OR REPLACE FUNCTION customoperator.custom_texticnotlike(text, text) RETURNS boolean
LANGUAGE sql IMMUTABLE STRICT
AS $_$SELECT pg_catalog.texticnlike($1 collate "C",$2 collate "C");$_$;

CREATE OPERATOR customoperator.~~ (
FUNCTION = customoperator.custom_texticlike,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = OPERATOR(customoperator.~~),
NEGATOR = OPERATOR(customoperator.!~~)
);

CREATE OPERATOR customoperator.!~~ (
FUNCTION = customoperator.custom_texticnotlike,
LEFTARG = text,
RIGHTARG = text,
COMMUTATOR = OPERATOR(customoperator.!~~),
NEGATOR = OPERATOR(customoperator.~~)
);

Important Consideration:
Search Path Precedence: When overriding built-in operators, we must consider the precedence of pg_catalog within the search_path. The search_path tells PostgreSQL which schemas to search when you reference an object without a schema name. By default, the system schema pg_catalog—which contains all built-in functions and operators—is always included.

To make our custom operators take priority over the built-in ones, we must place our custom schema before pg_catalog in the search_path:

Setting custom search_path with customoperator schema precede pg_catalog.

postgres=# set search_path to customoperator,pg_catalog, "$user", public;

Once the search_path is updated, any LIKE operator in SQL will automatically use the custom operator and its function. In effect, every LIKE clause is transparently handled as a case-insensitive match with “C” collation:

postgres=# explain select * from testcollation where col1 like 'a%';
┌──────────────────────────────────────────────────────────────────┐
│                            QUERY PLAN                            │
├──────────────────────────────────────────────────────────────────┤
│ Seq Scan on testcollation  (cost=0.00..27.00 rows=1360 width=32) │
│   Filter: texticlike((col1)::text, 'a%'::text COLLATE "C")       │
└──────────────────────────────────────────────────────────────────┘
(2 rows)

Comparison of Solutions

  • Fix 1: Built-in “C” collation with ILIKE
    Pros: Uses the built-in “C” collation for all LIKE comparisons; changes are limited to each occurrence.
    Cons: Requires manual code changes.

  • Fix 2: Custom operators that internally use COLLATE “C”
    Pros: Requires no manual code changes; application code can function without any changes.
    Cons: Changes default PostgreSQL behaviour (overriding pg_catalog precedence in search_path) and impacts all sessions of the PostgreSQL database.

Final Recommendations

  • Use Option 1 (COLLATE “C” + ILIKE) for production environments — it’s safer and more explicit.
  • Consider Option 2 (custom operators) only if code changes are impractical — due to the global impact on search_path.
  • Plan to re-test with PostgreSQL 18 once it becomes generally available — for native support.

Note: Overwriting search_path to change pg_catalog precedence can have minor impacts on all sessions and is not generally recommended as best practice.

Looking Ahead: PostgreSQL 18 Enhancements

PostgreSQL 18 introduces significant enhancements related to collations, particularly around nondeterministic collations:

  • A new casefold() function to further assist with case-insensitive matches
  • Native support for LIKE comparisons on text columns that use nondeterministic (case-insensitive) ICU collations
  • Simplified pattern matching on case-insensitive text, making it more consistent across operations

These improvements will make pattern matching on case-insensitive text much simpler and eliminate the need for the workarounds described in this post.

Conclusion

While PostgreSQL’s current limitations with nondeterministic collations and LIKE operators present challenges for case-insensitive searches, both workarounds provide viable solutions. Choose the approach that best fits your migration strategy and maintenance requirements, keeping in mind the upcoming native support in PostgreSQL 18.


Reference 

https://aws.amazon.com/blogs/database/manage-case-insensitive-data-in-postgresql/ (PostgreSQL 15)


PostgreSQL 18 Beta Preview – Export or Amend Statistics with Ease

PostgreSQL 18 beta has been released, and it’s the perfect time to start exploring the new features we can expect in the General Availability (GA) release. One feature that particularly caught my attention relates to improvements in statistics export and amendment.

Here’s an excerpt from the official PostgreSQL release notes:

Add functions to modify per-relation and per-column optimizer statistics (Corey Huinker) 
Add pg_dump, pg_dumpall, and pg_restore options --statistics-only, --no-statistics, --no-data, and --no-schema (Corey Huinker, Jeff Davis)

One of the key ingredients in performance analysis is understanding the underlying statistics of tables, columns, and indexes. Often, we encounter cases where queries behave differently in production compared to Pre-Prod or UAT environments. A common reason for this discrepancy is the difference in data distribution and statistics. How many times have we wished for a way to replicate production-like statistics in lower environments—without the overhead of copying data and running manual ANALYZE?

With PostgreSQL 18, we can look forward to features that allow us to export, import, or even modify statistics—making it easier to simulate production behavior in non-production environments without needing actual data loads.

The building blocks are:

  • Dump statistics: pg_dump --statistics-only --table=public.skewed_data_int
  • Restore or overwrite table, index, or materialized view statistics: pg_restore_relation_stats()
  • Restore or overwrite attribute (column) level statistics: pg_restore_attribute_stats()

The following is a sample output from pg_dump that includes statistics. This can also be used within the same instance—or in another environment—to overwrite existing statistics and influence execution plans.

--table or materialized view
SELECT * FROM pg_catalog.pg_restore_relation_stats(
	'version', '180000'::integer,
	'schemaname', 'public',
	'relname', 'skewed_data_int',
	'relpages', '84'::integer,
	'reltuples', '10000'::real,
	'relallvisible', '84'::integer,
	'relallfrozen', '0'::integer
);
--columns or attributes
SELECT * FROM pg_catalog.pg_restore_attribute_stats(
	'version', '180000'::integer,
	'schemaname', 'public',
	'relname', 'skewed_data_int',
	'attname', 'categorical_col',
	'inherited', 'f'::boolean,
	'null_frac', '0'::real,
	'avg_width', '4'::integer,
	'n_distinct', '4'::real,
	'most_common_vals', '{1,2,3,4}'::text,
	'most_common_freqs', '{0.5999,0.3154,0.0809,0.0038}'::real[],
	'correlation', '0.4595874'::real
);
--Index
SELECT * FROM pg_catalog.pg_restore_relation_stats(
	'version', '180000'::integer,
	'schemaname', 'public',
	'relname', 'skewed_data_int_pkey',
	'relpages', '30'::integer,
	'reltuples', '10000'::real,
	'relallvisible', '0'::integer,
	'relallfrozen', '0'::integer
);
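After running the restore functions, the overwritten values can be cross-checked in the catalogs (a quick verification sketch for the sample table above):

-- Relation-level stats land in pg_class ...
SELECT relpages, reltuples FROM pg_class WHERE relname = 'skewed_data_int';
-- ... and column-level stats are visible through pg_stats
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'skewed_data_int' AND attname = 'categorical_col';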

Conclusion

Statistics are at the heart of PostgreSQL’s query planner, and having accurate or production-like stats can significantly impact performance tuning and execution plan accuracy. With PostgreSQL 18, the ability to export, import, and tweak statistics opens up powerful new possibilities for DBAs and developers—especially in testing and pre-production environments.

This feature reduces the guesswork in performance investigations and offers a safer, data-free way to simulate real-world workloads. If you’re serious about performance, this is a feature worth exploring in the PostgreSQL 18 Beta.


Understanding Volatility in PL/pgSQL Functions: A Real-World Lesson

The PL/pgSQL language, available as a default extension in PostgreSQL, provides powerful tools and flexibility for application developers to build complex, enterprise-scale functionality within the database. Through PL/pgSQL’s functions and procedures, developers can choose different volatility categories—IMMUTABLE, STABLE, or VOLATILE—that offer varying performance benefits, especially in terms of result caching and data state awareness. For a deeper dive, refer to the official documentation. Choosing the appropriate volatility for a function is crucial; as they say, “With great power comes great responsibility.”

During one of my migration projects, we ran into unexpected behavior while debugging code converted from Oracle. Below is a recreated example using mock functions to illustrate the issue:

drop table if exists user_status;
create table user_status(col1 bigint GENERATED BY DEFAULT AS IDENTITY , status text);

create or replace function upd_status(text)
returns bigint language plpgsql 
as $$
declare var1 bigint;
begin
with alias1 as (insert into user_status(status) values($1) returning col1)
        select * into var1 from alias1;
return var1;
end;$$;

create or replace function lookup_status(text)
returns boolean language plpgsql 
immutable parallel safe
as $$
begin return (select exists (select 1 from user_status where status = $1)); end;$$;

create or replace function check_status(text)
returns void language plpgsql 
as $$ 
declare var1 bigint;
begin
var1 := upd_status($1);
if lookup_status($1) then 
    raise notice 'Status exists as part of immutable check - %', $1;
else 
    raise notice 'Status does not exists - %', $1;
end if;
raise notice 'Actual Result for status % is %', $1,(select exists (select 1 from user_status where status = $1 and col1 = var1));
end;$$;
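The behaviour discussed below can be reproduced by calling the wrapper with a sample status value ('Open' is the value referenced in the discussion that follows):

select check_status('Open');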

If you run the call above and observe its output, the Open status is inserted into user_status via the check_status call, but the call to lookup_status within the same snapshot doesn’t detect this data change.

It took some time to diagnose the root cause: lookup_status was incorrectly labeled as IMMUTABLE by the conversion engineer, causing it to return stale results by not capturing changes in data state within the same snapshot.

Note: PostgreSQL does not issue warnings when an IMMUTABLE function executes a SELECT statement on a table. Nonetheless, given that IMMUTABLE functions are predicated on the assumption that the data remains unchanged, any modifications to the table may result in inconsistent outcomes due to the enforced caching of query plans.
Ready to enhance your PostgreSQL development skills? My course on PL/pgSQL will help you excel as Database Developer. Click here to start your journey, and use code DBGRP30 to avail 30% off!

Function Volatility – Decision Matrix

Volatility selection provides performance benefits but can also lead to data consistency issues if not carefully considered, especially when migrating code from Oracle or SQL Server. Below is a guide on choosing the appropriate volatility for different use cases:

  • Functions with side effects (e.g., updates) → VOLATILE: ensures functions are re-evaluated each time.
  • Functions with changing values per call → VOLATILE: required if values can change within a single query (e.g., random(), currval(), timeofday()).
  • Functions dependent on transaction context → STABLE: STABLE functions (e.g., current_timestamp) remain consistent within a transaction.
  • Pure functions with no external dependencies → IMMUTABLE: suitable if function output never changes (e.g., mathematical calculations with no data access or look-ups).
  • Functions that select data from tables → STABLE: provides a fixed view within a single query snapshot; IMMUTABLE would cause stale results on data updates.

After updating the volatility of lookup_status to STABLE, it worked as expected, correctly reflecting state changes as part of the caller’s snapshot:

create or replace function lookup_status(text)
returns boolean language plpgsql 
stable parallel safe
as $$
begin return (select exists (select 1 from user_status where status = $1)); end;$$;

By carefully setting the appropriate volatility, we can achieve reliable functionality and performance—especially important when working with procedural code across databases.

Conclusion

Setting the right volatility level in PostgreSQL functions can significantly impact performance and data consistency. When developing or migrating code, always assess whether your functions should be marked IMMUTABLE, STABLE, or VOLATILE to avoid unexpected results.

  1. Use VOLATILE for functions with side effects or changing values.
  2. Choose STABLE when consistent results are needed within transactions.
  3. Limit IMMUTABLE to functions with no external dependencies.

Being mindful of volatility settings will help keep your application’s data accurate and efficient.


PL/pgSQL Secrets: How Conditional Expressions Are Parsed and Evaluated Under the Hood.

Recently, in the Official PostgreSQL Slack, a user posted a PL/pgSQL block (shown below) and was surprised that it did not generate a syntax error:

DO $$
DECLARE i INT;
BEGIN
  i = 0;
  IF i = 0 AND THEN 
    RAISE NOTICE 'i = 0';
  END IF;
END; $$;

At first glance, this code block seems incomplete. Notice the IF condition: it appears to be missing an additional condition after the AND operator. Logically, this should cause an exception due to the incomplete condition following AND.

  IF i = 0 AND THEN 

However, during PL/pgSQL execution, the condition is evaluated without any syntax errors or warnings. This raises a critical question:

How does PostgreSQL internally process this condition?
What allows this seemingly incomplete expression to work?

When reviewing such PL/pgSQL, the code looks incomplete, and the natural assumption that it should fail makes the actual behavior surprising.

In this blog, we’ll dive into the internals of PL/pgSQL to understand how this conditional statement is processed and why PostgreSQL does not flag it as erroneous.

Ready to enhance your PostgreSQL development skills? My course on PL/pgSQL will help you master database development. Click here to start your journey, and use code DBGRP30 to avail 30% off!
This blog covers the topic both in text and through an accompanying video for a more in-depth look. Scroll down to watch the video or read on for the full written guide.
DatabaseGyaan – PL/pgSQL Conditional Expression Processing Internal.

Uncover PL/pgSQL Internal Code

One of the greatest advantages of open-source software is the ability to examine the code base directly. This gives us a foundation for understanding how things work or, in some cases, why they don’t break as expected.

Our investigation begins with the PLPGSQL_STMT_IF structure, tracing through the call stack in the pl_exec.c file.

PL/pgSQL Internal Code (call stack in pl_exec.c)

By exploring the code, we find that IF statements and their conditions are evaluated using exec_run_select, which effectively executes a SELECT statement that returns a boolean result.

PL/pgSQL conditions as SELECT clause

Looking back at the original example, the condition i = 0 AND is processed within a SELECT clause. Here, AND essentially acts as a placeholder, allowing the PL/pgSQL engine to evaluate the condition without triggering syntax errors.

This insight — that conditional expressions are evaluated as SELECT statements — opens up new possibilities. It means we can leverage various functions within conditions in PL/pgSQL, as demonstrated in the examples below.

The following snippets illustrate different ways conditional expressions can be evaluated in PL/pgSQL:

Count evaluation
SQL: select count(1) = 1;
PL/pgSQL:
do $$
begin
if count(1) = 1 and then
    raise notice '%', 'Matched If Clause';
else
    raise notice 'Not Matched If Clause';
end if;
end;$$;

Case-insensitive match
SQL: select ('a' ilike 'A');
PL/pgSQL:
do $$
begin
if ('a' ilike 'A') and then
    raise notice '%', 'Matched If Clause';
else
    raise notice 'Not Matched If Clause';
end if;
end;$$;

Complex unnest (array-based condition)
SQL: select COUNT(col1) filter(where col1 = 'A') = 2 from (select unnest(ARRAY['A','B','A']) col1);
PL/pgSQL:
do $$
begin
if COUNT(col1) filter(where col1 = 'A') = 2 from (select unnest(ARRAY['A','B','A']) col1) then
    raise notice '%', 'Matched If Clause';
else
    raise notice 'Not Matched If Clause';
end if;
end;$$;

Row existence check
SQL: select exists (select 1 from generate_series(1,10000));
PL/pgSQL:
do $$
begin
if exists (select 1 from generate_series(1, 10000)) then
    raise notice '%', 'Matched If Clause';
else
    raise notice 'Not Matched If Clause';
end if;
end;$$;

Conclusion

PL/pgSQL processes conditional expressions using SELECT statements, allowing incomplete conditions like IF i = 0 AND to execute without errors.

Instead of failing, the AND is treated as part of a SELECT expression, letting PostgreSQL evaluate it flexibly. This approach lets developers incorporate various SELECT expressions directly into conditional evaluations, offering additional ways to build and test complex logic seamlessly in PL/pgSQL.


How to Connect to PostgreSQL Database Through AWS SSM Using pgAdmin or DBeaver.

Based on my extensive experience working with customers across domains and regions, especially in cloud databases, one of the most time-consuming processes is establishing a secure connection to a managed database instance. I fully understand the steps involved and why it takes time—ensuring everything is done correctly, with all security concerns addressed, and adhering to organizational governance like IAM policies is critical. Most cloud databases are set up with private endpoints, requiring additional network configurations, such as a VPN or an extra hop through a bastion host or jump server.

One of the most common methods to connect to your PostgreSQL managed instance is via a jump server or bastion host. In some cases, we might have the necessary identity file or password for SSH tunneling. However, in many situations, AWS Systems Manager Session Manager (AWS SSM) is used to establish the connection to the managed instance in the cloud.

In this blog, we will walk through how to enable connections to a PostgreSQL managed instance on AWS with our preferred UI tools, pgAdmin or DBeaver, using AWS Systems Manager Session Manager (AWS SSM). I assume you or the concerned team have already configured the necessary IAM roles, permissions, and user setups. Now, let’s explore how to connect to your database using these popular PostgreSQL UI tools.

Below are the steps we will follow to establish the necessary connection, using the CLI and the bastion host, to a PostgreSQL instance in the cloud.

Step 1: Modify the hosts file to map the Managed Instance

We will modify our machine’s hosts file to connect to the PostgreSQL managed instance, such as RDS or Amazon Aurora. The entry should look like this:

127.0.0.1 <<RDS_OR_AMAZON_AURORA_END_POINT>>

Step 2: Create a connection profile based on IAM role and authentication

Create a profile based on your IAM roles and login credentials, and update the configuration and credentials files generated during the AWS configuration steps.

The AWS config and credentials files on a Windows machine are typically located under %UserProfile%\.aws\ (files named config and credentials).

Sample configuration:

# credentials file (e.g. ~/.aws/credentials or %UserProfile%\.aws\credentials)
[sample]
aws_access_key_id = *************
aws_secret_access_key = ***************

# config file (e.g. ~/.aws/config or %UserProfile%\.aws\config)
[profile sample]
role_arn = arn:aws:iam::**********:role/***********-user-role
source_profile = sample
mfa_serial = arn:aws:iam::**********:mfa/<<iamuser>>
role_session_name = <<iamuser>>
region = us-east-1
output = json

Step 3: Start a session and establish port forwarding using AWS SSM

Using AWS SSM, start a session in your preferred command line terminal. This will create the necessary tunneling for port forwarding to the remote RDS or Amazon Aurora endpoints. In the sample command below, we are forwarding requests from local port 5433 to the remote database port 5432 (the default for PostgreSQL).

aws ssm start-session --profile sample --target <<INSTANCE_ID>> --document-name AWS-StartPortForwardingSessionToRemoteHost --parameters "host=<<RDS_OR_AMAZON_AURORA_END_POINT>>,portNumber=5432,localPortNumber=5433"

Step 4: Connect to PostgreSQL on the cloud using pgAdmin or DBeaver

Now, configure the connection in pgAdmin or DBeaver using the Amazon Aurora or RDS endpoint as the host (the hosts file entry from Step 1 resolves it to 127.0.0.1), port 5433 (the forwarded local port from Step 3), and your usual database credentials.
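Before configuring the UI tools, a quick check from the terminal with psql (hypothetical database name and user) confirms that the tunnel works; the endpoint resolves to 127.0.0.1 via the hosts file, and port 5433 is the forwarded local port:

psql "host=<<RDS_OR_AMAZON_AURORA_END_POINT>> port=5433 dbname=postgres user=myuser sslmode=require"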

Conclusion

By following these steps, you can securely connect to your PostgreSQL cloud instance using AWS SSM. Whether with pgAdmin or DBeaver, the process simplifies access without complex network setups. This approach ensures secure, efficient database management.


Exploring PostgreSQL 17: A Developer’s Guide to New Features – Part 7: pg_maintain Predefined Role for Maintenance.

Welcome to Part 7 of our series exploring the exciting new features anticipated in the official PostgreSQL 17 release. In this series, we delve into newly added features and discuss how they can benefit database developers and migration engineers transitioning to PostgreSQL 17 in the future.

In case you missed the earlier parts:

Ready to enhance your PostgreSQL development skills? My course on PostgreSQL Migration and PL/pgSQL will help you master database development and Migration. Click here to start your journey, and use code DBGRP30 to avail 30% off!

New Maintenance Role and Access privilege in PostgreSQL 17

Databases are designed to handle critical and sensitive information, and it’s essential to enforce minimal grants to control who can access certain data and what operations can be performed. While necessary grants on data access and processing are common, having permissions for maintenance activities is equally crucial to ensure business SLA adherence and performance optimization.

With PostgreSQL 17, a new predefined role, pg_maintain, has been introduced to manage privileges for maintenance operations such as VACUUM, ANALYZE, REINDEX, and more on database tables, indexes, or views.

This means that a database developer is not only responsible for building business-critical functionality but can also maintain the related database objects, such as tables or indexes, ensuring they meet performance and business SLAs, all governed by grants that can be precisely controlled.

The following maintenance activities are supported with the new pg_maintain role, allowing for granular maintenance permissions at the table, index, or view level:

VACUUM: Cleans up dead tuples and recovers storage.
ANALYZE: Collects statistics.
REINDEX: Rebuilds indexes.
REFRESH MATERIALIZED VIEW: Updates the data in a materialized view.
CLUSTER: Reorders the physical storage of data in a table based on an index.
LOCK TABLE: Acquires a lock on a table to prevent concurrent modifications.

Additionally, a new access privilege, maintain, allows granting specific maintenance permissions at the table level.
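As a small sketch of this table-level privilege (sales is a hypothetical table name):

-- PostgreSQL 17+: grant only maintenance rights on a single table
GRANT MAINTAIN ON TABLE sales TO myuser2;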

Here’s an example illustrating how a database user, who initially only had basic read and write privileges, can now be elevated to perform maintenance tasks like VACUUM and ANALYZE:

With the predefined pg_maintain role, it’s now easier to delegate maintenance operations at the user level, ensuring better control and management within the database.

postgres=# grant pg_maintain  to myuser2;
GRANT ROLE

postgres=# \drg+ myuser2
        List of role grants
 Role name |  Member of  |   Options    | Grantor
-----------+-------------+--------------+----------
 myuser2   | pg_maintain | INHERIT, SET | postgres
(1 row)
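With the role granted, the user can run maintenance commands on relations it does not own; a short sketch (sales is again a hypothetical table):

-- Switch to the grantee and run maintenance
SET ROLE myuser2;
VACUUM (ANALYZE) sales;
RESET ROLE;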

Conclusion

The introduction of the pg_maintain role in PostgreSQL 17 marks a important step towards more refined access control for maintenance activities. This enhancement not only simplifies the process of assigning and managing maintenance tasks but also helps maintain database performance and reliability by ensuring that only authorized users can perform critical operations. As organizations continue to prioritize data integrity and performance, the pg_maintain role and maintain access privilege will be a valuable tool in the database administrator’s toolkit.
