PostgreSQL 17 incremental backup issue (PostgreSQL plugin) #1982

@lvaracka

Description

Bareos component version

  1. bareos-fd: 23.0.5~pre146.7e91df1c0
  2. bareos-dir: 23.0.5~pre13.2643648c1
  3. bareos-sd: 23.0.5~pre13.2643648c1
  4. bconsole: 23.0.5~pre13.2643648c1
  5. python: 3.9.18
  6. pg8000: 1.31.2

Steps to reproduce

  1. Configure a backup job using the PostgreSQL plugin (bareos-fd-postgresql).
  2. Run a full backup -> OK
  3. Run an incremental backup -> FAIL

Expected results

The incremental backup job should finish successfully.

Actual results

The incremental backup job fails with: Fatal error: python3-fd-mod: Timeout waiting 60 sec. for wal file 00000001000000000000002B to be archived

Environment

- OS: Rocky Linux 9.4 (Blue Onyx)
- component: postgresql17-server-17.0-2PGDG.rhel9.x86_64

Relevant log output

17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: Connected to PostgreSQL version 170000
17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: Current LSN 0/2A000168, last LSN: 0/28000000
17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: A difference was found, between current_lsn 0/2A000168 and last LSN: 0/28000000
17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: Current LSN 0/2B000000, last LSN: 0/28000000
17-Oct 13:58 test-pgsql-fd JobId 31017: Fatal error: python3-fd-mod: Timeout waiting 60 sec. for wal file 00000001000000000000002B to be archived
17-Oct 13:58 test-pgsql-fd JobId 31017: Fatal error: filed/fd_plugins.cc:673 PluginSave: Command plugin "python:module_name=bareos-fd-postgresql:db_host=/run/postgresql/:db_user=postgres:wal_archive_dir=/var/lib/pgsql/wal_archive/" requested, but job is already cancelled.
17-Oct 13:58 test-pgsql-fd JobId 31017: python3-fd-mod: Database connection closed.

Relevant traces output

17-Oct-2024 13:57:25.872208 test-pgsql-fd (150): module/bareosfd.cc:1422-31017 python3-fd-mod: after pg_switch_wal(): current_lsn: 0/2B000000 last_lsn: 0/28000000
17-Oct-2024 13:57:25.872558 test-pgsql-fd (100): module/bareosfd.cc:1422-31017 python3-fd-mod: __wait_for_wal_archiving() started
17-Oct-2024 13:57:25.873487 test-pgsql-fd (100): module/bareosfd.cc:1422-31017 python3-fd-mod: __wait_for_wal_archiving(0/2B000000): wal filename=00000001000000000000002B
17-Oct-2024 13:58:27.136316 test-pgsql-fd (100): module/bareosfd.cc:1422-31017 python3-fd-mod: __wait_for_wal_archiving() ended timeout
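The trace shows a wait that starts, looks for one specific WAL file name, and ends on timeout. A minimal sketch of such a wait loop is given here to make the failure mode concrete. This is not the plugin's actual code: the function name, its parameters, and the polling strategy are assumptions for illustration only.

```python
import os
import time


def wait_for_archived_wal(archive_dir, wal_name, timeout=60.0, poll_interval=1.0):
    """Hypothetical helper: poll archive_dir until wal_name exists or timeout expires.

    Returns True if the file showed up in time, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(os.path.join(archive_dir, wal_name)):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With a loop of this shape, waiting for a segment that is still being written (and therefore not yet archived) returns False once the timeout expires, which would match the "ended timeout" line in the trace.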

Anything else?

I think the issue might be related to a PostgreSQL commit that changed the behavior of the pg_walfile_name() and pg_walfile_name_offset() functions: when the LSN is exactly on a segment boundary, they now return the name of the current WAL file instead of the preceding one. As a result, the Bareos PostgreSQL plugin waits for the WAL file that is currently in use rather than the one that was just archived.
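The boundary behavior can be modeled in a few lines of Python. This is only a sketch of the naming arithmetic, not PostgreSQL's or the plugin's code: it assumes the default 16 MB WAL segment size and timeline 1, and the `boundary_returns_previous` flag is a made-up switch that selects between the two behaviors.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/2B000000' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def walfile_name(lsn: str, timeline: int = 1, boundary_returns_previous: bool = False) -> str:
    """Compute the WAL file name for an LSN.

    boundary_returns_previous=True models the older behavior, where an LSN
    exactly on a segment boundary maps to the preceding segment.
    """
    pos = parse_lsn(lsn)
    if boundary_returns_previous and pos > 0:
        pos -= 1  # only changes the result when pos sits on a boundary
    segno = pos // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


# LSN from the failing job: on a boundary, so the new behavior names segment 2B.
print(walfile_name("0/2B000000"))                                  # 00000001000000000000002B
print(walfile_name("0/2B000000", boundary_returns_previous=True))  # 00000001000000000000002A
```

Under these assumptions, the file name in the error message (…2B) is the segment that had only just been opened by pg_switch_wal(), while the segment that was actually completed is …2A.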

Here's the difference between PostgreSQL 16 and 17:

PG 16:

postgres=# SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn()), pg_switch_wal(), pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());
 pg_current_wal_lsn |     pg_walfile_name      | pg_switch_wal | pg_current_wal_lsn |     pg_walfile_name
--------------------+--------------------------+---------------+--------------------+--------------------------
 0/26000060         | 000000010000000000000026 | 0/26000078    | 0/27000000         | 000000010000000000000026

PG 17:

postgres=# SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn()), pg_switch_wal(), pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());
 pg_current_wal_lsn |     pg_walfile_name      | pg_switch_wal | pg_current_wal_lsn |     pg_walfile_name
--------------------+--------------------------+---------------+--------------------+--------------------------
 0/2F0000A0         | 00000001000000000000002F | 0/2F0000B8    | 0/30000000         | 000000010000000000000030
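One possible version-independent way to pick the file to wait for is to derive it from the LSN that pg_switch_wal() returns, since in both outputs above that value (0/26000078, 0/2F0000B8) lies inside the segment that was just completed rather than on a boundary. The sketch below is not a proposed patch and not necessarily how the plugin should be fixed; the function name is hypothetical, and it assumes 16 MB segments and timeline 1.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def segment_to_wait_for(switch_lsn: str, timeline: int = 1) -> str:
    """Hypothetical helper: name the segment containing the pg_switch_wal() result."""
    high, low = (int(part, 16) for part in switch_lsn.split("/"))
    segno = ((high << 32) | low) // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


print(segment_to_wait_for("0/26000078"))  # 000000010000000000000026 (PG 16 run)
print(segment_to_wait_for("0/2F0000B8"))  # 00000001000000000000002F (PG 17 run)
```

A caveat: if there has been no WAL activity since the last switch, pg_switch_wal() does nothing and returns the start of the segment currently in use, so that case would need separate handling.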

Labels

bug (This addresses a bug)
