Closed
Description
Bareos component version
- bareos-fd: 23.0.5~pre146.7e91df1c0
- bareos-dir: 23.0.5~pre13.2643648c1
- bareos-sd: 23.0.5~pre13.2643648c1
- bconsole: 23.0.5~pre13.2643648c1
- python: 3.9.18
- pg8000: 1.31.2
Steps to reproduce
- configure backup job using PostgreSQL plugin (bareos-fd-postgresql)
- run full backup -> OK
- run incremental backup -> FAIL
Expected results
The incremental backup job should finish successfully.
Actual results
The incremental backup job fails with: Fatal error: python3-fd-mod: Timeout waiting 60 sec. for wal file 00000001000000000000002B to be archived
Environment
- OS: Rocky Linux 9.4 (Blue Onyx)
- component: postgresql17-server-17.0-2PGDG.rhel9.x86_64
Relevant log output
17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: Connected to PostgreSQL version 170000
17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: Current LSN 0/2A000168, last LSN: 0/28000000
17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: A difference was found, between current_lsn 0/2A000168 and last LSN: 0/28000000
17-Oct 13:57 test-pgsql-fd JobId 31017: python3-fd-mod: Current LSN 0/2B000000, last LSN: 0/28000000
17-Oct 13:58 test-pgsql-fd JobId 31017: Fatal error: python3-fd-mod: Timeout waiting 60 sec. for wal file 00000001000000000000002B to be archived
17-Oct 13:58 test-pgsql-fd JobId 31017: Fatal error: filed/fd_plugins.cc:673 PluginSave: Command plugin "python:module_name=bareos-fd-postgresql:db_host=/run/postgresql/:db_user=postgres:wal_archive_dir=/var/lib/pgsql/wal_archive/" requested, but job is already cancelled.
17-Oct 13:58 test-pgsql-fd JobId 31017: python3-fd-mod: Database connection closed.
Relevant traces output
17-Oct-2024 13:57:25.872208 test-pgsql-fd (150): module/bareosfd.cc:1422-31017 python3-fd-mod: after pg_switch_wal(): current_lsn: 0/2B000000 last_lsn: 0/28000000
17-Oct-2024 13:57:25.872558 test-pgsql-fd (100): module/bareosfd.cc:1422-31017 python3-fd-mod: __wait_for_wal_archiving() started
17-Oct-2024 13:57:25.873487 test-pgsql-fd (100): module/bareosfd.cc:1422-31017 python3-fd-mod: __wait_for_wal_archiving(0/2B000000): wal filename=00000001000000000000002B
17-Oct-2024 13:58:27.136316 test-pgsql-fd (100): module/bareosfd.cc:1422-31017 python3-fd-mod: __wait_for_wal_archiving() ended timeout
Anything else?
I think the issue is related to a PostgreSQL change in the behavior of the pg_walfile_name() and pg_walfile_name_offset() functions: when the LSN lies exactly on a segment boundary, they now return the name of the current WAL file instead of the preceding one. As a result, the Bareos PostgreSQL plugin waits for the WAL file that is currently in use rather than the one that has just been archived.
Here's the difference between PostgreSQL 16 and 17:
PG 16:
postgres=# SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn()), pg_switch_wal(), pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());
 pg_current_wal_lsn |     pg_walfile_name      | pg_switch_wal | pg_current_wal_lsn |     pg_walfile_name
--------------------+--------------------------+---------------+--------------------+--------------------------
 0/26000060         | 000000010000000000000026 | 0/26000078    | 0/27000000         | 000000010000000000000026
PG 17:
postgres=# SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn()), pg_switch_wal(), pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());
 pg_current_wal_lsn |     pg_walfile_name      | pg_switch_wal | pg_current_wal_lsn |     pg_walfile_name
--------------------+--------------------------+---------------+--------------------+--------------------------
 0/2F0000A0         | 00000001000000000000002F | 0/2F0000B8    | 0/30000000         | 000000010000000000000030
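To make the boundary case concrete, here is a minimal Python sketch that mimics how a WAL file name is derived from an LSN. It is not the plugin's code and not PostgreSQL's implementation. The function names, the `boundary_returns_previous` flag, and the 16 MB segment size (the PostgreSQL default) are assumptions made for illustration only.

```python
# Illustrative only: a pure-Python model of the LSN -> WAL file name mapping.
# Assumes the default 16 MB WAL segment size and timeline 1.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '0/2B000000' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def walfile_name(lsn: str, timeline: int = 1,
                 boundary_returns_previous: bool = False) -> str:
    """Return the WAL file name for an LSN.

    boundary_returns_previous=True models the PG 16 output shown above
    (an LSN exactly on a segment boundary maps to the preceding segment);
    False models the PG 17 output.
    """
    pos = parse_lsn(lsn)
    if boundary_returns_previous:
        # Look at the byte just before the LSN, never going below zero.
        pos = max(pos - 1, 0)
    segno = pos // WAL_SEGMENT_SIZE
    segs_per_xlogid = 0x100000000 // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (
        timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


if __name__ == "__main__":
    # LSNs taken from the PG 16 and PG 17 sessions above.
    print(walfile_name("0/27000000", boundary_returns_previous=True))
    print(walfile_name("0/30000000"))
    # The LSN returned by pg_switch_wal() lies inside the segment that was
    # just switched out, so it maps to that segment in either model.
    print(walfile_name("0/2F0000B8"))
```

In this model, the LSN returned by pg_switch_wal() (0/2F0000B8 in the PG 17 session) maps to 00000001000000000000002F under both behaviors, whereas the LSN read after the switch (0/30000000) maps to the segment still in use on PG 17. Whether the plugin should rely on that value is a design decision for the maintainers.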
Labels
bug