fix: disable management of end-of-wal file flag during backup restoration #8873

Merged: mnencia merged 1 commit into cloudnative-pg:main from leonardoce:wal-size-restore on Oct 17, 2025
@leonardoce (Contributor) commented Oct 16, 2025

When the end of the WAL stream is reached, the parallel WAL restore feature attempts to predict the names of subsequent WAL files to restore and records the first missing WAL file.

On high-availability (HA) replicas, if PostgreSQL requests the first missing WAL file, the code returns an error status that prompts PostgreSQL to switch to streaming replication.
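The behaviour described in the two paragraphs above can be sketched roughly as follows. This is only an illustration, not the operator's actual code: the `prefetcher` type, its fields, and the `errEndOfWALStream` sentinel are hypothetical names, and the archive is simulated with an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// errEndOfWALStream is a hypothetical sentinel standing in for the error
// status that tells PostgreSQL the archive has no more WAL to offer.
var errEndOfWALStream = errors.New("end of WAL stream reached")

// prefetcher is a hypothetical stand-in for the parallel WAL restore state.
type prefetcher struct {
	archive      map[string]bool // WAL files present in the (simulated) archive
	firstMissing string          // first predicted file not found; "" if none
}

// prefetch walks the predicted names in order, collects the ones found in
// the archive, and records the first one that is missing.
func (p *prefetcher) prefetch(predicted []string) []string {
	var restored []string
	for _, name := range predicted {
		if !p.archive[name] {
			p.firstMissing = name
			break
		}
		restored = append(restored, name)
	}
	return restored
}

// restore answers a single WAL request coming from PostgreSQL.
func (p *prefetcher) restore(name string) error {
	if p.firstMissing != "" && name == p.firstMissing {
		// On an HA replica this is the cue to switch to streaming replication.
		return errEndOfWALStream
	}
	if !p.archive[name] {
		return fmt.Errorf("WAL file %s not found", name)
	}
	return nil
}

func main() {
	p := &prefetcher{archive: map[string]bool{
		"000000010000000100000000": true,
		"000000010000000100000001": true,
	}}
	restored := p.prefetch([]string{
		"000000010000000100000000",
		"000000010000000100000001",
		"000000010000000100000002",
		"000000010000000100000003",
	})
	fmt.Println("prefetched:", restored)
	fmt.Println("first missing:", p.firstMissing)
	err := p.restore(p.firstMissing)
	fmt.Println("end of stream:", errors.Is(err, errEndOfWALStream))
}
```

In this sketch the recorded name is only as good as the prediction that produced it: if the predicted names are wrong, the "first missing" file may be one that was never going to exist.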

Currently, the code assumes a `wal_segment_size` of 16MB when predicting the next WAL file names. If the configured WAL segment size exceeds 16MB, it may request non-existent WAL files. For instance, with 16MB segments, the names range from `000000010000000100000000` to `0000000100000001000000FF` before the log number (the middle eight hex digits) advances. With 1GB segments, they range only from `000000010000000100000000` to `000000010000000100000003`.

When the code assumes 16MB segments on a cluster that uses 1GB segments, it looks for the WALs from `000000010000000100000004` to `0000000100000001000000FF`, which never exist.
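The naming arithmetic behind this example can be sketched as below. It is a minimal illustration, not the code touched by this PR: `segmentsPerLog` and `nextWALName` are hypothetical helpers, and the sketch assumes the standard 24-hex-digit WAL file name (timeline, log number, segment number, eight digits each) with each log number covering 4 GiB.

```go
package main

import (
	"fmt"
	"strconv"
)

// walAddressSpace is the number of bytes covered by one WAL log number
// (the middle eight hex digits of a WAL file name): 4 GiB.
const walAddressSpace = uint64(1) << 32

// segmentsPerLog returns how many segment files share one log number.
func segmentsPerLog(segmentSize uint64) uint64 {
	return walAddressSpace / segmentSize
}

// nextWALName returns the name of the WAL file that follows name, given the
// segment size in bytes.
func nextWALName(name string, segmentSize uint64) (string, error) {
	if len(name) != 24 {
		return "", fmt.Errorf("invalid WAL file name %q", name)
	}
	if segmentSize == 0 || segmentSize > walAddressSpace {
		return "", fmt.Errorf("invalid segment size %d", segmentSize)
	}
	timeline, err := strconv.ParseUint(name[0:8], 16, 32)
	if err != nil {
		return "", err
	}
	log, err := strconv.ParseUint(name[8:16], 16, 32)
	if err != nil {
		return "", err
	}
	seg, err := strconv.ParseUint(name[16:24], 16, 32)
	if err != nil {
		return "", err
	}
	seg++
	if seg >= segmentsPerLog(segmentSize) {
		// The segment counter wraps and the log number advances.
		seg = 0
		log++
	}
	return fmt.Sprintf("%08X%08X%08X", timeline, log, seg), nil
}

func main() {
	const mib = uint64(1) << 20
	last := "000000010000000100000003"
	for _, size := range []uint64{16 * mib, 1024 * mib} {
		next, err := nextWALName(last, size)
		if err != nil {
			panic(err)
		}
		fmt.Printf("segment size %4d MiB: %s -> %s\n", size/mib, last, next)
	}
}
```

Starting from `000000010000000100000003`, the 16 MiB calculation yields `000000010000000100000004`, while the 1 GiB calculation yields `000000010000000200000000`. A predictor that hard-codes 16 MiB therefore asks for a file that a 1 GiB cluster never produces.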

This assumption is harmless for HA replicas, which can fall back to streaming replication, but it breaks a PostgreSQL instance that is replaying WAL to reach consistency after a restore: the restore process fails.

This patch disables end-of-WAL file marker management during backup restoration, addressing restore issues for backups that:

  1. use a custom WAL segment size
  2. are restored with parallel WAL recovery
  3. started on one WAL segment and ended on a different one

Closes: #8874

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
@leonardoce requested a review from a team as a code owner October 16, 2025 14:07
@cnpg-bot added the backport-requested ◀️, release-1.25, release-1.26, and release-1.27 labels Oct 16, 2025
@github-actions

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this PR, remove the label: backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this PR to a certain release branch, remove the specific branch label: release-x.y

@dosubot (bot) added the size:XS, bug 🐛, and ok to merge 👌 labels Oct 16, 2025
@dosubot (bot) added the lgtm label Oct 16, 2025
@gbartolini

/test

@github-actions

@gbartolini, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/18564767910

@mnencia merged commit 22f9338 into cloudnative-pg:main Oct 17, 2025
37 of 50 checks passed
cnpg-bot pushed a commit that referenced this pull request Oct 17, 2025
…tion (#8873)

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
(cherry picked from commit 22f9338)
cnpg-bot pushed a commit that referenced this pull request Oct 17, 2025
…tion (#8873)

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
(cherry picked from commit 22f9338)
cnpg-bot pushed a commit that referenced this pull request Oct 17, 2025
…tion (#8873)

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
(cherry picked from commit 22f9338)
THE-BRAHMA pushed a commit to THE-BRAHMA/cloudnative-pg that referenced this pull request Oct 30, 2025
…tion (cloudnative-pg#8873)

Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: theBrahma <office.utpal.brahma@gmail.com>
Development

Successfully merging this pull request may close these issues.

[Bug]: Restore can fail if WAL size is not the default and maxParallel > 1

5 participants