fix(major-upgrade): reset TimelineID to 1 after upgrade #9830

Merged
mnencia merged 4 commits into main from dev/9764 on Feb 3, 2026

mnencia (Member) commented Feb 2, 2026

After a major version upgrade, pg_upgrade creates a new database system with a new System ID and resets the timeline to 1. However, cluster.Status.TimelineID retained its old value (e.g., timeline 2), causing replicas to restore incompatible timeline history files from object storage.

This resulted in PostgreSQL fatal errors:

"requested timeline 2 is not a child of this server's history"
"Latest checkpoint in file backup_label is at 0/13000080 on timeline 1, but in the history of the requested timeline, the server forked off from that timeline at 0/70226C8."
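
The second message boils down to an LSN comparison: the checkpoint on timeline 1 (0/13000080) lies past the point where the requested timeline forked off (0/70226C8), so the requested timeline cannot be a descendant of this server's history. A minimal Go sketch of that comparison follows. It assumes a simplified rule (the checkpoint must lie before the fork point), and `parseLSN` and `checkpointOnRequestedHistory` are hypothetical helper names, not functions from PostgreSQL or CloudNativePG.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLSN converts a PostgreSQL LSN such as "0/13000080" (two hex
// numbers: the high and low 32 bits) into a single 64-bit position.
// Hypothetical helper, written for this illustration only.
func parseLSN(s string) (uint64, error) {
	parts := strings.Split(s, "/")
	if len(parts) != 2 {
		return 0, fmt.Errorf("malformed LSN %q", s)
	}
	hi, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return 0, fmt.Errorf("malformed LSN %q: %w", s, err)
	}
	lo, err := strconv.ParseUint(parts[1], 16, 32)
	if err != nil {
		return 0, fmt.Errorf("malformed LSN %q: %w", s, err)
	}
	return hi<<32 | lo, nil
}

// checkpointOnRequestedHistory reports whether a checkpoint taken on the
// parent timeline lies before the point where the requested timeline
// forked off. This is an assumed simplification of the server's check.
func checkpointOnRequestedHistory(checkpoint, forkPoint string) (bool, error) {
	cp, err := parseLSN(checkpoint)
	if err != nil {
		return false, err
	}
	fp, err := parseLSN(forkPoint)
	if err != nil {
		return false, err
	}
	return cp < fp, nil
}

func main() {
	// Values taken from the error message quoted above.
	ok, err := checkpointOnRequestedHistory("0/13000080", "0/70226C8")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("checkpoint precedes fork point:", ok)
}
```

With the values from the error message the comparison fails, which matches the fatal error: the stale 00000002.history file describes a fork that happened before the upgraded cluster's checkpoint.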

The fix sets cluster.Status.TimelineID to 1 after major upgrade completion, ensuring validateTimelineHistoryFile() blocks incompatible timeline history files (e.g., 00000002.history).
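
To illustrate the interaction between the reset and the history-file check, here is a minimal Go sketch. It is not the operator's code: `ClusterStatus`, `resetTimelineAfterMajorUpgrade`, `parseHistoryTimeline`, and `checkHistoryFile` are hypothetical names, and the rule "reject history files for timelines ahead of the recorded TimelineID" is an assumption about what validateTimelineHistoryFile() enforces. Only the file naming (eight hex digits followed by `.history`) is standard PostgreSQL behaviour.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ClusterStatus is a hypothetical stand-in for the status object that
// carries the TimelineID field mentioned in this PR.
type ClusterStatus struct {
	TimelineID int
}

// resetTimelineAfterMajorUpgrade mirrors the described fix: pg_upgrade
// starts the new cluster on timeline 1, so the recorded value follows.
func resetTimelineAfterMajorUpgrade(status *ClusterStatus) {
	status.TimelineID = 1
}

// parseHistoryTimeline extracts the timeline ID from a file name such as
// "00000002.history" (eight hex digits plus the ".history" suffix).
func parseHistoryTimeline(fileName string) (int, error) {
	if !strings.HasSuffix(fileName, ".history") {
		return 0, fmt.Errorf("%q is not a timeline history file", fileName)
	}
	hexID := strings.TrimSuffix(fileName, ".history")
	if len(hexID) != 8 {
		return 0, fmt.Errorf("%q has an unexpected timeline length", fileName)
	}
	id, err := strconv.ParseUint(hexID, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid timeline in %q: %w", fileName, err)
	}
	return int(id), nil
}

// checkHistoryFile rejects history files that belong to a timeline ahead
// of the one recorded in the status (assumed rule, see the lead-in).
func checkHistoryFile(fileName string, status ClusterStatus) error {
	tl, err := parseHistoryTimeline(fileName)
	if err != nil {
		return err
	}
	if tl > status.TimelineID {
		return fmt.Errorf("timeline %d in %s is ahead of cluster timeline %d",
			tl, fileName, status.TimelineID)
	}
	return nil
}

func main() {
	// Before the fix: the stale value lets the old history file through.
	status := ClusterStatus{TimelineID: 2}
	fmt.Println("stale status:", checkHistoryFile("00000002.history", status))

	// After the fix: the reset value causes the same file to be rejected.
	resetTimelineAfterMajorUpgrade(&status)
	fmt.Println("reset status:", checkHistoryFile("00000002.history", status))
}
```

Under these assumptions, the stale status accepts the pre-upgrade history file, while the reset status rejects it, which is the behaviour the fix relies on.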

E2E tests verify this by performing a switchover to timeline 2 before the major upgrade, then confirming TimelineID is reset to 1 after upgrade completion.

Closes #9764

@cnpg-bot cnpg-bot added backport-requested ◀️ This pull request should be backported to all supported releases release-1.25 release-1.27 release-1.28 labels Feb 2, 2026
github-actions bot (Contributor) commented Feb 2, 2026

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this PR, remove the label backport-requested ◀️ or add the label 'do not backport'.
  • To stop backporting this PR to a specific release branch, remove that branch's label: release-x.y

mnencia (Member, Author) commented Feb 2, 2026

/test

github-actions bot (Contributor) commented Feb 2, 2026

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/21597168874

@mnencia mnencia marked this pull request as ready for review February 2, 2026 16:13
@mnencia mnencia requested review from a team, NiccoloFei, jsilvela and litaocdl as code owners February 2, 2026 16:13
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. bug 🐛 Something isn't working labels Feb 2, 2026
mnencia (Member, Author) commented Feb 2, 2026

/test ft=postgres-major-upgrade

github-actions bot (Contributor) commented Feb 2, 2026

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/21598394984

@mnencia mnencia force-pushed the dev/9764 branch 3 times, most recently from fe9639c to f59b9bb, February 2, 2026 17:06
mnencia (Member, Author) commented Feb 2, 2026

/test ft=postgres-major-upgrade

github-actions bot (Contributor) commented Feb 2, 2026

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/21599721427

@mnencia mnencia force-pushed the dev/9764 branch 2 times, most recently from 3f47dd3 to 141533d, February 2, 2026 17:27
@cnpg-bot cnpg-bot added the ok to merge 👌 This PR can be merged label Feb 2, 2026
@mnencia mnencia force-pushed the dev/9764 branch 2 times, most recently from bf9d378 to 78317af, February 2, 2026 17:33
mnencia and others added 3 commits February 3, 2026 11:03
After a major version upgrade, pg_upgrade creates a new database
system with a new System ID and resets the timeline to 1.
However, cluster.Status.TimelineID retained its old value (e.g.,
timeline 2), causing replicas to restore incompatible timeline
history files from object storage.

This resulted in PostgreSQL fatal errors:

    "requested timeline 2 is not a child of this server's history"
    "Latest checkpoint in file backup_label is at 0/13000080 on
     timeline 1, but in the history of the requested timeline,
     the server forked off from that timeline at 0/70226C8."

The fix sets cluster.Status.TimelineID to 1 after major upgrade
completion, ensuring validateTimelineHistoryFile() blocks
incompatible timeline history files (e.g., 00000002.history).

E2E tests verify this by performing a switchover to timeline 2
before the major upgrade, then confirming TimelineID is reset to
1 after upgrade completion.

Closes #9764

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Add explicit test verification for TimelineID being reset to 1
after major upgrade completion in majorVersionUpgradeHandleCompletion.

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Feb 3, 2026
@mnencia mnencia merged commit a2f9d47 into main Feb 3, 2026
36 checks passed
@mnencia mnencia deleted the dev/9764 branch February 3, 2026 12:17
cnpg-bot pushed a commit that referenced this pull request Feb 3, 2026
After a major version upgrade, pg_upgrade creates a new database system
with a new System ID and resets the timeline to 1. However,
cluster.Status.TimelineID retained its old value (e.g., timeline 2),
causing replicas to restore incompatible timeline history files from
object storage.

This resulted in PostgreSQL fatal errors:

    "requested timeline 2 is not a child of this server's history"

    "Latest checkpoint in file backup_label is at 0/13000080 on timeline 1,
     but in the history of the requested timeline, the server forked off from
     that timeline at 0/70226C8."

The fix sets cluster.Status.TimelineID to 1 after major upgrade
completion, ensuring validateTimelineHistoryFile() blocks incompatible
timeline history files (e.g., 00000002.history).

E2E tests verify this by performing a switchover to timeline 2 before
the major upgrade, then confirming TimelineID is reset to 1 after
upgrade completion.

Closes #9764

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
(cherry picked from commit a2f9d47)
cnpg-bot pushed a commit that referenced this pull request Feb 3, 2026
Same commit message as the previous cherry-pick.
(cherry picked from commit a2f9d47)
Labels

backport-requested ◀️ This pull request should be backported to all supported releases bug 🐛 Something isn't working lgtm This PR has been approved by a maintainer ok to merge 👌 This PR can be merged release-1.27 release-1.28 size:L This PR changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

[Bug]: Problem upgrading on a restored cluster.

5 participants