
feat: introduce experimental support for failover quorum #7572

Merged
gbartolini merged 22 commits into cloudnative-pg:main from leonardoce:syncquorum on Jul 29, 2025

@leonardoce (Contributor) commented May 14, 2025

This commit adds opt-in, experimental support for "failover quorum", also known as "quorum-based failover", in CloudNativePG.

Failover quorum is a mechanism designed to improve data durability and safety during failover events by ensuring that the promoted replica contains all synchronously committed data.

With synchronous replication, a transaction is acknowledged only after the required number of synchronous standbys have received its WAL data. However, this alone doesn't guarantee that the operator can always promote the most advanced replica during a failure: for example, the standbys that acknowledged the latest commits might be unreachable when the primary fails.

The failover quorum mechanism addresses this risk by requiring the operator to verify that a quorum of replicas agrees on the promotion. This ensures that the selected replica has all required committed data. If this quorum is not met, failover will not proceed, reducing the risk of data loss.
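The check can be illustrated with a minimal Go sketch. It assumes the quorum follows the overlap rule R + W > N, where N is the number of standbys eligible to be synchronous, W is the number of standbys that must acknowledge each commit, and R is the number of those standbys currently reachable. The type and function names (`QuorumInput`, `canPromote`) are hypothetical and do not reflect the operator's actual code.

```go
package main

import "fmt"

// QuorumInput is a hypothetical summary of the synchronous replication
// settings that were in force on the failed primary.
type QuorumInput struct {
	SyncStandbys []string // standbys eligible to be synchronous (N = their count)
	RequiredAcks int      // standbys that must acknowledge each commit (W)
	Available    []string // standbys the operator can currently reach
}

// canPromote reports whether the reachable standbys are guaranteed to
// include at least one that received every synchronously committed
// transaction. With R reachable members of the synchronous set, any W
// acknowledging standbys and those R standbys must overlap (pigeonhole
// principle) whenever R + W > N.
func canPromote(q QuorumInput) bool {
	members := make(map[string]bool, len(q.SyncStandbys))
	for _, name := range q.SyncStandbys {
		members[name] = true
	}
	n := len(members) // distinct names only

	r := 0
	for _, name := range q.Available {
		if members[name] {
			r++
			delete(members, name) // count each reachable standby once
		}
	}
	return r+q.RequiredAcks > n
}

func main() {
	q := QuorumInput{
		SyncStandbys: []string{"pg-2", "pg-3", "pg-4"},
		RequiredAcks: 1, // e.g. ANY 1 (pg-2, pg-3, pg-4)
		Available:    []string{"pg-2", "pg-3"},
	}
	// 2 + 1 > 3 is false: pg-4 might hold the only copy of a commit.
	fmt.Println(canPromote(q)) // false

	q.RequiredAcks = 2
	// 2 + 2 > 3 is true: at least one of pg-2, pg-3 has every commit.
	fmt.Println(canPromote(q)) // true
}
```

Passing the check only guarantees that a fully up-to-date standby exists among the reachable ones; the operator would still need to select the most advanced of them (for example by comparing received WAL positions) before promoting it.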

To enable the feature, set the annotation cnpg.io/failoverQuorum="true" on the Cluster resource.
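As a rough illustration of the opt-in, the sketch below shows how a controller might read this annotation from a Cluster's metadata. It assumes only the exact string "true" enables the feature and that a missing annotation means disabled; the constant and helper names are hypothetical, and the operator's actual parsing may differ.

```go
package main

import "fmt"

// failoverQuorumAnnotation is the opt-in annotation named in this PR.
const failoverQuorumAnnotation = "cnpg.io/failoverQuorum"

// isFailoverQuorumEnabled is a hypothetical helper: it treats only the
// exact value "true" as enabled. Reading from a nil or empty map yields
// "", so a missing annotation leaves the feature disabled.
func isFailoverQuorumEnabled(annotations map[string]string) bool {
	return annotations[failoverQuorumAnnotation] == "true"
}

func main() {
	// Annotations as they would appear in the Cluster's metadata.
	annotations := map[string]string{failoverQuorumAnnotation: "true"}
	fmt.Println(isFailoverQuorumEnabled(annotations)) // true
	fmt.Println(isFailoverQuorumEnabled(nil))         // false
}
```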

Once stabilised, a dedicated configuration field will replace this annotation.

For further information, refer to the included documentation and the related issue.

Closes #7481

@cnpg-bot added the labels backport-requested ◀️, release-1.22, release-1.24 and release-1.25 on May 14, 2025
@github-actions (Contributor)

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this PR, remove the label backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this PR to a specific release branch, remove the corresponding branch label: release-x.y

@leonardoce leonardoce marked this pull request as ready for review May 19, 2025 09:34
@leonardoce leonardoce requested review from a team and jsilvela as code owners May 19, 2025 09:34
@dosubot (bot) added the labels size:XXL and enhancement 🪄 on May 19, 2025
@armru force-pushed the syncquorum branch 3 times, most recently from 00a7307 to 5d34856, May 22, 2025 10:35
@dosubot (bot) added the label lgtm on May 26, 2025
@armru force-pushed the syncquorum branch 2 times, most recently from b251677 to 4f6fbe8, May 26, 2025 15:11
@armru (Member) commented May 26, 2025

/test limit=local

@github-actions (Contributor)

@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/15257221326

@cnpg-bot added the label ok to merge 👌 on May 26, 2025
@mnencia force-pushed the syncquorum branch 2 times, most recently from 13e1914 to 08b4771, July 7, 2025 08:40
@mnencia force-pushed the syncquorum branch 2 times, most recently from 80a5126 to 63ef015, July 15, 2025 09:16
@mnencia force-pushed the syncquorum branch 2 times, most recently from 3055371 to 918eaca, July 22, 2025 15:13
armru and others added 18 commits July 29, 2025 09:21
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Francesco Canovai <francesco.canovai@enterprisedb.com>
Fix an issue where enabling the syncquorum on a cluster with synchronous
replication prevented failovers until a configuration change.

Signed-off-by: Francesco Canovai <francesco.canovai@enterprisedb.com>
Signed-off-by: Francesco Canovai <francesco.canovai@enterprisedb.com>
The .spec.cluster.name field no longer exists and the syncquorum name is
tied to the cluster name, so we remove the column.

Signed-off-by: Francesco Canovai <francesco.canovai@enterprisedb.com>
We should display how long ago the resource was generated.

Signed-off-by: Francesco Canovai <francesco.canovai@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Document how to enable the feature and how it works, with a few
scenarios to make it easier to understand.

Signed-off-by: Francesco Canovai <francesco.canovai@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Francesco Canovai <francesco.canovai@enterprisedb.com>
Signed-off-by: Leonardo Cecchi <leonardo.cecchi@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
Signed-off-by: Gabriele Bartolini <gabriele.bartolini@enterprisedb.com>
@gbartolini changed the title from "feat: quorum check before promotion" to "feat: Quorum-based failover" on Jul 29, 2025
@gbartolini changed the title from "feat: Quorum-based failover" to "feat: introduce experimental support for failover quorum" on Jul 29, 2025
@gbartolini merged commit b01bbc9 into cloudnative-pg:main on Jul 29, 2025
32 checks passed
@gbartolini (Contributor)

@ardentperf, I apologise for not mentioning your contributions in the commit itself. I will make sure we add you in #8170.


Labels

  • backport-requested ◀️: This pull request should be backported to all supported releases
  • enhancement 🪄: New feature or request
  • lgtm: This PR has been approved by a maintainer
  • ok to merge 👌: This PR can be merged
  • release-1.22
  • release-1.25
  • release-1.26
  • size:XXL: This PR changes 1000+ lines, ignoring generated files.

Development

Merging this pull request closes: [Feature]: Check for synchronous replication quorum before a failover

7 participants