Skip to content

fix(slots): cleanup replication slots on replicas when features disabled#9381

Merged
leonardoce merged 4 commits intomainfrom
dev/9380
Dec 15, 2025
Merged

fix(slots): cleanup replication slots on replicas when features disabled#9381
leonardoce merged 4 commits intomainfrom
dev/9380

Conversation

@mnencia
Copy link
Member

@mnencia mnencia commented Dec 7, 2025

When both replicationSlots.highAvailability.enabled and replicationSlots.synchronizeReplicas.enabled are disabled, the Replicator now performs cleanup by calling reconcile() before continuing the loop, ensuring user-defined replication slots are properly removed from replica pods.

This change also adds webhook validation to prevent users from removing
the synchronizeReplicas configuration section while it is still enabled.
This ensures that cleanup logic can properly execute before the
configuration is removed, preventing orphaned replication slots on
replicas.

Closes #9380

TODO

  • document the known limitations: when excludedPatterns is modified the operator isn't able to compare the difference with the previous this my cause some lingering rep slots

@mnencia mnencia requested a review from a team as a code owner December 7, 2025 10:53
@dosubot dosubot bot added the size:XS This PR changes 0-9 lines, ignoring generated files. label Dec 7, 2025
@cnpg-bot cnpg-bot added backport-requested ◀️ This pull request should be backported to all supported releases release-1.25 release-1.26 release-1.27 labels Dec 7, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Dec 7, 2025

❗ By default, the pull request is configured to backport to all release branches.

  • To stop backporting this pr, remove the label: backport-requested ◀️ or add the label 'do not backport'
  • To stop backporting this pr to a certain release branch, remove the specific branch label: release-x.y

@dosubot dosubot bot added bug 🐛 Something isn't working ok to merge 👌 This PR can be merged labels Dec 7, 2025
@mnencia
Copy link
Member Author

mnencia commented Dec 7, 2025

/test

@github-actions
Copy link
Contributor

github-actions bot commented Dec 7, 2025

@mnencia, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20003199261

@mnencia mnencia removed the ok to merge 👌 This PR can be merged label Dec 7, 2025
@cnpg-bot cnpg-bot added the ok to merge 👌 This PR can be merged label Dec 7, 2025
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XS This PR changes 0-9 lines, ignoring generated files. labels Dec 10, 2025
@armru
Copy link
Member

armru commented Dec 10, 2025

/test limit=local

@github-actions
Copy link
Contributor

@armru, here's the link to the E2E on CNPG workflow run: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20095657321

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Dec 10, 2025
Copilot AI review requested due to automatic review settings December 11, 2025 09:36
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes cleanup of user-defined replication slots on replica pods when replication slot features are disabled. It adds webhook validation to prevent configuration removal while features are enabled and ensures cleanup reconciliation runs before the replicator continues its loop.

Key Changes:

  • Modified replication slot runner to execute cleanup reconciliation when features are disabled
  • Added webhook validation preventing removal of synchronizeReplicas configuration while enabled
  • Enhanced test coverage for new synchronizeReplicas validation rules

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
internal/management/controller/slots/runner/runner.go Added cleanup reconciliation call when replication features are disabled to properly remove user-defined slots from replicas
internal/webhook/v1/cluster_webhook.go Added validation to prevent removing synchronizeReplicas configuration section while the feature is enabled, ensuring cleanup can execute properly
internal/webhook/v1/cluster_webhook_test.go Added comprehensive test cases for new synchronizeReplicas validation rules, including scenarios for removing configuration sections and disabling features

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@mnencia mnencia force-pushed the dev/9380 branch 2 times, most recently from 8fe975a to a49b60a Compare December 11, 2025 10:04
mnencia and others added 3 commits December 15, 2025 13:18
When both replicationSlots.highAvailability.enabled and
replicationSlots.synchronizeReplicas.enabled are disabled, the
Replicator now performs cleanup by calling reconcile() before
continuing the loop, ensuring user-defined replication slots
are properly removed from replica pods.

Closes #9380

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
This change also adds webhook validation to prevent users from removing
the synchronizeReplicas configuration section while it is still enabled.
This ensures that cleanup logic can properly execute before the
configuration is removed, preventing orphaned replication slots on
replicas.

Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
@leonardoce
Copy link
Contributor

leonardoce commented Dec 15, 2025

After a chat with @armru we found that the improved webhook doesn't really check prevent the mistake from happening because it will run after the defaulting webhook, and the defaulting webhook will just set the cluster.spec.replicationSlots.highAvailability and cluster.spec.replicationSlots.synchronizeReplicas if they are not set.

The condition in https://github.com/cloudnative-pg/cloudnative-pg/pull/9381/files#diff-30bf0aa461c01460908ae409672d7d5be2a1d9065620671ea75ef81a5e9cec03R2311 won't ever be met because of that.

To cover this, we need to keep the old version of the cluster.spec.replicationSlots.highAvailability stanza in the cluster status to clean up only the replication slots that are managed by the operator. This will be addressed in a subsequent commit.

@leonardoce leonardoce merged commit b7c9690 into main Dec 15, 2025
31 of 34 checks passed
@leonardoce leonardoce deleted the dev/9380 branch December 15, 2025 15:15
cnpg-bot pushed a commit that referenced this pull request Dec 15, 2025
…led (#9381)

When both replicationSlots.highAvailability.enabled and
replicationSlots.synchronizeReplicas.enabled are disabled, the
Replicator now performs cleanup by calling reconcile() before continuing
the loop, ensuring user-defined replication slots are properly removed
from replica pods.

This change also adds webhook validation to prevent users from removing
the synchronizeReplicas configuration section while it is still enabled.
This ensures that cleanup logic can properly execute before the
configuration is removed, preventing orphaned replication slots on
replicas.

Closes #9380

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
(cherry picked from commit b7c9690)
cnpg-bot pushed a commit that referenced this pull request Dec 15, 2025
…led (#9381)

When both replicationSlots.highAvailability.enabled and
replicationSlots.synchronizeReplicas.enabled are disabled, the
Replicator now performs cleanup by calling reconcile() before continuing
the loop, ensuring user-defined replication slots are properly removed
from replica pods.

This change also adds webhook validation to prevent users from removing
the synchronizeReplicas configuration section while it is still enabled.
This ensures that cleanup logic can properly execute before the
configuration is removed, preventing orphaned replication slots on
replicas.

Closes #9380

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
(cherry picked from commit b7c9690)
leonardoce pushed a commit that referenced this pull request Dec 15, 2025
…led (#9381)

When both replicationSlots.highAvailability.enabled and
replicationSlots.synchronizeReplicas.enabled are disabled, the
Replicator now performs cleanup by calling reconcile() before continuing
the loop, ensuring user-defined replication slots are properly removed
from replica pods.

This change also adds webhook validation to prevent users from removing
the synchronizeReplicas configuration section while it is still enabled.
This ensures that cleanup logic can properly execute before the
configuration is removed, preventing orphaned replication slots on
replicas.

Closes #9380

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
(cherry picked from commit b7c9690)
mnencia added a commit that referenced this pull request Jan 20, 2026
…led (#9381)

When both replicationSlots.highAvailability.enabled and
replicationSlots.synchronizeReplicas.enabled are disabled, the
Replicator now performs cleanup by calling reconcile() before continuing
the loop, ensuring user-defined replication slots are properly removed
from replica pods.

This change also adds webhook validation to prevent users from removing
the synchronizeReplicas configuration section while it is still enabled.
This ensures that cleanup logic can properly execute before the
configuration is removed, preventing orphaned replication slots on
replicas.

Closes #9380

Signed-off-by: Marco Nenciarini <marco.nenciarini@enterprisedb.com>
Signed-off-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
Co-authored-by: Armando Ruocco <armando.ruocco@enterprisedb.com>
(cherry picked from commit b7c9690)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-requested ◀️ This pull request should be backported to all supported releases bug 🐛 Something isn't working lgtm This PR has been approved by a maintainer ok to merge 👌 This PR can be merged release-1.25 release-1.27 release-1.28 size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Replication slots not cleaned up on replicas when both HA and synchronizeReplicas are disabled

5 participants