Closed
[Bug]: Race condition in replica cluster switch prevents designated primary transition completion #9591
Defect
Labels
bug 🐛 Something isn't working
Description
Is there an existing issue already for this bug?
- I have searched for an existing issue, and could not find anything. I believe this is a new bug.
I have read the troubleshooting guide
- I have read the troubleshooting guide and I think this is a new bug.
I am running a supported version of CloudNativePG
- I have read the troubleshooting guide and I think this is a new bug.
Contact Details
No response
Version
trunk (main)
What version of Kubernetes are you using?
1.34
What is your Kubernetes environment?
Self-managed: kind (evaluation)
How did you install the operator?
YAML manifest
What happened?
During a replica cluster switch operation, a rare race condition (observed in ~1-5% of test runs) occurs where the operator and instance manager concurrently modify the cluster object, resulting in an optimistic lock conflict that prevents the designated primary transition from completing.
From E2E test failure: https://github.com/cloudnative-pg/cloudnative-pg/actions/runs/20547048546
Relevant log output
Attempted transition:
{"ts":"2025-12-28T02:52:54.810548Z", "msg":"Setting myself as the current designated primary"}
Optimistic lock conflict:
{"ts":"2025-12-28T02:52:55.372302Z", "error":"Operation cannot be fulfilled... the object has been modified"}
Test timeout:
Timed out after 300.000s.
Expected pod to be in recovery mode
Code of Conduct
- I agree to follow this project's Code of Conduct