What happened?
Test_RU12_RollingUpdateWithPCSScaleInDuringUpdate intermittently fails in CI with:
Failed to trigger rolling update on pc-c: Operation cannot be fulfilled on
podcliquesets.grove.io "workload1": the object has been modified; please apply
your changes to the latest version and try again
triggerPodCliqueRollingUpdate performs GET → modify → UPDATE sequentially for pc-a, pc-b, and pc-c. After the first update, the grove controller reconciles and writes back to the PCS, changing its resourceVersion. The next update then conflicts, and RetryOnConflict (5 retries × 10ms) cannot recover.
Failures confirmed across multiple branches:
erez/chore/update-kai-version: runs 23736130190, 23480636572
crd-upgrader-impl: runs 23756654965, 23747858360, 23445539025
Not reproducible locally. On CI, Docker-in-Docker with CPU throttling causes bursty controller reconciliation, which widens the conflict window.
What did you expect to happen?
Rolling update tests should pass reliably in CI regardless of controller reconciliation timing.
Environment
- CI runners: prod-grove-e2e-v1 (Docker-in-Docker with CPU limits)
- Cluster: k3d with 30 KWOK nodes (default e2e preset)
- Kubernetes: k3s v1.34.2