loadbalancer: Fix deletion of backends during resynchronization #44711

Merged
joamaki merged 3 commits into cilium:main from joamaki:pr/joamaki/lb-resync-fixes on Mar 18, 2026

@joamaki joamaki commented Mar 10, 2026

The load balancer's EndpointSlice reflector keeps track of the endpoints it has reflected and uses this state during a resync to remove orphaned backends. The state holds the service name, but the name was not set during the initial synchronization. As a result, backends could be left in the backend table if a resynchronization happened before any upsert event containing that backend had been processed after the initial synchronization. When this happened, the following log message appeared:

msg="BUG: Unexpected failure to delete backends" module=agent.controlplane.loadbalancer-reflectors.k8s-reflector error="object not found"
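
The failure mode can be illustrated with a minimal, self-contained model. The names below (`backendTable`, `upsert`, `release`, `errNotFound`) are hypothetical stand-ins and not Cilium's actual API; the sketch assumes only what is stated above, namely that backends are released per service name and that the state recorded at initial sync carried an empty name.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound mirrors the "object not found" error from the log message.
var errNotFound = errors.New("object not found")

// backendTable is a hypothetical stand-in for the backend table: it maps a
// backend address to the set of service names that reference it.
type backendTable map[string]map[string]struct{}

// upsert records that svcName references the backend at addr.
func (t backendTable) upsert(svcName, addr string) {
	if t[addr] == nil {
		t[addr] = map[string]struct{}{}
	}
	t[addr][svcName] = struct{}{}
}

// release drops svcName's reference to addr and removes the backend once no
// service references it. It fails if svcName holds no such reference.
func (t backendTable) release(svcName, addr string) error {
	refs, ok := t[addr]
	if !ok {
		return errNotFound
	}
	if _, ok := refs[svcName]; !ok {
		return errNotFound
	}
	delete(refs, svcName)
	if len(refs) == 0 {
		delete(t, addr)
	}
	return nil
}

func main() {
	table := backendTable{}
	table.upsert("default/echo", "10.0.0.1:80")

	// Tracked state recorded without the service name (the bug).
	err := table.release("", "10.0.0.1:80")
	fmt.Println("release with empty name:", err, "| backends left:", len(table))

	// Tracked state recorded with the service name (the fix).
	err = table.release("default/echo", "10.0.0.1:80")
	fmt.Println("release with service name:", err, "| backends left:", len(table))
}
```

In this model the release with an empty name fails and leaves the backend in the table, which corresponds to the stale backend described above.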

This PR adds the k8s/resync command so that a resynchronization can be simulated (e.g. loss of api-server connectivity followed by a failure to resume the Watch, leading to a re-list from scratch). On top of this, a regression test is added that shows a backend being left behind after a resynchronization, and finally the issue is fixed by setting the missing svcName.

loadbalancer: Fix issue in resynchronization of state from api-server which may have left stale backends around until an updated EndpointSlice was received

@joamaki joamaki added release-note/bug This PR fixes an issue in a previous release of Cilium. backport/author The backport will be carried out by the author of the PR. needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch labels Mar 10, 2026
joamaki added 3 commits March 11, 2026 09:27
The k8s/resync command allows simulating a resynchronization, e.g.
a disconnect from the api-server that lasts long enough that Watch()
cannot be retried from a specific resource version and a full resync
with listing is required.

The k8s/resync command takes the resource name, an optional namespace
and the new list of objects. Old objects are deleted and all matching
Watch() calls are aborted.
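
The semantics described in this commit message can be sketched with a small in-memory model. This is not the implementation of the k8s/resync command; `fakeAPIServer`, `object`, `watch` and their methods are hypothetical names, and the rule used to decide which watches "match" is an assumption.

```go
package main

import "fmt"

// object is a hypothetical, minimal stand-in for a Kubernetes object.
type object struct {
	Resource, Namespace, Name string
}

// watch models one open Watch() call; done is closed when it is aborted.
type watch struct {
	resource, namespace string // empty namespace means all namespaces
	done                chan struct{}
}

// aborted reports whether the watch has been aborted.
func (w *watch) aborted() bool {
	select {
	case <-w.done:
		return true
	default:
		return false
	}
}

// fakeAPIServer is a hypothetical in-memory object store with open watches.
type fakeAPIServer struct {
	objects []object
	watches []*watch
}

// Watch registers a new watch on a resource and optional namespace.
func (s *fakeAPIServer) Watch(resource, namespace string) *watch {
	w := &watch{resource: resource, namespace: namespace, done: make(chan struct{})}
	s.watches = append(s.watches, w)
	return w
}

// Resync replaces all objects of the resource (optionally limited to one
// namespace) with newObjs and aborts every overlapping watch, so that
// watchers have to list again from scratch.
func (s *fakeAPIServer) Resync(resource, namespace string, newObjs []object) {
	var kept []object
	for _, o := range s.objects {
		inScope := o.Resource == resource && (namespace == "" || o.Namespace == namespace)
		if !inScope {
			kept = append(kept, o)
		}
	}
	s.objects = append(kept, newObjs...)

	var open []*watch
	for _, w := range s.watches {
		// Assumption: a watch matches if it targets the same resource and
		// its namespace scope overlaps the one being resynchronized.
		overlaps := w.resource == resource &&
			(namespace == "" || w.namespace == "" || w.namespace == namespace)
		if overlaps {
			close(w.done)
		} else {
			open = append(open, w)
		}
	}
	s.watches = open
}

func main() {
	s := &fakeAPIServer{objects: []object{
		{"endpointslices", "default", "echo-a"},
		{"endpointslices", "default", "echo-b"},
	}}
	w := s.Watch("endpointslices", "")
	s.Resync("endpointslices", "default", []object{{"endpointslices", "default", "echo-a"}})
	fmt.Println("objects:", s.objects, "| watch aborted:", w.aborted())
}
```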

Signed-off-by: Jussi Maki <jussi@isovalent.com>

Add a regression test for a bug in the resynchronization of endpoint
slices that leaves an orphan backend behind. The test currently fails;
the next commit fixes it.

Signed-off-by: Jussi Maki <jussi@isovalent.com>

The initial synchronization of endpoint slices does not set
the `svcName` in the `endpointsEvent` stored in `currentEndpoints`.
If a resynchronization happens before an endpoint slice update has
refreshed this entry, removing the backend fails because
`ReleaseBackends` is called with an empty service name.

Fix the issue by setting `svcName`. The regression test added in
the previous commit now passes.

Fixes: daf41d1 ("loadbalancer/reflectors: Reflect services directly")
Signed-off-by: Jussi Maki <jussi@isovalent.com>
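
The resync flow with the fix applied can be sketched as follows. The identifiers `endpointsEvent`, `currentEndpoints` and `svcName` are taken from the commit message, but their shapes here are simplified guesses; `reflector`, `endpointSlice`, `upsert`, `release` and `sync` are hypothetical and do not reflect the real reflector code.

```go
package main

import "fmt"

// endpointSlice is a hypothetical, simplified EndpointSlice.
type endpointSlice struct {
	name, svcName string
	backends      []string
}

// endpointsEvent is a simplified guess at the tracked per-slice state.
type endpointsEvent struct {
	svcName  string
	backends []string
}

// reflector tracks what has been reflected so orphans can be pruned.
type reflector struct {
	currentEndpoints map[string]endpointsEvent // keyed by slice name
	backends         map[string]string         // backend address -> owning service
}

func newReflector() *reflector {
	return &reflector{
		currentEndpoints: map[string]endpointsEvent{},
		backends:         map[string]string{},
	}
}

// release removes the backends tracked for a slice. A backend is only
// removed when the tracked svcName matches its owner, so an entry with an
// empty svcName would leave its backends behind.
func (r *reflector) release(sliceName string) {
	ev, ok := r.currentEndpoints[sliceName]
	if !ok {
		return
	}
	for _, addr := range ev.backends {
		if r.backends[addr] == ev.svcName {
			delete(r.backends, addr)
		}
	}
	delete(r.currentEndpoints, sliceName)
}

// upsert reflects a slice and records svcName in the tracked entry. Setting
// svcName on every path, including the initial sync, is the essence of the fix.
func (r *reflector) upsert(s endpointSlice) {
	r.release(s.name)
	for _, addr := range s.backends {
		r.backends[addr] = s.svcName
	}
	r.currentEndpoints[s.name] = endpointsEvent{svcName: s.svcName, backends: s.backends}
}

// sync handles both the initial list and a later re-list: it upserts every
// listed slice and releases tracked slices that are no longer listed.
func (r *reflector) sync(listed []endpointSlice) {
	seen := map[string]bool{}
	for _, s := range listed {
		seen[s.name] = true
		r.upsert(s)
	}
	for name := range r.currentEndpoints {
		if !seen[name] {
			r.release(name)
		}
	}
}

func main() {
	a := endpointSlice{"echo-a", "default/echo", []string{"10.0.0.1:80"}}
	b := endpointSlice{"echo-b", "default/echo", []string{"10.0.0.2:80"}}

	r := newReflector()
	r.sync([]endpointSlice{a, b}) // initial synchronization
	r.sync([]endpointSlice{a})    // resynchronization without echo-b
	fmt.Println("backends after resync:", len(r.backends))
}
```

If `upsert` stored an empty `svcName` instead, the second `sync` call in this model would leave `10.0.0.2:80` in `backends`, matching the orphan backend the regression test looks for.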
@joamaki joamaki force-pushed the pr/joamaki/lb-resync-fixes branch from c96c82f to 92dcb85 Compare March 11, 2026 08:28

joamaki commented Mar 11, 2026

/test

@joamaki joamaki marked this pull request as ready for review March 11, 2026 13:56
@joamaki joamaki requested review from a team as code owners March 11, 2026 13:56
@joamaki joamaki enabled auto-merge March 13, 2026 14:22
@joamaki joamaki added this pull request to the merge queue Mar 18, 2026
@maintainer-s-little-helper maintainer-s-little-helper bot added the ready-to-merge This PR has passed all tests and received consensus from code owners to merge. label Mar 18, 2026
Merged via the queue into cilium:main with commit 7f319b6 Mar 18, 2026
86 of 87 checks passed
@joamaki joamaki deleted the pr/joamaki/lb-resync-fixes branch March 18, 2026 02:01

Labels

backport/author The backport will be carried out by the author of the PR. needs-backport/1.19 This PR / issue needs backporting to the v1.19 branch ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.

4 participants