Skip to content

Centralize policy incremental update logic#39970

Merged
squeed merged 5 commits intocilium:mainfrom
squeed:allocator-update-policy
Jun 13, 2025
Merged

Centralize policy incremental update logic#39970
squeed merged 5 commits intocilium:mainfrom
squeed:allocator-update-policy

Conversation

@squeed
Copy link
Copy Markdown
Contributor

@squeed squeed commented Jun 9, 2025

This PR fixes a race condition first noticed by @gandro (it should be said, this race condition does not actually happen in production). This race condition blocks FQDN identity pre-allocation.

We currently have two codepaths used for pushing newly-allocated identities in to the policy engine: the allocator (via the parameter notifyOwner = true), and the IPCache, which sets notifyOwner = false and handles policy updates manually. We should combine these paths -- both for correctness' sake as well as fixing the latent race condition.

The race condition was that a new identity may still be in-flight while the allocator reports it as "present", thus the ipcache may incorrectly assume it is already fully propagated.

So, this moves the identity update logic out of the allocator in to it's own cell, then updates the ipcache to use it. In doing so, it unifies incremental updates and prevents interleaved updates.

The ipcache relies heavily on policy updates being completely propagated before proceeding with deletions, so the only serious change was to add a done channel to ensure that all in-flight incremental updates have completed. This solves the race condition noticed previously.

(Note to reviewers: review by commit. One commit is just a cut-and-paste with no code changes. The others are small with new functionality.)

@squeed squeed added the kind/cleanup This includes no functional changes. label Jun 9, 2025
@squeed squeed requested a review from a team as a code owner June 9, 2025 20:03
@squeed squeed added the sig/policy Impacts whether traffic is allowed or denied based on user-defined policies. label Jun 9, 2025
@squeed squeed requested review from a team as code owners June 9, 2025 20:03
@squeed squeed added the release-note/misc This PR makes changes that have no direct user impact. label Jun 9, 2025
@squeed squeed requested a review from a team as a code owner June 9, 2025 20:03
@squeed squeed force-pushed the allocator-update-policy branch 2 times, most recently from 2ee6ea6 to fe4d97b Compare June 10, 2025 07:42
@squeed

This comment was marked as resolved.

@squeed squeed force-pushed the allocator-update-policy branch from fe4d97b to 77efa9f Compare June 10, 2025 12:14
@squeed
Copy link
Copy Markdown
Contributor Author

squeed commented Jun 10, 2025

/test

@squeed
Copy link
Copy Markdown
Contributor Author

squeed commented Jun 10, 2025

One failing test, but nothing exciting. CI looks good otherwise. This is ready for review.

@squeed squeed force-pushed the allocator-update-policy branch from 77efa9f to dc63277 Compare June 10, 2025 16:21
Copy link
Copy Markdown
Member

@fristonio fristonio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

@fristonio
Copy link
Copy Markdown
Member

/test

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome cleanup, glad to see this happening! A few minor points of feedback

@squeed
Copy link
Copy Markdown
Contributor Author

squeed commented Jun 11, 2025

@gandro I had to re-think this a bit. I realized there was still a race on identity deletion and updating the selector cache.

The fix was "simple": force the use of notifyOwner when releasing local identities. However, because we batch release in the IPCache, I wrote ReleaseLocalIdentities() to make that possible.

So, we should, finally, be OK. PTAL.

@squeed squeed requested a review from gandro June 11, 2025 18:31
@squeed

This comment was marked as resolved.

Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any changes to the previous version?

@squeed squeed force-pushed the allocator-update-policy branch from dc63277 to fa6df1e Compare June 12, 2025 08:36
@squeed
Copy link
Copy Markdown
Contributor Author

squeed commented Jun 12, 2025

I don't see any changes to the previous version?

Pushed to the wrong branch 🙈. Green CI was too good to be true.

@squeed
Copy link
Copy Markdown
Contributor Author

squeed commented Jun 12, 2025

/test

@squeed squeed force-pushed the allocator-update-policy branch from fa6df1e to 909504d Compare June 12, 2025 09:57
Copy link
Copy Markdown
Member

@gandro gandro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! These parts of IPCache are always a bit scary, but it looks fine to me from what I can tell.

Copy link
Copy Markdown
Member

@christarazi christarazi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, very clean implementation!

@squeed
Copy link
Copy Markdown
Contributor Author

squeed commented Jun 13, 2025

/test

squeed added 5 commits June 13, 2025 10:16
The so-called "IdentityAllocatorOwner" is responsible for pushing
newly-allocated identities in to the SelectorCache and distributing
incremental updates. This logic is duplicated between the IPCache and
the identity cache, but it shouldn't be.

This commit extracts the updater to an independent exported cell. The code
is otherwise unchanged.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
The SelectorCache cannot guarantee transactional updates when identities
are mutated, rather than being merely created or deleted. In this case,
the caller is responsible for triggering a full policy recalculation.

Add this logic to the identity updater to match the ipcache.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This allows for synchronous addition of identities. There are certain
circumstances where we need to wait for full, end-to-end propagation of
incremental identity updates to BPF policy maps. This method exposes
that.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This fixes a potential bug when two callers release and acquire the same
identity in short sequence. First of all, it serializes local allocation
behind a lock, ensuring that subsequent releases and acquires never
happen until after the SelectorCache is updated.

Secondly, it exposes a new method, ReleaseLocalIdentities(), which
allows for efficient batched release of a large number of identities.

Otherwise, the following sequence may result in an inconsistent selector
cache:

1. Allocate(lbls=A, notify=True)
2. Release(A, notify=False): deleted
3. Allocate(lbls=A, notify=True); UpdateIdentities(added: A)
4. UpdateIdentities(deleted: A)

Future commits will remove `notifyOwner` from all Release methods.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Previously, the ipcache implemented its own duplicated logic for pushing
incremental updates to the policy maps. Switch to the implementation
shared with the identity allocator.

This prevents possible race conditions where the allocator and ipcache
are distributing the same identities simultaneously.

As part of this, switch to AllocateLocalIdentity() and
ReleaseLocalIdentities(). This means that our allocations will be
serialized with any other users of the allocator.

Signed-off-by: Casey Callendrello <cdc@isovalent.com>
@squeed squeed force-pushed the allocator-update-policy branch from 909504d to 29cff54 Compare June 13, 2025 08:24
@squeed
Copy link
Copy Markdown
Contributor Author

squeed commented Jun 13, 2025

/test

@squeed squeed enabled auto-merge June 13, 2025 09:28
@squeed squeed added this pull request to the merge queue Jun 13, 2025
Merged via the queue into cilium:main with commit 47ace2d Jun 13, 2025
68 checks passed
@squeed squeed deleted the allocator-update-policy branch June 13, 2025 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind/cleanup This includes no functional changes. release-note/misc This PR makes changes that have no direct user impact. sig/policy Impacts whether traffic is allowed or denied based on user-defined policies.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants