Centralize policy incremental update logic#39970
Conversation
2ee6ea6 to
fe4d97b
Compare
This comment was marked as resolved.
This comment was marked as resolved.
fe4d97b to
77efa9f
Compare
|
/test |
|
One failing test, but nothing exciting. CI looks good otherwise. This is ready for review. |
77efa9f to
dc63277
Compare
|
/test |
gandro
left a comment
There was a problem hiding this comment.
Awesome cleanup, glad to see this happening! A few minor points of feedback
|
@gandro I had to re-think this a bit. I realized there was still a race on identity deletion and updating the selector cache. The fix was "simple": force the use of So, we should, finally, be OK. PTAL. |
This comment was marked as resolved.
This comment was marked as resolved.
gandro
left a comment
There was a problem hiding this comment.
I don't see any changes to the previous version?
dc63277 to
fa6df1e
Compare
Pushed to the wrong branch 🙈. Green CI was too good to be true. |
|
/test |
fa6df1e to
909504d
Compare
gandro
left a comment
There was a problem hiding this comment.
Awesome work! These parts of IPCache are always a bit scary, but it looks fine to me from what I can tell.
christarazi
left a comment
There was a problem hiding this comment.
Nice work, very clean implementation!
|
/test |
The so-called "IdentityAllocatorOwner" is responsible for pushing newly-allocated identities in to the SelectorCache and distributing incremental updates. This logic is duplicated between the IPCache and the identity cache, but it shouldn't be. This commit extracts the updater to an independent exported cell. The code is otherwise unchanged. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
The SelectorCache cannot guarantee transactional updates when identities are mutated, rather than being merely created or deleted. In this case, the caller is responsible for triggering a full policy recalculation. Add this logic to the identity updater to match the ipcache. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This allows for synchronous addition of identities. There are certain circumstances where we need to wait for full, end-to-end propagation of incremental identity updates to BPF policy maps. This method exposes that. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
This fixes a potential bug when two callers release and acquire the same identity in short sequence. First of all, it serializes local allocation behind a lock, ensuring that subsequent releases and acquires never happen until after the SelectorCache is updated. Secondly, it exposes a new method, ReleaseLocalIdentities(), which allows for efficient batched release of a large number of identities. Otherwise, the following sequence may result in an inconsistent selector cache: 1. Allocate(lbls=A, notify=True) 2. Release(A, notify=False): deleted 3. Allocate(lbls=A, notify=True); UpdateIdentities(added: A) 4. UpdateIdentities(deleted: A) Future commits will remove `notifyOwner` from all Release methods. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
Previously, the ipcache implemented its own duplicated logic for pushing incremental updates to the policy maps. Switch to the implementation shared with the identity allocator. This prevents possible race conditions where the allocator and ipcache are distributing the same identities simultaneously. As part of this, switch to AllocateLocalIdentity() and ReleaseLocalIdentities(). This means that our allocations will be serialized with any other users of the allocator. Signed-off-by: Casey Callendrello <cdc@isovalent.com>
909504d to
29cff54
Compare
|
/test |
This PR fixes a race condition first noticed by @gandro (it should be said, this race condition does not actually happen in production). This race condition blocks FQDN identity pre-allocation.
We currently have two codepaths used for pushing newly-allocated identities in to the policy engine: the allocator (via the parameter
notifyOwner = true), and the IPCache, which setsnotifyOwner = falseand handles policy updates manually. We should combine these paths -- both for correctness' sake as well as fixing the latent race condition.The race condition was that a new identity may still be in-flight while the allocator reports it as "present", thus the ipcache may incorrectly assume it is already fully propagated.
So, this moves the identity update logic out of the allocator in to it's own cell, then updates the ipcache to use it. In doing so, it unifies incremental updates and prevents interleaved updates.
The ipcache relies heavily on policy updates being completely propagated before proceeding with deletions, so the only serious change was to add a done channel to ensure that all in-flight incremental updates have completed. This solves the race condition noticed previously.
(Note to reviewers: review by commit. One commit is just a cut-and-paste with no code changes. The others are small with new functionality.)