CES: add option to create CES directly from pods#38388
CES: add option to create CES directly from pods#38388giorio94 merged 18 commits intocilium:mainfrom
Conversation
3bc44bf to
42456bb
Compare
|
/test |
|
cc @sypakine |
qmonnet
left a comment
There was a problem hiding this comment.
Looks OK for doc change, thanks!
giorio94
left a comment
There was a problem hiding this comment.
I took a first look at the PR, and left a couple of initial comments inline. I still need to fully digest all the logic (which is fairly new to me) though, as well as look at the tests.
This PR doesn't allow to disable the creation of CEPs yet (due to [1]), but if that was the case (which I assume being a follow-up step), CES would become incompatible with the clustermesh-apiserver. This component currently only relies on CEP information (which is fine being only a handful of replicas at most). Essentially, this logic [2] should be extended to work with CES as well, starting either one or the other based on the Cilium configuration. IMO that should be addressed before moving further, to avoid introducing yet another incompatibility between features.
FYI, one of the integration tests failures looks legitimate.
[1]:
Lines 3242 to 3248 in 81666d2
[2]:
|
Thanks for taking a look @giorio94! I will open a PR to ensure compatibility between the clustermesh apiserver and CES is retained with these changes. |
pippolo84
left a comment
There was a problem hiding this comment.
Thanks for the PR! 💯
Spotted a couple of leftovers and left a comment about a minor nit inline, but overall LGTM.
Export WireGuard --enable-wireguard flag for use in Operator. CES controller requires --enable-wireguard to set the encryption of the CES. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
Add config option for `ces-controller-mode`. CES Controller can be allowed to run with 'default' mode, as it functions today, or the new mode 'slim', where CES completely replace CEPs in the cluster and CEPs are no longer created. When in default mode, the DefaultController is used, which watches CEPs and creates CES based off those changes. When in slim mode, the new SlimController will be used instead. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
The CES Manager must behave differently in 'default' and 'slim' mode, as we are required to store different state. The default mode will use the existing ceptocesmap as the datastore, while we introduce a new CESCache as the datastore for slim mode since we will need to correlate all the information when we do not rely on CEPs. We separate out the managers for these modes into a DefaultManager and SlimManager, which implement the Manager interface. The reconciler depends on the Manager interface. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
CES Controller in slim mode will watch CiliumNodes to process updates, particularly to the EncryptionKey that is stored in CESs. This requires watching CiliumNodes to handle update and delete. We will track the set of CEPs on a node, so we know which CESs need to be reconciled on a node update. Adds unit tests for node state. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
The Slim CES Controller process CiliumNode updates with an extra handler to pass updates to CiliumNode encryption keys to the local CES cache, so that the affected CESs can be reconciled on changes. Adds unit tests for CES manager node upsert/delete. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
We extend the CESCache to handle updates to CIDs. We will need to associate pods with their identities for creating CES. The SecIds data structure contains a selected ID, which is used if there are multiple ids associated with a label to minimize churn. Adds unit test for cid state. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
Add flow where controller will process CID updates, update the relevant mappings in the manager and local CES cache. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
Allow the local CES datastore to store CES and namespace information. We also introduce the CESCacher interface, to allow the default and slim CES managers to handle their own local cache types. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
Signed-off-by: jshr-w <shjayaraman@microsoft.com>
Signed-off-by: jshr-w <shjayaraman@microsoft.com>
Add store for pod state to CESCache, along with necessary helper functions. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
We split the reconciler into the default and slim, to pass the appropriate params depending on the configuration initialized. Reconciler requires two new interfaces to handle the default/slim modes, on reconciliation of CES. The slim reconciler will reconcile based on pod state and local CES cache state instead of getting data from the CEP. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
|
Leaving a note; I've followed up with most of the changes requested here - there are a few I will kick to a follow-up PR such as adding more tests. I've noted them in the comments here & added them to the associated issue to ensure tracking for their follow-up. This feature is currently behind a hidden flag and partially implemented at this point. Thanks @giorio94 for your patience and help through the entire review process for this super long piece :) |
When a pod is added/updated or deleted, we insert it into the cache. We only attempt to add it to the CES cache if we have sufficient state, otherwise we wait for the necessary info to be ready on the pod. We will add the pod to the cache whether or not it has been assigned an identity -- in the case it has not, we will receive an update when the identity is created and associate it with the pod later. Add unit tests for cescache and for slim manager flow. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
On initialization of the operator and CES controller, we sync CES from the store to our local cache before the controller runs. To do this, we read the existing store state and create the existing CES in our local cache. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
Add start flow for SlimController to run. Modify test helpers for slim mode to be tested. Signed-off-by: jshr-w <shjayaraman@microsoft.com>
|
/test |
|
/ci-eks |
|
@cilium/operator Gentle ping 🙏 I've also marked this PR as blocked for the moment, due to feature freeze. |
|
Hello, gentle ping again for a review from @cilium/operator |
|
Shreya pointed out that this was previously reviewed by @pippolo84 for the @cilium/operator group, but he has stepped down from the group so GitHub no longer recognizes that - hence it was stuck waiting for operator review despite the interface not explicitly showing this in the top right corner. I re-requested review from the group to nudge it forward. |
|
I'm seeing that @giorio94 reviewed the PR overall in detail. I'm glad he did, though it does feel like we should have pushed back more for incremental changes. >1K LoC is generally not an acceptable size for a change. I can tell from the comments in Marco's review that he feels obligated not to block the PR due to the sheer size and effort you've put into the PR. The functionality is useful. We definitely appreciate the work. On the other hand, this is how we end up with code we're not happy with in the repo because no-one wants to iterate further on the PR. How do we track and resolve the concerns about the operator getting into a state where it can't recover and properly sync back? Should we have an issue for it, and can we concretely describe what it would look like to resolve those concerns? |
|
Hi Joe, that's fair to say. My understanding based on previous comments was that it would be okay to break this into smaller commits, but I would have also been willing to (and we can also do this now) break this into smaller PRs if requested. It is not showing as linked, but I have an issue #38507 where I am tracking the other changes needed for this -- I will flesh some of these items out further. |
|
As long as we're tracking those concerns somehow and reviewers are otherwise happy that this change improves overall code health, we should probably move forward to merge this. Making progress is a healthy step towards improving the code overall (taking inspiration from Google's code review guidelines). @giorio94 does that sound good to you, merge then iterate in the tree? |
|
Thanks Joe (and obviously @jshr-w for all the work you put here), I'll then go ahead and merge this PR. I think it sets reasonable foundations for the feature (which is currently gated behind a hidden flag), and it seems easier to me to keep iterating afterwards. My suggestion in that respect would be prioritize the scaffolding of the script tests infrastructure and the introduction of a few initial script tests. That should be of great help to figure out some of the possible rough edges, increase confidence, as well as allow to incrementally validate future improvements and fixes. |
|
Thanks Marco again for all your guidance through this PR! I will make sure the updates to testing are handled as the first priority to follow this PR. And thanks Joe for your help :) |
This PR makes a change to add an option to create CiliumEndpointSlices directly from pods instead of from CiliumEndpoints. This is a step towards eliminating the need for CEPs when CES are present, which will allow us to reduce pressure created on the kube-apiserver by the CEP CRD at high scale.
Descriptions of changes are in each commit.