feat: significantly reduce api calls and introduce partial secret cache#4086
Conversation
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
|
/ok-to-test sha=facdfc036c93dfa5bc7fd3041d0353dbad385bee |
|
I'm interested in reviewing this, however, I'll be out next week on Kubecon so I can't promise anything. |
|
For anyone reviewing, there is a discussion here #4099 (comment) about if we should change the behavior of ExternalSecrets with Currently, when
If we want to change this, we can add another check in |
Oh f*ck that should have been inside the above error. How did I miss that? :D :D Thanks. |
|
Thanks for the awesome PR description. That helps a lot to have the right context. |
Skarlso
left a comment
There was a problem hiding this comment.
I did my first review round with questions and some code improvement requests.
Well done, nice work so far!
pkg/controllers/externalsecret/externalsecret_controller_template.go
Outdated
Show resolved
Hide resolved
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
|
@Skarlso I was chatting with @gusfcarvalho and I realized there was a longstanding bug I could easily fix. The bug was when the See 177b0c6 for the change. |
|
Nice! Thanks for the improvements. |
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
|
@Skarlso I have made another important fix in c3cd973 to prevent endless loops that consume lots of API calls needlessly for "permeant errors" which retrying won't fix:
|
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com>
|
@thesuperzapper Where are you now with it? I would say, don't do any more fixes. :D Because this is awesome as is and we really would like to get this merged. :) |
|
How the heck did it get worse? :D
We did so many refactors. :D |
|
/ok-to-test sha=4c387ccd01146cfc9fa4e623bec70d4136b5e961 |
|
|
Thank you very much for your contribution! This is a significant increase in performance of ESO and a much needed one. :) Also, reviewing this has been my pleasure that I could learn some more interesting things about the various caching mechanisms of the controller runtime. |
…he (external-secrets#4086) * feat: reduce api calls and introduce partial secret cache Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> * updates from review 1 Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> * updates from review 2 Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> * fix updating CreationPolicy after secret creation Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> * updates from review 3 Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> * prevent loop when two ES claim Owner on the same target secret Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> * updates from review 4 Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> * fix ClusterSecretStore not ready message Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> --------- Signed-off-by: Mathew Wicks <5735406+thesuperzapper@users.noreply.github.com> Co-authored-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>



Problem Statement
This PR significantly improves the performance and stability of the ExternalSecrets controller.
While there are many changes (see below), I want to summarize the goals of this PR:
Related Issue
Proposed Changes
I have tested the PR for most use-cases I can think of, and have ensured that the
make testpass, but I can't runmake test.e2elocally.However, there may be subtle behavior changes that I have not considered, so I would appreciate a thorough review.
The following sections cover the specific changes made in this PR.
Changes to caching:
We have a new controller flag
--enable-managed-secrets-cachingwhich defaults totrue.Also, we have a new
reconcile.external-secrets.io/managed=truelabel which is added to all target secrets managed by ExternalSecrets.Previously, we were not caching any Secrets (when
--enable-secrets-caching=false, which is the default).This was causing a lot of unnecessary API calls during every reconcile of each ExternalSecret.
To reduce the number of API calls and still avoid high memory usage on clusters with many secrets, we now have a special cache that only caches secrets with the
reconcile.external-secrets.io/managed=truelabel.Users can disable this cache by setting
--enable-managed-secrets-caching=false.However, this is not recommended, as it's better to increase the memory of the controller to handle all the secrets you want to manage with ExternalSecrets.
Changes when updating target secrets:
We now only update the target secret if the data has changed.
We use the
equality.Semantic.DeepEqual()to ensure we would be making a change before we do it.This change is critical for systems like Kyverno, as they rely on admission events to trigger restarts of deployments when secrets are updated. For example, previously, if an ExternalSecret did not set
spect.target.template.type, it would do an update call every reconcile because it would try and unset theOpaquedefault value.Changes when
spec.creationPolicy=Owner:Previously, some keys might be retained, if they existed previously on the secret before the ExternalSecret took ownership of it.
Now, the ExternalSecret defines the full "desired state" of the target Secret, and any extraneous data keys will be removed. (WARNING: this might mean your target secret keys are "cleaned up" when updating to this PR).
Changes when data key is removed from target
template:Previously, there were strange situations where a data key being removed from an ExternalSecret target template would not result in the key being removed from the target Secret.
Now, the behavior is that any key which was previously managed by the ExternalSecret will be removed if it no longer in the
templateordata/dataFromdefinitions. (WARNING: this might mean your target secret keys are "cleaned up" when updating to this PR)Changes for immutable secrets:
There are two ways a secret could become immutable:
spec.target.template.immutable=true.immutablefield set totrue.We now correctly handle both cases, and we will not attempt to update the target secret's
data, if it's immutable.If the controller updated the
dataof an immutable secret, the ExternalSecret will put into an error status, with a message explaining why. However, we will still update the metadata of the target secret if it changes (for our tracking purposes).Changes when two ExternalSecrets try to claim
creationPolicy=Ownerof the same targetPreviously, we would loop forever in the case that another ExternalSecret (or some other object) was already the owner of the target Secret (because we returned an error on the reconcile loop).
Now we have a nice error condition in the status of the ExternalSecret, and just wait for the target secret to be updated again to see if it's now not owned.
Changes when
spec.refreshInterval=0:We now correctly handle the case where
spec.refreshInterval=0so it matches our documentation:spec.target.templatechanges in the ExternalSecret.Changes when
spec.creationPolicy=Mergeand missing target secret:We have introduced a new
SecretMissingreason which we use to indicate that the target secret does not exist.When an ExternalSecret is in this state, we will check again every
refreshIntervalto see if the target secret has been created.Changes when
spec.creationPolicyis changed away fromOwner:Previously, we would not clean up the
metadata.ownerReferenceson the target secret if the ExternalSecret had its creation policy changed away fromOwner, this led to the possibility of the target secret being deleted when the ExternalSecret was deleted. We also did not clean up thereconcile.external-secrets.io/created-bylabel.We now remove both of these things when the creation policy is changed away from
Owner.Changes to requeue logic:
We now correctly calculate how long we should wait before the next reconcile.
This was inconsistent previously, and sometimes we would wait way longer than
refreshIntervalbetween reconciles.Now we always use our newly added
getRequeueResult()to calculate the next reconcile time.No longer use PATCH for updates:
We now always use
Update()calls rather thanPatch()to update target secrets.The
Patchfunction does not check if the resource has changed before updating it so can be problematic if multiple controllers are updating the same resource. If we get an Update conflict, we requeue immediately.Removals
stautsof ExternalSecrets.spec.immutablefield set to true.errwill never be non-nil when its checked:if errors.Is(err, esv1beta1.NotModifiedErr) {in the code to find what I removed.Future work
We should also cache secrets that are used in SecretStore provider credentials.
Currently, these are still retrieved every loop, and are not currently cached when
--enable-secrets-caching=false.We should apply the changes to how we patch status (using the deferred function) in the other controllers
As they also do unnecessary updates and use
Patch()calls rather thanUpdate()calls.Checklist
git commit --signoffmake testmake reviewable