ExternalSecret reconciliation silently stops with 2.0.1 #6053
Description
Describe the bug
After upgrading to the External Secrets Operator (ESO) 2.0.1 Helm chart, ESO silently stops reconciling resources in clusters that contain a large number of ESO resources.
One cluster where we observed this issue has the following ESO resource counts:
- ExternalSecret: ~3000
- SecretStore: ~140
- ClusterExternalSecret: 2
- ClusterSecretStore: 2
We have not found any errors or warnings in the ESO controller logs that would indicate a problem; the .status.refreshTime of ESO resources simply stops updating.
Reconciliation is not completely broken: restarting the ESO controller Deployment triggers at least one reconciliation of the cluster's ESO resources (based on .status.refreshTime), but reconciliation silently stops again shortly after the restart.
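Because the only visible symptom is that .status.refreshTime stops advancing, one way to quantify the stall is to compare each ExternalSecret's refreshTime with the current time. The sketch below is a hypothetical helper, not part of ESO. It assumes the input is the JSON produced by `kubectl get externalsecrets --all-namespaces -o json`, and the 2-hour default threshold is an arbitrary assumption that should be set relative to the refreshInterval actually used in your manifests.

```python
import json
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    # Kubernetes serializes metav1.Time as RFC 3339 in UTC, e.g. "2025-01-01T08:00:00Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stale(doc, now, max_age):
    """Return (namespace/name, age) for ExternalSecrets not refreshed within max_age.

    doc is a parsed Kubernetes List object; items with no refreshTime get age None.
    """
    stale = []
    for item in doc.get("items", []):
        meta = item.get("metadata") or {}
        key = f"{meta.get('namespace', '')}/{meta.get('name', '')}"
        refreshed = (item.get("status") or {}).get("refreshTime")
        if not refreshed:
            stale.append((key, None))  # never reconciled (or status not yet written)
            continue
        age = now - parse_k8s_time(refreshed)
        if age > max_age:
            stale.append((key, age))
    return stale


def report(path, max_age=timedelta(hours=2)):
    # Hypothetical entry point: path points at a file containing the kubectl JSON output.
    with open(path) as f:
        doc = json.load(f)
    stale = find_stale(doc, datetime.now(timezone.utc), max_age)
    total = len(doc.get("items", []))
    print(f"{len(stale)} of {total} ExternalSecrets not refreshed within {max_age}")
    for key, age in stale:
        detail = "never refreshed" if age is None else f"last refresh {age} ago"
        print(f"  {key}: {detail}")
```

For example, save the kubectl output to a file and call report("externalsecrets.json"). Running it periodically after a controller restart would show whether the count of stale resources grows back toward the full ~3000.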
CPU and memory utilization of the ESO controller and webhook pods was well under the configured requests/limits in the clusters where we observed this issue.
We do not see this issue in clusters that have a smaller number of ESO resources deployed.
After reverting to the ESO 2.0.0 Helm chart with no other cluster changes, reconciliation works as expected again in all clusters.
To Reproduce
Steps to reproduce the behavior:
- provide all relevant manifests
```yaml
# example values.yaml with the relevant ESO configuration for a cluster where we observed this issue
external-secrets:
  replicaCount: 3
  concurrent: 50
  extraArgs:
    experimental-enable-vault-token-cache: true
    client-qps: 100
    client-burst: 200
```
- provide the Kubernetes and ESO version
  - GKE Version: 1.33.5
  - ESO Version: 2.0.1 (via the external-secrets Helm chart)
Expected behavior
Reconciliation should not silently stop.
Screenshots
N/A
Additional context
If additional information would be helpful, please let me know and I can gather it. Unfortunately, at the time this issue occurred we weren't collecting all of the ESO metrics, which would have made it easier to troubleshoot and diagnose.