Helm hubble-generate-cert job blocks install with --wait due to post-install scheduling #40381
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
equal or higher than v1.17.5 and lower than v1.18.0
What happened?
Installing Cilium via Helm with hubble.tls.auto.method=cronJob and --wait hangs until the timeout, because the Hubble pods never reach the ready state.
Being able to rely on --wait is important for automation, as it's the primary mechanism for determining that resources are correctly deployed and ready in the cluster before attempting to load dependent resources.
The issue is that the hubble-relay pod relies on the certificates generated by the hubble-generate-cert job, but that job is annotated with "helm.sh/hook": post-install,post-upgrade. Per the Helm lifecycle docs, this means that with --wait the job is not deployed until all resources reach the ready state. This creates a dependency deadlock: the pod cannot become ready without the certificates, and the job that generates them does not run until the pod is ready.
I assume the same is true for both hubble and clustermesh, since they use the same mechanism, though I've only tested with hubble at this stage.
| "helm.sh/hook": post-install,post-upgrade |
cilium/install/kubernetes/cilium/templates/clustermesh-apiserver/tls-cronjob/job.yaml
Line 15 in 1facc6a
| "helm.sh/hook": post-install,post-upgrade |
How can we reproduce the issue?
- Set helm values:
  hubble.enabled=true
  hubble.tls.enabled=true
  hubble.tls.auto.enabled=true
  hubble.tls.auto.method="cronJob"
- Install cilium via helm with --wait (or use terraform/pulumi/flux/etc., which all rely on wait behaviour)
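For convenience, the values above can be written as a values file. This is a minimal sketch: the file name, release name, namespace, chart repository alias and timeout in the comments are assumptions for illustration and not part of the original report.

```yaml
# values.yaml (file name is an assumption)
#
# Example install command, assuming the chart repository was added under the
# alias "cilium" and the release is installed into kube-system:
#   helm install cilium cilium/cilium --version 1.17.5 \
#     --namespace kube-system --wait --timeout 5m -f values.yaml
hubble:
  enabled: true
  tls:
    enabled: true
    auto:
      enabled: true
      # cronJob creates the certificates through the hubble-generate-cert job,
      # which carries the post-install,post-upgrade hook annotation.
      method: cronJob
```

With these values the install is expected to block until the timeout expires, as described under "What happened?".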
Cilium Version
v1.17.5
Kernel Version
6.15.4
Kubernetes Version
v1.33.2
Regression
No response
Sysdump
No response
Relevant log output
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct