Helm hubble-generate-cert job blocks install with --wait due to post-install scheduling #40381

@pdf

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.17.5 and lower than v1.18.0

What happened?

Installing Cilium via Helm with hubble.tls.auto.method=cronJob and --wait hangs until the timeout, because the Hubble pods never reach the ready state.

Being able to rely on --wait is important for automation, as it's the primary mechanism for determining that resources are correctly deployed and ready in the cluster before attempting to load dependent resources.

The issue is that the hubble-relay pod relies on the certificates generated by the hubble-generate-cert job, but that job is annotated with "helm.sh/hook": post-install,post-upgrade. Per the Helm lifecycle docs, this means that with --wait the job is not deployed until all resources reach the ready state. This creates a dependency deadlock: hubble-relay cannot become ready without the certificates, and the job that generates them is not scheduled until hubble-relay is ready.

I assume the same is true for both hubble and clustermesh, since they use the same mechanism, though I've only tested with hubble at this stage.

"helm.sh/hook": post-install,post-upgrade
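To show where that annotation sits, here is an illustrative fragment of a hook-annotated Job. It is a sketch, not a copy of the chart's template: the resource name is an assumption based on the job name used in this report, and the spec is omitted entirely.

```yaml
# Illustrative fragment only, not copied from the Cilium chart.
apiVersion: batch/v1
kind: Job
metadata:
  name: hubble-generate-cert   # assumed name, taken from the wording of this report
  annotations:
    # Helm treats this resource as a hook and creates it only after the
    # release's regular resources are installed (and, with --wait, ready).
    "helm.sh/hook": post-install,post-upgrade
# spec omitted
```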

How can we reproduce the issue?

  1. Set helm values:
    hubble.enabled=true
    hubble.tls.enabled=true
    hubble.tls.auto.enabled=true
    hubble.tls.auto.method="cronJob"
  2. Install Cilium via Helm with --wait (or use Terraform, Pulumi, Flux, etc., which all rely on the wait behaviour)
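The values from step 1 can also be written as a values file. This is a minimal sketch: the file name values.yaml, the release name cilium, the cilium/cilium chart reference (which assumes the chart repository was added locally under the name cilium), the kube-system namespace, and the 5m timeout are all illustrative choices rather than details from this report.

```yaml
# values.yaml (hypothetical file name)
hubble:
  enabled: true
  tls:
    enabled: true
    auto:
      enabled: true
      method: cronJob

# Step 2, shown as a comment so the file stays valid YAML.
# Release name, chart reference, namespace, and timeout are assumptions:
#   helm install cilium cilium/cilium --version 1.17.5 \
#     --namespace kube-system --wait --timeout 5m -f values.yaml
```

With these values, the install command is expected to block until the timeout, as described above.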

Cilium Version

v1.17.5

Kernel Version

6.15.4

Kubernetes Version

v1.33.2

Regression

No response

Sysdump

No response

Relevant log output

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Labels

  • area/agent: Cilium agent related.
  • area/helm: Impacts helm charts and user deployment experience
  • area/hubble: Impacts hubble server or relay
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • needs/triage: This issue requires triaging to establish severity and next steps.
