Add and document option to disable HealthCheckNodePort service health server #11168

@gandro

When cilium-agent runs alongside kube-proxy with KubeProxyReplacement=Partial, both kube-proxy and cilium-agent will try to serve a health check for the service on the port specified by HealthCheckNodePort. This will cause the following error:

level=error msg="ListenAndServe failed for service health server" error="listen tcp :31659: bind: address already in use" serviceName=test-lb-local-k8s2 serviceNamespace=default subsys=service-healthserver svcHealthCheckNodePort=31659

The error itself means that the health check is currently served by kube-proxy. We should add an option to disable Cilium's HealthCheckNodePort service health server for Cilium deployments where kube-proxy is running and intended to serve the service health checks.

Proposal (after discussion with @brb):

  • Introduce an enable-health-check-nodeport flag (default: true) that lets users opt out of the service health server running inside cilium-agent when NodePort BPF is in use.
  • If the user selects kube-proxy-replacement=partial, then we will disable enable-health-check-nodeport, as we assume that kube-proxy is intentionally running in parallel to cilium-agent.
  • If NodePort BPF is enabled (i.e. EnableNodePort=true) via either kube-proxy-replacement=probe or kube-proxy-replacement=strict, then we will keep the service health server enabled unless the user explicitly opts out.
  • If NodePort BPF is disabled, then cilium-agent will not start the HealthCheckNodePort server anyway, so no action should be required there.
  • We should extend the above error message with a notice that the service health server can be disabled via flag.
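The rules in the proposal can be sketched as a single decision function. This is an illustration only, not Cilium's implementation: the function `healthServerEnabled` and its parameters are hypothetical, and it assumes that kube-proxy-replacement=partial always turns the server off, even when the flag is left at its default of true.

```go
package main

import "fmt"

// healthServerEnabled (hypothetical) decides whether cilium-agent should
// start the HealthCheckNodePort service health server.
//
//	kprMode        value of kube-proxy-replacement ("partial", "probe", "strict")
//	enableNodePort whether NodePort BPF is enabled
//	flagValue      value of enable-health-check-nodeport (default: true)
func healthServerEnabled(kprMode string, enableNodePort, flagValue bool) bool {
	// Without NodePort BPF the server is never started.
	if !enableNodePort {
		return false
	}
	// Assumption: in partial mode kube-proxy runs in parallel and is
	// expected to serve the health checks, so the server is disabled.
	if kprMode == "partial" {
		return false
	}
	// probe or strict: enabled unless the user explicitly opted out.
	return flagValue
}

func main() {
	fmt.Println(healthServerEnabled("partial", true, true))  // false
	fmt.Println(healthServerEnabled("strict", true, true))   // true
	fmt.Println(healthServerEnabled("probe", true, false))   // false
	fmt.Println(healthServerEnabled("strict", false, true))  // false
}
```

Whether an explicit enable-health-check-nodeport=true should override the partial-mode default is left open by the proposal; the sketch takes the stricter reading.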
