Cilium kube-proxy-replacement sometimes binds to cilium_host, breaking NodePort #16019

@eyanulis

Description

Bug report

General Information

  • Cilium version (run cilium version): 1.9.5 and 1.9.6
  • Kernel version (run uname -a): 5.11.11-200.fc33.x86_64
  • Orchestration system version in use (e.g. kubectl version, ...): Kubernetes 1.20.5
  • Link to relevant artifacts (policies, deployments scripts, ...):
  • Generate and upload a system zip:

Environment details

  • Kubernetes 1.20.5 installed with kubeadm; Cilium 1.9.5 and 1.9.6. Cluster-pool IPAM.
  • Dual-stack, with IPv6 preferred (node IPs are IPv6; v6 is the "first" pod and service CIDR, etc)
  • Control plane nodes are L2-adjacent VMs running on Proxmox; upstream interface is ens18
  • Worker nodes are L2-adjacent bare metal on a different subnet from the control-plane nodes; upstream interface is bond0
  • Tunnel and masquerade are disabled. BGP (FRRouting) is used to announce pod IPs for direct routing. Due to heterogeneous nodes, I can't hardcode an external interface (direct routing / nodeport) so this is set to auto-detect.
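For reference, the setup above corresponds roughly to the following Helm values. This is a hedged sketch only: the value names are assumed from the Cilium 1.9 chart and are not copied from the actual deployment.

```yaml
# Sketch of the relevant Helm values for this setup (names assumed from
# the Cilium 1.9 chart, not taken from the reporter's real config).
kubeProxyReplacement: strict
tunnel: disabled
masquerade: false
ipam:
  mode: cluster-pool
# `devices` / `directRoutingDevice` are deliberately left unset, so the
# agent auto-detects the external interface on each node.
```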

Description of issue
This is a follow-up from the Slack conversation here.

When all nodes are first booted, kube-proxy-replacement correctly auto-detects the external interface (ens18 on control-plane VMs; bond0 on worker bare-metal nodes).

Sometime later (unknown exactly how long it takes, or what the trigger is), I consistently find that one of two things has occurred:

  • KPR is bound to both the correct interface and cilium_host (with cilium_host selected as the Direct Routing interface), OR
  • KPR is bound to only cilium_host.

In both cases, all NodePort-related services on the host are broken (including MetalLB LoadBalancer services, which rely on NodePort).

In order to get out of this state, I have to reboot the node. No amount of restarting the Cilium pod restores the node to the correct state.
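For what it's worth, the bad state is visible in the agent's status output. The sketch below greps a captured-style excerpt; on a live node the real data would come from `cilium status --verbose` inside the agent pod, and the exact formatting varies by version.

```shell
# Captured-style excerpt standing in for live `cilium status --verbose`
# output (formatting approximated from 1.9; not a verbatim capture).
status='KubeProxyReplacement Details:
  Devices:                cilium_host ens18
  Direct Routing Device:  cilium_host'

# Flag the bug: cilium_host should never appear among the KPR devices.
echo "$status" | grep -E 'Devices:.*cilium_host' >/dev/null \
  && echo "BAD: KPR is bound to cilium_host"
```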

As discussed on Slack, I have an unproven suspicion that this may have something to do with how cilium_host is addressed in a V6 or dual-stack environment:

  • For the IPv4 address family, cilium_host is assigned an address out of the node's pod CIDR.
  • For the IPv6 address family, cilium_host is assigned the same IP as on the external interface. The logic here is unclear to me (are we cloning the node IP onto cilium_host?)
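To eyeball this on a live node, something like the following works (interface names are this cluster's; on a box without these devices it just reports their absence):

```shell
# Print the IPv6 addresses on cilium_host vs the external interface so
# they can be compared by eye. Neither device exists on a non-Cilium
# host, so fall back to a message rather than failing.
for dev in cilium_host ens18; do
  echo "== $dev"
  ip -6 addr show dev "$dev" 2>/dev/null | grep inet6 \
    || echo "$dev: not present on this host"
done
```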

I'll do a bit of code spelunking once this is filed, as I'd like to understand both how the cilium_host V6 address is assigned, and the logic by which the interface auto-detection works (and if that could be contributing to this "mis-detection" of cilium_host as a valid interface).

I've attached a sysdump+bugtool from 1.9.5 when one node (usden1storage01) was in the bad state. Upgrading to 1.9.6 (and restarting cilium pods to do so, hitting the trigger noted below) immediately put all the nodes into the "dual bind" state, so I've attached a sysdump from that state as well.

How to reproduce the issue

This seems at least partially correlated to restarts of the Cilium pod on a given node. In order to get into the state where KPR is bound to both cilium_host and the correct external interface, restart the Cilium pod on a "good" node and it will come back up in this state.
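A sketch of that restart step (node name is one from this report; guarded so it only acts when a cluster is actually reachable):

```shell
NODE=usden1storage01   # example node taken from this report
if kubectl version >/dev/null 2>&1; then
  # Delete the agent pod on that node; the DaemonSet recreates it,
  # which is enough to reproduce the "dual bind" state here.
  kubectl -n kube-system delete pod -l k8s-app=cilium \
    --field-selector spec.nodeName="$NODE"
else
  echo "no cluster reachable; would restart the cilium pod on $NODE"
fi
```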

I have not found a definitive way to trigger the state where KPR is bound to only cilium_host - I'll update this if/when I shake out a trigger for that.

In any case, as @pchaigno noted in our Slack conversation, KPR probably shouldn't be allowed to bind to cilium_host - since in the vast majority of cases I assume it'd be impossible for that to have any useful outcome.
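As a toy illustration of that exclusion, the snippet below filters cilium-owned interfaces out of a hypothetical auto-detected device list. This is pure shell for demonstration; the real fix would live in the agent's device-detection code.

```shell
# Hypothetical device list as auto-detection might see it on one of
# these nodes (names are examples from this report).
devices="ens18 cilium_host cilium_net bond0"

# Drop cilium-owned interfaces before they can be picked for NodePort/KPR.
filtered=""
for d in $devices; do
  case "$d" in
    cilium_*) ;;                              # skip cilium_host, cilium_net, etc.
    *) filtered="${filtered:+$filtered }$d" ;;
  esac
done
echo "$filtered"   # → ens18 bond0
```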

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/kpr: Anything related to our kube-proxy replacement.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
