Cilium kube-proxy-replacement sometimes binds to cilium_host, breaking NodePort #16019

@eyanulis

Description

Bug report

General Information

  • Cilium version (run cilium version): 1.9.5 and 1.9.6
  • Kernel version (run uname -a): 5.11.11-200.fc33.x86_64
  • Orchestration system version in use (e.g. kubectl version, ...): Kubernetes 1.20.5
  • Link to relevant artifacts (policies, deployments scripts, ...):
  • Generate and upload a system zip:

Environment details

  • Kubernetes 1.20.5 installed with kubeadm; Cilium 1.9.5 and 1.9.6. Cluster-pool IPAM.
  • Dual-stack, with IPv6 preferred (node IPs are IPv6; v6 is the "first" pod and service CIDR, etc)
  • Control plane nodes are L2-adjacent VMs running on Proxmox; upstream interface is ens18
  • Worker nodes are L2-adjacent bare metal on a different subnet from the control-plane nodes; upstream interface is bond0
  • Tunnel and masquerade are disabled. BGP (FRRouting) is used to announce pod IPs for direct routing. Due to heterogeneous nodes, I can't hardcode an external interface (direct routing / nodeport) so this is set to auto-detect.
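For reference, the setup above corresponds roughly to the following Helm values. This is a hedged sketch only: the value names are assumed from the Cilium 1.9 chart and are not copied from the actual deployment.

```yaml
# Sketch of the relevant Helm values for this setup (names assumed from
# the Cilium 1.9 chart, not taken from the reporter's real config).
kubeProxyReplacement: strict
tunnel: disabled
masquerade: false
ipam:
  mode: cluster-pool
# `devices` / `directRoutingDevice` are deliberately left unset, so the
# agent auto-detects the external interface on each node.
```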

Description of issue
This is a follow-up from the Slack conversation here.

When all nodes are first booted, kube-proxy-replacement correctly auto-detects the external interface (ens18 on control-plane VMs; bond0 on worker bare-metal nodes).

Sometime later (unknown exactly how long it takes, or what the trigger is), I consistently find that one of two things has occurred:

  • KPR is bound to both the correct interface and cilium_host (with cilium_host selected as the Direct Routing interface), OR
  • KPR is bound to only cilium_host.

In both cases, all NodePort-related services on the host are broken (including MetalLB LoadBalancer services, which rely on NodePort).

In order to get out of this state, I have to reboot the node. No amount of restarting the Cilium pod restores the node to the correct state.
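For what it's worth, the bad state is visible in the agent's status output. The sketch below greps a captured-style excerpt; on a live node the real data would come from `cilium status --verbose` inside the agent pod, and the exact formatting varies by version.

```shell
# Captured-style excerpt standing in for live `cilium status --verbose`
# output (formatting approximated from 1.9; not a verbatim capture).
status='KubeProxyReplacement Details:
  Devices:                cilium_host ens18
  Direct Routing Device:  cilium_host'

# Flag the bug: cilium_host should never appear among the KPR devices.
echo "$status" | grep -E 'Devices:.*cilium_host' >/dev/null \
  && echo "BAD: KPR is bound to cilium_host"
```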

As discussed on Slack, I have an unproven suspicion that this may have something to do with how cilium_host is addressed in a V6 or dual-stack environment:

  • For the IPv4 address family, cilium_host is assigned an address out of the node's pod CIDR.
  • For the IPv6 address family, cilium_host is assigned the same IP as on the external interface. The logic here is unclear to me (are we cloning the node IP onto cilium_host?)
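To eyeball this on a live node, something like the following works (interface names are this cluster's; on a box without these devices it just reports their absence):

```shell
# Print the IPv6 addresses on cilium_host vs the external interface so
# they can be compared by eye. Neither device exists on a non-Cilium
# host, so fall back to a message rather than failing.
for dev in cilium_host ens18; do
  echo "== $dev"
  ip -6 addr show dev "$dev" 2>/dev/null | grep inet6 \
    || echo "$dev: not present on this host"
done
```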

I'll do a bit of code spelunking once this is filed, as I'd like to understand both how the cilium_host V6 address is assigned, and the logic by which the interface auto-detection works (and if that could be contributing to this "mis-detection" of cilium_host as a valid interface).

I've attached a sysdump+bugtool from 1.9.5 when one node (usden1storage01) was in the bad state. Upgrading to 1.9.6 (and restarting cilium pods to do so, hitting the trigger noted below) immediately put all the nodes into the "dual bind" state, so I've attached a sysdump from that state as well.

How to reproduce the issue

This seems at least partially correlated to restarts of the Cilium pod on a given node. In order to get into the state where KPR is bound to both cilium_host and the correct external interface, restart the Cilium pod on a "good" node and it will come back up in this state.
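A sketch of that restart step (node name is one from this report; guarded so it only acts when a cluster is actually reachable):

```shell
NODE=usden1storage01   # example node taken from this report
if kubectl version >/dev/null 2>&1; then
  # Delete the agent pod on that node; the DaemonSet recreates it,
  # which is enough to reproduce the "dual bind" state here.
  kubectl -n kube-system delete pod -l k8s-app=cilium \
    --field-selector spec.nodeName="$NODE"
else
  echo "no cluster reachable; would restart the cilium pod on $NODE"
fi
```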

I have not found a definitive way to trigger the state where KPR is bound to only cilium_host - I'll update this if/when I shake out a trigger for that.

In any case, as @pchaigno noted in our Slack conversation, KPR probably shouldn't be allowed to bind to cilium_host - since in the vast majority of cases I assume it'd be impossible for that to have any useful outcome.
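As a toy illustration of that exclusion, the snippet below filters cilium-owned interfaces out of a hypothetical auto-detected device list. This is pure shell for demonstration; the real fix would live in the agent's device-detection code.

```shell
# Hypothetical device list as auto-detection might see it on one of
# these nodes (names are examples from this report).
devices="ens18 cilium_host cilium_net bond0"

# Drop cilium-owned interfaces before they can be picked for NodePort/KPR.
filtered=""
for d in $devices; do
  case "$d" in
    cilium_*) ;;                              # skip cilium_host, cilium_net, etc.
    *) filtered="${filtered:+$filtered }$d" ;;
  esac
done
echo "$filtered"   # → ens18 bond0
```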

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/kpr: Anything related to our kube-proxy replacement.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, e.g. via Slack.
