Bug report
General Information
- Cilium version (run `cilium version`): 1.9.5 and 1.9.6
- Kernel version (run `uname -a`): 5.11.11-200.fc33.x86_64
- Orchestration system version in use (e.g. `kubectl version`, ...): Kubernetes 1.20.5
- Link to relevant artifacts (policies, deployments scripts, ...):
- Generate and upload a system zip: cilium-sysdump-20210503-144026.zip, taken from usden1storage01 in the problem state (KPR bound only to `cilium_host`)
Environment details
- Kubernetes 1.20.5 installed with kubeadm; Cilium 1.9.5 and 1.9.6. Cluster-pool IPAM.
- Dual-stack, with IPv6 preferred (node IPs are IPv6; v6 is the "first" pod and service CIDR, etc.)
- Control plane nodes are L2-adjacent VMs running on Proxmox; upstream interface is `ens18`
- Worker nodes are L2-adjacent bare metal on a different subnet from the control-plane nodes; upstream interface is `bond0`
- Tunnel and masquerade are disabled. BGP (FRRouting) is used to announce pod IPs for direct routing. Because the nodes are heterogeneous, I can't hardcode an external interface (direct routing / NodePort), so this is set to auto-detect.
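For reference, the relevant parts of my Helm values look roughly like this. Flag names are from the 1.9 chart as I remember them and nothing here is copied verbatim from my cluster, so treat it as a sketch of the setup described above rather than an exact config:

```yaml
# Sketch of Cilium 1.9 Helm values matching the environment above.
# `devices` is deliberately left unset so the external interface is
# auto-detected per node (ens18 on control plane, bond0 on workers).
kubeProxyReplacement: "strict"
tunnel: "disabled"
masquerade: false
ipv6:
  enabled: true
ipam:
  mode: "cluster-pool"
```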
Description of issue
This is a follow-up from the Slack conversation here.
When all nodes are first booted, kube-proxy-replacement correctly auto-detects the external interface (`ens18` on control-plane VMs; `bond0` on worker bare-metal nodes).
Sometime later (I don't know exactly how long it takes, or what the trigger is), I consistently find that one of two things has occurred:
- KPR is bound to both the correct interface and `cilium_host` (with `cilium_host` selected as the Direct Routing interface), OR
- KPR is bound to only `cilium_host`.
In both cases, all NodePort-related services on the host are broken (including MetalLB LoadBalancer services, which rely on NodePort).
In order to get out of this state, I have to reboot the node. No amount of restarting the Cilium pod restores the node to the correct state.
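For anyone looking at the sysdumps, this is roughly how I check which devices KPR is bound to on a node. On a live cluster the command is `cilium status --verbose` inside the Cilium pod; the excerpt below is a hypothetical stand-in for that output so the filter can be shown end to end:

```shell
# On a live node you'd run:
#   kubectl -n kube-system exec <cilium-pod> -- cilium status --verbose
# and look at the KubeProxyReplacement details. The excerpt below is an
# illustrative sample of that output, not copied from a real node.
status_excerpt='KubeProxyReplacement Details:
  Status:   Strict
  Devices:  cilium_host bond0'

# Pull out the device list; a healthy worker here should show only bond0.
devices=$(printf '%s\n' "$status_excerpt" | awk '/Devices:/ {$1=""; print}')
echo "KPR devices:${devices}"
```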
As discussed on Slack, I have an unproven suspicion that this may have something to do with how `cilium_host` is addressed in a v6 or dual-stack environment:
- For the IPv4 address family, `cilium_host` is assigned an address out of the node's pod CIDR.
- For the IPv6 address family, `cilium_host` is assigned the same IP as on the external interface. I'm unclear on the logic here (are we cloning the node IP onto `cilium_host`?)
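A minimal sketch of the address comparison behind that suspicion. The live commands are the `ip -o -6 addr show` lines in the comment; the sample output and the `2001:db8::` addresses are illustrative stand-ins for what I see on a worker node:

```shell
# On a live node you'd compare:
#   ip -o -6 addr show dev bond0
#   ip -o -6 addr show dev cilium_host
# The sample below mimics that `ip -o` output with documentation-prefix
# addresses; it is illustrative, not captured from a real node.
sample='2: bond0    inet6 2001:db8::10/64 scope global
9: cilium_host    inet6 2001:db8::10/128 scope global'

ext=$(printf '%s\n' "$sample" | awk '/bond0/ {print $4}')
chost=$(printf '%s\n' "$sample" | awk '/cilium_host/ {print $4}')

# Same IPv6 address on both devices, differing only in prefix length,
# is the pattern I observe.
echo "bond0=$ext cilium_host=$chost"
```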
I'll do a bit of code spelunking once this is filed, as I'd like to understand both how the `cilium_host` v6 address is assigned, and the logic by which the interface auto-detection works (and whether that could be contributing to this "mis-detection" of `cilium_host` as a valid interface).
I've attached a sysdump+bugtool from 1.9.5 when one node (usden1storage01) was in the bad state. Upgrading to 1.9.6 (and restarting cilium pods to do so, hitting the trigger noted below) immediately put all the nodes into the "dual bind" state, so I've attached a sysdump from that state as well.
How to reproduce the issue
This seems at least partially correlated to restarts of the Cilium pod on a given node. In order to get into the state where KPR is bound to both `cilium_host` and the correct external interface, restart the Cilium pod on a "good" node and it will come back up in this state.
I have not found a definitive way to trigger the state where KPR is bound to only `cilium_host` - I'll update this if/when I shake out a trigger for that.
In any case, as @pchaigno noted in our Slack conversation, KPR probably shouldn't be allowed to bind to `cilium_host` - since in the vast majority of cases I assume it'd be impossible for that to have any useful outcome.