Is there an existing issue for this?
Version
higher than v1.16.0 and lower than v1.17.0
What happened?
Recently I upgraded my homelab k3s cluster from Cilium 1.15.7 to 1.16.0 and experimented with the netkit datapath mode. I found that all pods in the flux-system namespace fail to start, while the remaining pods in the cluster work as expected. I tried rebooting the node, but the issue persists.
In the Hubble UI, I found that in netkit mode the k3s kubelet health-check packets are dropped by Cilium. The flux-system namespace uses some Kubernetes NetworkPolicies to deny all ingress traffic by default; the same setup works fine if I switch to veth mode.
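For reference, the default-deny ingress policy in flux-system looks roughly like this (a sketch; the policy name here is hypothetical and the actual flux-system policies differ):

```shell
# Hypothetical reconstruction of the kind of policy involved,
# not the exact manifest from my cluster.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all-ingress   # hypothetical name
  namespace: flux-system
spec:
  podSelector: {}          # selects every pod in the namespace
  policyTypes:
    - Ingress              # no ingress rules => all ingress denied
EOF
```

With veth, kubelet probe traffic still reaches the pods despite this policy; with netkit it shows up as dropped in Hubble.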
How can we reproduce the issue?
- Install a single-node k3s cluster without the default CNI or network-policy addons.
- It may be worth mentioning that I have two network interfaces in my setup, eth0 (default route) and wg0, and my k3s node-ip is the wg0 IP. The issue might also reproduce with an eth0/eth1 setup or with just a single main interface.
- Install Cilium 1.16.0, enable the netkit datapath mode, and use the default network-policy behavior.
- Create a test namespace with some Kubernetes ingress/egress NetworkPolicies, e.g. flux-system.
- Create a demo Deployment with liveness/readiness probes.
- All pods in the test namespace end up in CrashLoopBackOff due to failed health-check probes.
- Check the verdicts in the Hubble UI, which show flows dropped due to "Policy denied".
- One of the top source IPs of the dropped ingress traffic belongs to the main network interface, eth0.
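The steps above can be sketched as follows (assumptions: helm and kubectl are installed, the Cilium Helm repo is added, and the k3s flags and the `bpf.datapathMode=netkit` Helm value are as described in the k3s and Cilium 1.16 docs; namespace and deployment names are made up for illustration):

```shell
# k3s without the bundled CNI and network-policy controller
curl -sfL https://get.k3s.io | sh -s - --flannel-backend=none --disable-network-policy

# Cilium 1.16.0 with the netkit datapath mode
helm install cilium cilium/cilium --version 1.16.0 \
  --namespace kube-system \
  --set bpf.datapathMode=netkit

# Test namespace with a default-deny ingress policy
kubectl create namespace netkit-test
kubectl apply -n netkit-test -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes: [Ingress]
EOF

# Demo deployment with an HTTP liveness probe
kubectl create deployment nginx --image=nginx -n netkit-test
kubectl -n netkit-test patch deployment nginx --type=json -p='[
  {"op":"add","path":"/spec/template/spec/containers/0/livenessProbe",
   "value":{"httpGet":{"path":"/","port":80}}}]'

# Watch the pods fail their probes, then inspect Hubble verdicts
kubectl get pods -n netkit-test -w
hubble observe --namespace netkit-test --verdict DROPPED
```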
Cilium Version
1.16.0
Kernel Version
6.10.1
Kubernetes Version
v1.30.2+k3s1
Regression
No response
Sysdump
cilium-sysdump-20240727-212749.zip
Relevant log output
No response
Anything else?
No response
Cilium Users Document
Code of Conduct