KPR misbehaving when a kubernetes service of type LB advertise a node IP in its status #43206
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
Reproduced on main (944e308); a user also reported some issues in 1.18.4 -> #42890 (that issue is a mix of GW-API and NodeIPAM issues, so I opened a separate one)
What happened?
If a node IP is advertised in a service load balancer status while KPR is enabled, various network issues occur (see the end of the reproducer below for more details). If KPR is not enabled, everything works as expected.
How can we reproduce the issue?
Reproduced it in main (easiest for me but most likely reproducible in other versions):
- `make kind kind-image`
- append the following config in `contrib/testing/kind-common.yaml`:

```yaml
defaultLBServiceIPAM: nodeipam
nodeIPAM:
  enabled: true
kubeProxyReplacement: true
```

- remove kube-proxy:

```shell
kubectl delete ds -n kube-system kube-proxy
docker exec -it kind-control-plane /bin/bash -c 'iptables-save | grep -v KUBE | iptables-restore'
docker exec -it kind-worker /bin/bash -c 'iptables-save | grep -v KUBE | iptables-restore'
```

- Apply the following yaml:
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    app: nginx
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
  - name: http
    protocol: TCP
    port: 8484
    targetPort: 80
```
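Once the manifests are applied, one way to inspect what the KPR datapath has programmed for the service (a sketch, assuming a standard kind install where the Cilium agent runs as a DaemonSet named `cilium` in `kube-system`):

```shell
# List the service entries Cilium's datapath has programmed; once the LB
# status below is populated, a frontend for the node IP should appear here
kubectl -n kube-system exec ds/cilium -- cilium-dbg service list
```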
NodeIPAM will start advertising the node IP of the worker (it excludes the control plane node by default) in the service LoadBalancer status, like this (I didn't test it, but the issue is most likely reproducible by just patching the service status, since that's all NodeIPAM does):
```yaml
status:
  loadBalancer:
    ingress:
    - ip: 172.21.0.3
      ipMode: VIP
```
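A minimal sketch of reproducing without NodeIPAM, assuming the status update itself is the only relevant action (untested, as noted above; requires a kubectl recent enough to support `--subresource`):

```shell
# Hypothetical shortcut: write the same LB status by hand instead of running
# NodeIPAM (172.21.0.3 is the worker node IP from the example above)
kubectl patch svc nginx-service --subresource=status --type=merge \
  -p '{"status":{"loadBalancer":{"ingress":[{"ip":"172.21.0.3","ipMode":"VIP"}]}}}'
```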
Various issues then start happening on the worker node: pod logs on the worker can no longer be fetched via the Kubernetes API (presumably the API server can no longer reach the kubelet there), the operator fails readiness/liveness probes, and from a shell inside one of the kind nodes a curl to the service (via the cluster IP or the node IP) only succeeds erratically.
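The erratic behavior can be made visible with a small loop; a sketch, assuming a shell inside one of the kind nodes and the node IP and port from the example above:

```shell
# Fire 10 requests at the LB node IP; with the bug present, the expectation
# is a mix of 200s and timeouts rather than consistent successes
for i in $(seq 1 10); do
  curl -s -o /dev/null -m 2 -w "%{http_code}\n" http://172.21.0.3:8484/ || echo "timeout"
done
```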
Cilium Version
Reproduced it on main while helping a user who reported these issues in 1.18.4 (#42890), so it probably affects at least 1.18.4 and main.
The commit I reproduced in main from was: 944e308
Kernel Version
Linux LPFR0512 6.8.0-88-generic #89-Ubuntu SMP PREEMPT_DYNAMIC Sat Oct 11 01:02:46 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Kubernetes Version
v1.34.0
Regression
Most likely, but unfortunately I don't know when it broke.
Sysdump
I don't know if it will help but the sysdump of my test cluster is here:
cilium-sysdump-20251209-001828.zip
Relevant log output
Anything else?
No response
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct