cilium connectivity tests fail in v1.17 and later with nodelocaldns and host port #42528
Description
Is there an existing issue for this?
- I have searched the existing issues
Version
Equal to or higher than v1.17.9 and lower than v1.18.0
What happened?
I'm trying to upgrade to Cilium 1.17, but there are widespread connectivity test failures. The general issue is essentially the same as originally reported in #20055. Our clusters also use kubespray, and nodelocaldns listens on 169.254.25.10 on each node.
An attempted fix (#20683) was closed without merging. Something similar was merged, though: https://github.com/cilium/cilium-cli/pull/997/files , which allows port 53 to world.
However, I believe that due to #18644 and #16308, a change was made in #25298 that causes link-local addresses to be correctly identified as 'host' instead of 'world', so the previous fix no longer applies.
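For reference, 169.254.25.10 falls inside the IPv4 link-local range (169.254.0.0/16, RFC 3927), which is why after #25298 it maps to the `host` entity rather than `world`. A quick sanity check of the address classification (this only verifies the range, not Cilium's internal identity logic):

```python
import ipaddress

# nodelocaldns bind address used by kubespray (from this report)
addr = ipaddress.ip_address("169.254.25.10")

# True: the address is in the RFC 3927 link-local range 169.254.0.0/16
print(addr.is_link_local)  # True
```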
How can we reproduce the issue?
Run the Cilium connectivity tests in v1.17 or later on a cluster with nodelocaldns running, listening with a hostPort on a link-local address.
$ cilium -n cilium connectivity test --test "to-entities-world/pod-to-world"
ℹ️ curl stdout:
:0 -> :0 = 000
ℹ️ curl stderr:
curl: (28) Resolving timed out after 2000 milliseconds
curl: (28) Resolving timed out after 2001 milliseconds
curl: (28) Resolving timed out after 2001 milliseconds
curl: (28) Resolving timed out after 2001 milliseconds
[.] Action [to-entities-world/pod-to-world:https-to-one.one.one.one.-ipv4-0: cilium-test-1/client3-7f986c467b-zn68z (10.224.4.63) -> one.one.one.one.-https (one.one.one.one.:443)]
ℹ️ 📜 Applying CiliumNetworkPolicy 'client-egress-to-entities-world' to namespace 'cilium-test-1' on cluster cluster-dev.local..
[-] Scenario [to-entities-world/pod-to-world]
[.] Action [to-entities-world/pod-to-world:http-to-one.one.one.one.-ipv4-0: cilium-test-1/client3-7f986c467b-zn68z (10.224.4.63) -> one.one.one.one.-http (one.one.one.one.:80)]
❌ command "curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 -H Host: one.one.one.one -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}\n --output /dev/null --retry 3 --retry-all-errors --retry-delay 3 http://one.one.one.one.:80" failed: command failed (pod=cilium-test-1/client3-7f986c467b-zn68z, container=client3): command terminated with exit code 28
It can be reproduced in a cilium-test-1 client pod, but only when the CNP client-egress-to-entities-world is applied:
$ curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 -H Host: one.one.one.one -w "%{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}\n" --output /dev/null --retry 3 --retry-all-errors --retry-delay 3 http://one.one.one.one.:80
curl: (28) Resolving timed out after 2001 milliseconds
The blocked packets are not shown in Hubble, but they can be seen with cilium monitor:
# cilium monitor -t policy-verdict
Listening for events on 4 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
time="2025-10-31T22:10:36.199116621Z" level=info msg="Initializing dissection cache..." subsys=monitor
Policy verdict log: flow 0x763578d7 local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:53046 -> 169.254.25.10:53 udp
Policy verdict log: flow 0x763578d7 local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:53046 -> 169.254.25.10:53 udp
Policy verdict log: flow 0xf3f5418e local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:57037 -> 169.254.25.10:53 udp
Policy verdict log: flow 0xf3f5418e local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:57037 -> 169.254.25.10:53 udp
Policy verdict log: flow 0x51bab760 local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:50659 -> 169.254.25.10:53 udp
I confirmed that adding the following to the client-egress-to-entities-world CNP fixes this specific case:
- toEntities:
    - host
  toPorts:
    - ports:
        - port: "53"
          protocol: UDP
        - port: "53"
          protocol: TCP
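For context, a minimal sketch of how that rule could sit alongside the existing world rule in a complete CiliumNetworkPolicy. This is illustrative only: the actual test CNP is generated by cilium-cli, and the selector and world rule shown here are assumptions, not the generated manifest:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: client-egress-to-entities-world
  namespace: cilium-test-1
spec:
  endpointSelector: {}  # illustrative; the real policy selects the test client pods
  egress:
    # existing rule: port 53 to world (per cilium-cli PR 997)
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
    # proposed addition: allow DNS to the host entity, which since #25298
    # covers link-local addresses such as nodelocaldns on 169.254.25.10
    - toEntities:
        - host
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
```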
Cilium Version
$ cilium -n cilium version
cilium-cli: v0.18.8 compiled with go1.25.3 on linux/amd64
cilium image (default): v1.18.2
cilium image (stable): v1.18.3
cilium image (running): 1.17.9
Kernel Version
N/A
Kubernetes Version
v1.29.10
Regression
It worked in v1.16.
Sysdump
No response
Relevant log output
Anything else?
The test CNPs already allow port 53 to the world entity, so they should also allow port 53 to the host entity.
Cilium Users Document
- Are you a user of Cilium? Please add yourself to the Users doc
Code of Conduct
- I agree to follow this project's Code of Conduct