cilium connectivity tests fail in v1.17 and later with nodelocaldns and host port #42528

@rptaylor

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.17.9 and lower than v1.18.0

What happened?

I'm trying to upgrade to Cilium 1.17, but there are widespread connectivity test failures. The general issue is essentially the same as originally reported in #20055. Our clusters also use kubespray, and nodelocaldns listens on 169.254.25.10 on each node.

An attempted fix (#20683) was closed, but something similar was merged in https://github.com/cilium/cilium-cli/pull/997/files, allowing port 53 to the world entity.

However, I believe that as a result of #18644 and #16308, a change was made in #25298 that causes link-local addresses to be correctly identified as 'host' instead of 'world', so the previous fix is no longer sufficient.

How can we reproduce the issue?

Run the cilium connectivity tests in v1.17 or later on a cluster where nodelocaldns is running and listening with a hostPort on a link-local address:

$ cilium -n cilium connectivity test --test "to-entities-world/pod-to-world"

  ℹ️  curl stdout:
  :0 -> :0 = 000
  ℹ️  curl stderr:
  curl: (28) Resolving timed out after 2000 milliseconds
curl: (28) Resolving timed out after 2001 milliseconds
curl: (28) Resolving timed out after 2001 milliseconds
curl: (28) Resolving timed out after 2001 milliseconds
  
  [.] Action [to-entities-world/pod-to-world:https-to-one.one.one.one.-ipv4-0: cilium-test-1/client3-7f986c467b-zn68z (10.224.4.63) -> one.one.one.one.-https (one.one.one.one.:443)]
  ℹ️  📜 Applying CiliumNetworkPolicy 'client-egress-to-entities-world' to namespace 'cilium-test-1' on cluster cluster-dev.local..
  [-] Scenario [to-entities-world/pod-to-world]
  [.] Action [to-entities-world/pod-to-world:http-to-one.one.one.one.-ipv4-0: cilium-test-1/client3-7f986c467b-zn68z (10.224.4.63) -> one.one.one.one.-http (one.one.one.one.:80)]
  ❌ command "curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 -H Host: one.one.one.one -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}\n --output /dev/null --retry 3 --retry-all-errors --retry-delay 3 http://one.one.one.one.:80" failed: command failed (pod=cilium-test-1/client3-7f986c467b-zn68z, container=client3): command terminated with exit code 28

It can be reproduced from a cilium-test-1 client pod, but only while the CNP client-egress-to-entities-world is applied:

$ curl --silent --fail --show-error --connect-timeout 2 --max-time 10 -4 -H Host: one.one.one.one -w "%{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code}\n" --output /dev/null --retry 3 --retry-all-errors --retry-delay 3 http://one.one.one.one.:80
curl: (28) Resolving timed out after 2001 milliseconds

The blocked packets are not shown in Hubble, but you can see them with cilium monitor:

# cilium monitor -t policy-verdict
Listening for events on 4 CPUs with 64x4096 of shared memory
Press Ctrl-C to quit
time="2025-10-31T22:10:36.199116621Z" level=info msg="Initializing dissection cache..." subsys=monitor
Policy verdict log: flow 0x763578d7 local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:53046 -> 169.254.25.10:53 udp
Policy verdict log: flow 0x763578d7 local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:53046 -> 169.254.25.10:53 udp
Policy verdict log: flow 0xf3f5418e local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:57037 -> 169.254.25.10:53 udp
Policy verdict log: flow 0xf3f5418e local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:57037 -> 169.254.25.10:53 udp
Policy verdict log: flow 0x51bab760 local EP ID 506, remote ID host, proto 17, egress, action deny, auth: disabled, match none, 10.224.3.113:50659 -> 169.254.25.10:53 udp

I confirmed that adding the following rule to the client-egress-to-entities-world CNP fixes this specific case:

  - toEntities:
    - host
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      - port: "53"
        protocol: TCP
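For context, a standalone policy combining the existing world-entity DNS rule with the added host-entity rule might look like the sketch below. This is only an illustration; the actual client-egress-to-entities-world policy generated by cilium-cli may differ in its selectors, labels, and port list, and the endpointSelector here is a placeholder assumption.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: client-egress-to-entities-world
  namespace: cilium-test-1
spec:
  # Placeholder selector: the real test policy selects the connectivity-test
  # client pods by their own labels.
  endpointSelector:
    matchLabels:
      kind: client
  egress:
    # Existing rule: DNS to the world entity (what PR 997 added).
    - toEntities:
        - world
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
    # Added rule: DNS to the host entity, so queries to the nodelocaldns
    # hostPort on 169.254.25.10 (now classified as 'host') are not dropped.
    - toEntities:
        - host
      toPorts:
        - ports:
            - port: "53"
              protocol: UDP
            - port: "53"
              protocol: TCP
```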

Cilium Version

$ cilium -n cilium version
cilium-cli: v0.18.8 compiled with go1.25.3 on linux/amd64
cilium image (default): v1.18.2
cilium image (stable): v1.18.3
cilium image (running): 1.17.9

Kernel Version

N/A

Kubernetes Version

v1.29.10

Regression

It worked in v1.16

Sysdump

No response

Relevant log output

Anything else?

The connectivity-test CNPs already allow port 53 to the world entity, so they should also allow port 53 to the host entity.

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct


Labels

  • area/cli — Impacts the command line interface of any command in the repository.
  • area/datapath — Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • kind/bug — This is a bug in the Cilium logic.
  • kind/community-report — This was reported by a user in the Cilium community, eg via Slack.
  • sig/policy — Impacts whether traffic is allowed or denied based on user-defined policies.
