bpf:nat When DSR and PER_PACKET_LB are enabled, cilium connectivity tests fail. #41962

@gyutaeb

Description

Is there an existing issue for this?

  • I have searched the existing issues

Version

equal or higher than v1.18.2 and lower than v1.19.0

What happened?

When DSR and PER_PACKET_LB are enabled, a connection fails if a client pod sends a request to a remote node's NodePort service while the server (backend) pod is located on the same node as the client. This causes the Cilium connectivity tests to fail.

❯ cilium connectivity test --test no-policies -v -d
🐛 Detected features:
🐛   bpf-lb-external-clusterip: Disabled
🐛   cidr-match-nodes: Disabled
🐛   cilium-clusterwide-network-policy: Enabled
🐛   cilium-network-policy: Enabled
🐛   clustermesh-enable-endpoint-sync: Disabled
🐛   cni-chaining: Disabled:none
🐛   enable-bgp-control-plane: Disabled
🐛   enable-egress-gateway: Disabled
🐛   enable-encryption-strict-mode: Disabled
🐛   enable-envoy-config: Disabled
🐛   enable-gateway-api: Disabled
🐛   enable-ipsec: Disabled
🐛   enable-local-redirect-policy: Disabled
🐛   enable-policy-secrets-sync: Enabled
🐛   encryption-node: Disabled
🐛   encryption-pod: Disabled:disabled
🐛   endpoint-routes: Disabled
🐛   flavor: Enabled:kind
🐛   health-checking: Enabled
🐛   host-firewall: Disabled
🐛   host-port: Enabled
🐛   icmp-policy: Enabled
🐛   ingress-controller: Disabled
🐛   ipam: Disabled:kubernetes
🐛   ipv4: Enabled
🐛   ipv6: Enabled
🐛   k8s-network-policy: Enabled
🐛   kpr-external-ips: Enabled
🐛   kpr-hostport: Enabled
🐛   kpr-mode: Enabled:true
🐛   kpr-nodeport: Enabled
🐛   kpr-nodeport-acceleration: Disabled:true
🐛   kpr-session-affinity: Enabled
🐛   kpr-socket-lb: Enabled
🐛   kpr-socket-lb-hostns-only: Enabled
🐛   l7-port-ranges: Enabled
🐛   l7-proxy: Enabled
🐛   loadbalancer-l7: Disabled
🐛   monitor-aggregation: Disabled:none
🐛   multicast-enabled: Disabled
🐛   mutual-auth-spiffe: Disabled
🐛   node-local-dns: Disabled
🐛   node-without-cilium: Disabled
🐛   policy-default-local-cluster: Enabled
🐛   policy-secrets-only-from-secrets-namespace: Enabled
🐛   port-ranges: Enabled
🐛   tunnel: Enabled:geneve
🐛   tunnel-port: Disabled:6081
🐛   wireguard-encapsulate: Disabled
ℹ️  Skipping tests that require a node Without Cilium
🐛 Validating Deployments...
⌛ [kind-kind] Waiting for deployment cilium-test-1/client to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/client2 to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/echo-same-node to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/client3 to become ready...
⌛ [kind-kind] Waiting for deployment cilium-test-1/echo-other-node to become ready...
⌛ [kind-kind] Waiting for pod cilium-test-1/client-64d966fcbd-vbk5q to reach DNS server on cilium-test-1/echo-same-node-8468f8f85c-tj6w9 pod...
⌛ [kind-kind] Waiting for pod cilium-test-1/client2-5f6d9498c7-6dqxc to reach DNS server on cilium-test-1/echo-same-node-8468f8f85c-tj6w9 pod...
⌛ [kind-kind] Waiting for pod cilium-test-1/client3-576ffd549d-dwfkr to reach DNS server on cilium-test-1/echo-same-node-8468f8f85c-tj6w9 pod...
⌛ [kind-kind] Waiting for pod cilium-test-1/client-64d966fcbd-vbk5q to reach DNS server on cilium-test-1/echo-other-node-7d99cbc6c9-cclwd pod...
⌛ [kind-kind] Waiting for pod cilium-test-1/client2-5f6d9498c7-6dqxc to reach DNS server on cilium-test-1/echo-other-node-7d99cbc6c9-cclwd pod...
⌛ [kind-kind] Waiting for pod cilium-test-1/client3-576ffd549d-dwfkr to reach DNS server on cilium-test-1/echo-other-node-7d99cbc6c9-cclwd pod...
⌛ [kind-kind] Waiting for pod cilium-test-1/client-64d966fcbd-vbk5q to reach default/kubernetes service...
⌛ [kind-kind] Waiting for pod cilium-test-1/client2-5f6d9498c7-6dqxc to reach default/kubernetes service...
⌛ [kind-kind] Waiting for pod cilium-test-1/client3-576ffd549d-dwfkr to reach default/kubernetes service...
⌛ [kind-kind] Waiting for Service cilium-test-1/echo-other-node to become ready...
⌛ [kind-kind] Waiting for Service cilium-test-1/echo-other-node to be synchronized by Cilium pod kube-system/cilium-6dkmb
⌛ [kind-kind] Waiting for Service cilium-test-1/echo-other-node to be synchronized by Cilium pod kube-system/cilium-tm2pt
⌛ [kind-kind] Waiting for Service cilium-test-1/echo-same-node to become ready...
⌛ [kind-kind] Waiting for Service cilium-test-1/echo-same-node to be synchronized by Cilium pod kube-system/cilium-6dkmb
⌛ [kind-kind] Waiting for Service cilium-test-1/echo-same-node to be synchronized by Cilium pod kube-system/cilium-tm2pt
⌛ [kind-kind] Waiting for NodePort 10.89.0.3:32155 (cilium-test-1/echo-other-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 10.89.0.3:30741 (cilium-test-1/echo-same-node) to become ready...
⌛ [kind-kind] Waiting for NodePort 10.89.0.2:32155 (cilium-test-1/echo-other-node) to become ready...
🐛 [kind-kind] Error checking NodePort 10.89.0.2:32155 (cilium-test-1/echo-other-node): command failed (pod=cilium-test-1/client3-576ffd549d-dwfkr, container=): command terminated with exit code 1:
🐛 [kind-kind] Error checking NodePort 10.89.0.2:32155 (cilium-test-1/echo-other-node): command failed (pod=cilium-test-1/client3-576ffd549d-dwfkr, container=): command terminated with exit code 1:
^Ctimeout reached waiting for NodePort 10.89.0.2:32155 (cilium-test-1/echo-other-node) (last error: command failed (pod=cilium-test-1/client3-576ffd549d-dwfkr, container=): command terminated with exit code 1)

How can we reproduce the issue?

Enable DSR and PER_PACKET_LB with the following Cilium configuration:

bpf-lb-mode: dsr
bpf-lb-dsr-dispatch: geneve
bpf-lb-sock-hostns-only: true
kube-proxy-replacement: true
tunnel-protocol: geneve

Then run the connectivity test:

cilium connectivity test --test no-policies -v -d
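
To confirm that a cluster is actually running with the triggering combination before reproducing, a small helper like the one below can be used. It is only a sketch: the function name `check_repro_config` is hypothetical, and it assumes, as the configuration above suggests, that `bpf-lb-sock-hostns-only: true` is the setting that puts pods on the per-packet LB path. It reads `key: value` lines on stdin and succeeds only if both DSR mode and host-namespace-only socket LB are set.

```shell
# check_repro_config (hypothetical helper): reads "key: value" lines on stdin
# and exits 0 only if both settings from this report are present.
# Assumption: bpf-lb-sock-hostns-only=true is what enables per-packet LB for pods.
check_repro_config() {
  awk '
    {
      line = $0
      gsub(/^[ \t]+|[ \t]+$/, "", line)      # tolerate YAML indentation
      n = index(line, ":")
      if (n == 0) next                        # skip lines without a key
      key = substr(line, 1, n - 1)
      val = substr(line, n + 1)
      gsub(/^[ \t"]+|[ \t"]+$/, "", val)      # strip spaces and YAML quotes
      if (key == "bpf-lb-mode" && val == "dsr") dsr = 1
      if (key == "bpf-lb-sock-hostns-only" && val == "true") hostns = 1
    }
    END { exit (dsr && hostns) ? 0 : 1 }
  '
}
```

For example, the agent ConfigMap can be piped into it: `kubectl -n kube-system get configmap cilium-config -o yaml | check_repro_config && echo "repro config active"`.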

Cilium Version

❯ cilium version
cilium-cli: v0.18.6-11-gdf22f5c34 compiled with go1.25.0 on darwin/arm64
cilium image (default): v1.18.0
cilium image (stable): v1.18.2
cilium image (running): 1.19.0-dev

Kernel Version

❯ kubectl get node -owide
NAME                 STATUS   ROLES           AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                         KERNEL-VERSION            CONTAINER-RUNTIME
kind-control-plane   Ready    control-plane   44m   v1.34.0   10.89.0.3     <none>        Debian GNU/Linux 12 (bookworm)   6.15.9-201.fc42.aarch64   containerd://2.1.3
kind-worker          Ready    <none>          44m   v1.34.0   10.89.0.2     <none>        Debian GNU/Linux 12 (bookworm)   6.15.9-201.fc42.aarch64   containerd://2.1.3

Kubernetes Version

❯ kubectl version
Client Version: v1.34.0
Kustomize Version: v5.7.1
Server Version: v1.34.0

Regression

No response

Sysdump

No response

Relevant log output

Anything else?

No response

Cilium Users Document

  • Are you a user of Cilium? Please add yourself to the Users doc

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Labels

  • area/datapath: Impacts bpf/ or low-level forwarding details, including map management and monitor messages.
  • area/loadbalancing: Impacts load-balancing and Kubernetes service implementations
  • feature/dsr: Relates to Cilium's Direct-Server-Return feature for KPR.
  • kind/bug: This is a bug in the Cilium logic.
  • kind/community-report: This was reported by a user in the Cilium community, eg via Slack.
  • stale: The stale bot thinks this issue is old. Add "pinned" label to prevent this from becoming stale.
